hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-10 08:32:09 +00:00

Author	SHA1	Message	Date
helix4u	aef04b2b53	docs(security): fix secret redaction default docs	2026-05-29 12:06:22 -07:00
teknium1	ddaf2f6712	style: restore PEP8 blank-line separation after dead-code removal The deletions in the salvaged commit left some top-level defs/classes separated by a single blank line. Restore the 2-blank-line separation.	2026-05-29 04:22:27 -07:00
kshitijk4poor	dc235e93cb	chore: remove dead code — 28 unused functions/classes across 16 files Vulture + per-symbol verification (whole-repo grep incl. tests, string literals, getattr, decorator/registry/argparse dispatch) confirmed each of these has zero callers anywhere — not reachable via any dynamic-dispatch path, not referenced by tests, not re-exported. Removed: - acp_adapter/tools.py: _build_patch_mode_content - agent/anthropic_adapter.py: read_claude_managed_key (diagnostics-only, never called) - agent/bedrock_adapter.py: get_bedrock_model_ids - agent/browser_registry.py: get_active_browser_provider - agent/chat_completion_helpers.py: _take_request_client (x2 nested closures, never invoked) - gateway/platforms/weixin.py: _rewrite_headers_for_weixin, _rewrite_table_block_for_weixin - hermes_cli/banner.py: _skin_branding - hermes_cli/debug.py: _delete_hint - hermes_cli/gateway.py: _setup_email, _setup_sms, _setup_yuanbao (platform keys absent from the _builtin_setup_fn dispatch dict; handled by the _setup_standard_platform fallback) - hermes_cli/kanban_db.py: set_max_runtime, active_run - hermes_cli/kanban_diagnostics.py: severity_of_highest, _latest_clean_event_ts - hermes_cli/main.py: _build_provider_choices, cmd_portal (portal subcommand is wired via portal_cli.add_parser, not this wrapper) - hermes_cli/model_switch.py: CustomAutoResult (orphaned by the switch_model() extraction) - hermes_cli/models.py: format_model_pricing_table, fetch_nous_account_tier - hermes_cli/portal_cli.py: _nous_portal_base_url - hermes_cli/proxy/server.py: handle_models_fallback (defined but never registered on the router) - tools/computer_use/cua_backend.py: _parse_element, _is_arm_mac - tools/file_operations.py: _get_safe_write_root (prod uses the imported agent.file_safety.get_safe_write_root directly) - tools/skills_tool.py: _load_category_description Also dropped two imports left unused by the removals: - tools/file_operations.py: get_safe_write_root alias - tools/computer_use/cua_backend.py: import platform Pure deletion: -551 LOC. No behavior change. Test files covering the edited modules pass (640/640); the broader suite's pre-existing/env-dependent failures reproduce unchanged on origin/main.	2026-05-29 04:22:27 -07:00
Teknium	e4b9532c18	feat: embedder environment-hint hook for the system prompt (#34574 ) * fix(security): block AWS SDK creds from subprocess env * fix(security): narrow Bedrock subprocess strip to inference bearer token only Scopes the AWS_SDK subprocess strip down from the full AWS credential chain to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed inference secret (analogous to OPENAI_API_KEY). The general AWS credential chain (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE / config + role pointers) is intentionally left inheritable. Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator shell. Hard-blocklisting the general chain would (a) regress every user who runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users, since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be unrecoverable, because env_passthrough.py refuses to re-allow anything in _HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes the reported leak (opencode enumerating the Bedrock catalog off the leaked bearer token) with no capability loss. Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future SDK-cred provider is covered automatically. Tests: bearer token stripped + general chain preserved (no-regression guard), on both the runtime strip path and the blocklist-membership path. Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp> * feat: embedder environment-hint hook for the system prompt Adds HERMES_ENVIRONMENT_HINT env var (and config.yaml agent.environment_hint) so a host wrapping Hermes (sandbox runner, managed platform) can describe the runtime environment — proxy, credential handling, mount layout — in the system prompt's environment-hints block, without editing the identity slot (SOUL.md). Read once at prompt-build time, so it lands in the stable, cache-safe portion of the system prompt. Env var overrides the config key (build-time/container mechanism). Empty by default — no behavior change for existing installs. --------- Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>	2026-05-29 04:10:05 -07:00
firefly	21aeefe5fd	fix(code-exec): propagate agent-turn context into tool worker threads Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape. Refs #33057	2026-05-29 03:44:49 -07:00
kshitijk4poor	a22c250001	refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params After the legacy session-key path was removed, two parameters became dead surface on the Nous runtime-resolution chain: - min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through / telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state, _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the now-deleted agent-key mint TTL and drives no behavior. - inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally identical; the value only fed _normalize_nous_inference_auth_mode validation and oauth trace output, never a branch. Removing inference_auth_mode orphaned its whole supporting cluster (NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES, _normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here. Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter, runtime_provider, web_server, main, auth_commands, setup) and pruned the matching test kwargs. Deleted two tests that exercised the removed surface (test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode). No behavior change: net -134 LOC of dead code.	2026-05-29 02:24:48 -07:00
kshitijk4poor	95cf8f9842	refactor(auth): drop weak JWT-shape fallback in auxiliary _nous_api_key The import-failure fallback returned any 3-segment token without scope/ expiry validation, a divergent reimplementation of the canonical _nous_invoke_jwt_is_usable check. The import is from the same module that provides resolve_nous_runtime_credentials, so a failure means the whole auxiliary Nous path is unavailable anyway; return "" instead so the caller falls through to the clear 'run: hermes auth add nous' guidance rather than handing back an unvalidated token.	2026-05-29 02:24:48 -07:00
Robin Fernandes	7e958dafc2	fix(auth): address Nous JWT fallback review	2026-05-29 02:24:48 -07:00
Robin Fernandes	41ff6e5937	refactor(auth): Disable Nous legacy session key fallback	2026-05-29 02:24:48 -07:00
teknium1	7427b9d581	fix(tool-search): scope bridge catalog + dispatch to the session's toolsets Tool Search read its catalog from the global registry (get_tool_definitions with no toolset scope = 'start with everything'), so a restricted-toolset session — subagent, kanban worker, curated gateway session — could: 1. tool_search the entire process registry, not just its granted tools, and 2. tool_call any registered plugin/MCP tool it was never given, because registry.dispatch() has no enabled_tools gate for non-execute_code tools. A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26 and successfully invoked an out-of-scope plugin tool via tool_call. Fix: - handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge dispatch scopes get_tool_definitions to them (also stops polluting the process-global _last_resolved_tool_names with out-of-scope tools, which leaked into execute_code's sandbox-tool fallback). - A defense-in-depth gate rejects any tool_call'd name not in the scoped deferrable catalog. - tool_executor's unwrap (both concurrent + sequential paths) enforces the same scope before dispatch, since it unwraps tool_call -> underlying name and bypasses the bridge branch. New _tool_search_scoped_names() helper, cached per-agent on registry generation + toolset scope. - New scoped_deferrable_names() helper in tool_search.py shared by both sites. Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped catalog, out-of-scope tool_call rejection, no global pollution, helper).	2026-05-29 02:04:12 -07:00
teknium1	369075dc95	feat(tools): progressive tool disclosure for MCP and plugin tools Adds Tool Search, a structured-tools progressive-disclosure layer that replaces MCP and non-core plugin tools in the model-visible tools array with three bridge tools (tool_search / tool_describe / tool_call) when the deferrable surface would consume more than a configurable percentage of the active model's context window. Core Hermes tools are never deferred. Default mode is 'auto' with a 10% context threshold, so small toolsets pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off' to disable. Design carefully reflects the OpenClaw production failure modes documented in the openclaw-tool-search-report: - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the 'tools silently missing from isolated cron turns' regression class (openclaw#84141) by construction: there is no code path that can drop a core tool. - Catalog is stateless across turns — rebuilt from the live tool-defs list on every assembly. No session-keyed Map that can drift out of sync with the registry. - tool_call unwraps the bridge call before any hook fires, so plugin pre/post hooks, guardrails, approval flows, and the activity feed all see the underlying tool name, not the bridge (addresses openclaw#85588 and the verbose-mode complaint on openclaw#79823). - The unwrap happens in both the parallel and sequential paths of agent/tool_executor.py and also in handle_function_call, so direct callers (sandboxed code, eval harnesses) are covered too. - Bridge tools cannot invoke each other (recursion guard) and cannot invoke core tools (those must be called directly). - Tools mode only — no JS-sandbox code-mode. Keeps the surface small. - Token estimation via cheap char/4 heuristic; precision isn't needed for the threshold decision. Files: - tools/tool_search.py — new module (BM25 retrieval, classification, threshold gate, bridge dispatch, unwrap helper). - tests/tools/test_tool_search.py — 35 tests including the OpenClaw #84141 regression guard. - model_tools.py — wires assembly into _compute_tool_definitions as the final step, adds skip_tool_search_assembly kwarg so the bridge can see the real catalog, dispatches the three bridge tools. - agent/tool_executor.py — unwraps tool_call in both parallel and sequential parsing loops so checkpointing, guardrails, plugin hooks, and tool-progress callbacks all observe the underlying tool name. - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block. - website/docs/user-guide/features/tool-search.md — user docs. Validation: - 35/35 new tests pass. - Existing tool/registry/model_tools/config/coercion/executor tests (82 + 74 + small adjacents) green. - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns 3 bridges, tool_search returns top 3 hits, tool_describe returns full schema, tool_call dispatches to the real underlying handler and the underlying result is what the model sees. - Reserved-name recursion guard verified live. - Core-tool refusal via tool_call verified live.	2026-05-29 02:04:12 -07:00
teknium1	73d73f1f0d	fix(codex): relax no-byte TTFB watchdog default from 12s to 120s The chatgpt.com/backend-api/codex endpoint can spend tens of seconds in backend admission / prompt prefill before emitting its first SSE event. The 12s no-byte TTFB cutoff aborted those still-valid streams, surfacing as 'Codex stream produced no bytes within 12s' through all retries (Discord reports). The OpenAI SDK's own streaming read timeout is 600s, so 12s was ~50x more aggressive than the transport layer would have tolerated. Default the no-byte cutoff to 120s and raise the openai-codex MAX cap default to 120s so it no longer clamps the new default back to 20s. Disabling stays available via HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0; the 25k-token auto-disable, _STRICT override, and post-first-event idle watchdog are unchanged. Co-authored-by: Gille <4317663+helix4u@users.noreply.github.com>	2026-05-29 02:02:25 -07:00
Teknium	db2ce9e7d2	fix(compression): fail open when lock subsystem is missing (version skew) (#34475 ) A process running mismatched module versions — conversation_compression.py re-imported with the post-#34351 lock code while a long-lived hermes_state.SessionDB stays bound to the pre-#34351 class in memory — has the try_acquire_compression_lock call site but not the method. The AttributeError it raises is NOT a sqlite3.Error, so the method's own fail-open guard never runs; the exception escapes to the outer agent loop, which prints the error and retries. Compression never succeeds, the token count never drops, and the loop re-triggers compaction forever (the 'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock' spin a user hit after an update). Wrap the lock acquire so any unexpected exception fails OPEN: skip locking and proceed with compression. Skipping the lock risks a rare concurrent-compression session fork; an infinite no-progress loop that never compresses at all is strictly worse. The remediation hint in the log points at the real fix (restart / hermes update to resync the stale module). Also guards get_compression_lock_holder against the same skew. Adds a regression test simulating the version skew (real SessionDB wrapped so only the lock methods raise AttributeError) — asserts _compress_context proceeds and rotates instead of raising.	2026-05-29 01:32:32 -07:00
Teknium	c01a2df0a3	fix(auth): don't launch a text-mode browser inside the terminal for OAuth (#34479 ) OAuth auto-open only checked _is_remote_session() (SSH + cloud-shell env vars). On a headless/CLI-only Linux box with no GUI browser, none of those trip, so webbrowser.open() resolved to a console browser (w3m/lynx/links) and launched it INSIDE the terminal — hijacking the user's TTY with the xAI 'Account Management' login page instead of letting them copy the URL. Add _can_open_graphical_browser(): returns False when webbrowser would resolve to a known console browser, when $BROWSER names one, when there's no display server on Linux, or when no browser resolves at all. Gate all 5 OAuth auto-open callsites (xAI loopback, Spotify loopback, MiniMax device code, Anthropic, Google) on it in addition to the existing remote check. Headless boxes now print the URL / fall through to manual-paste instead.	2026-05-29 01:23:06 -07:00
moikapy	f6a2ba6261	fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials when an OAuth2 access token has expired or is invalid. The existing _is_auth_error() only checked for 401 status codes, so these tokens were never refreshed and the 403 propagated as a generic permission denied error. Three fixes: 1. _is_auth_error: Recognize xAI's 403+bad-credentials pattern as an auth failure, triggering token refresh instead of silent failure. 2. _refresh_provider_credentials: Add xai-oauth branch with pool-level refresh (try_refresh_current with select to ensure current entry) then fallback to singleton resolver with force_refresh=True. 3. _recoverable_pool_provider: Map api.x.ai host to xai-oauth pool for auto-resolved providers, matching existing pattern for openai-codex/openrouter/nous/anthropic. Includes 14 tests covering the new detection logic, host mapping, and graceful fallback behavior. Signed-off-by: moikapy <moikapy@devmoi.com>	2026-05-29 00:28:02 -07:00
teknium1	592a4ffb6b	fix(kanban): close three blocked/iteration-exhausted handling gaps (#29747 ) Reporter diagnosed three independent gaps that together allowed infinite 'unblock → re-stuck' loops with no surfacing or escalation: GAP 1: `_rule_stuck_in_blocked` resets timer on any `commented`/`unblocked` event, so a task that cycles every few minutes is invisible to it regardless of how many times it cycles. Fix: new `_rule_block_unblock_cycling` rule (`hermes_cli/kanban_diagnostics.py`) that counts block→unblock cycles in a sliding window. Default threshold 3 cycles within 24h, configurable via `block_cycle_threshold` / `block_cycle_window_seconds`. Walks events in arrival order (event id) since multiple events can share the same `created_at` second. Fires as a warning with a CLI hint to inspect the block reasons. GAP 2: Iteration-budget-exhausted runs in kanban workers map to `kanban_block` (status=blocked, but a clean exit from the kernel's perspective). `_rule_repeated_failures` reads `consecutive_failures`, which `_record_task_failure` increments only for crashed/timed_out/ spawn_failed — `blocked` outcome bypasses the failure counter, so the `kanban.failure_limit` circuit breaker never trips on budget-exhaustion loops. Fix: `agent/conversation_loop.py` budget-exhaustion path now calls `_record_task_failure(outcome="timed_out")` instead of `kanban_block`. Budget exhaustion is genuinely a timeout-shaped failure (the task ran out of allowed iterations), so this is more honest semantics; it also routes through the unified failure counter, so repeated budget exhaustions trip the circuit breaker and the task auto-blocks with `gave_up` after `failure_limit` retries. GAP 3: `release_stale_claims` uses `_pid_alive(worker_pid)` only and ignores `last_heartbeat_at`. Reporter observed a 91-min run that held its claim with frozen heartbeat because the worker entered a logic loop with no tool calls — `_pid_alive` kept returning True so the claim was extended every 15 minutes indefinitely. Fix: heartbeat-stale backstop. If `last_heartbeat_at` is set AND older than `DEFAULT_CLAIM_HEARTBEAT_MAX_STALE_SECONDS` (default 1h), reclaim even if the PID is alive. NULL `last_heartbeat_at` preserves backward compatibility (no heartbeat yet = extend, as before). The reclaim event payload now includes a `heartbeat_stale` boolean so operators see why a live-PID worker was reclaimed. This works cleanly in concert with PR #34418 (#31752 runtime → heartbeat bridge): once `_touch_activity` keeps `last_heartbeat_at` fresh as a side effect of normal API traffic, the backstop only fires for genuinely wedged workers (no chunks, no tool results, no progress at all). Co-authored-by: baofuen <45189813+baofuen@users.noreply.github.com>	2026-05-29 00:13:29 -07:00
teknium1	40217aa194	fix(kanban): tell workers not to use clarify; route to kanban_block instead (#32167 ) Kanban workers run headless — no live user is on the other side of `clarify`, so the call times out (~120s default) and the task sits silently in `running` with no signal to the operator that input is needed. Reporter observed a real incident where a worker asked 'promote to production, or check staging first?' via clarify, the call timed out, the agent hallucinated a fallback, and the task sat 'running' for hours. Fix: explicit 'do not call clarify' bullet in two surfaces every kanban worker sees — - `agent/prompt_builder.py` KANBAN_GUIDANCE `## Do NOT` section (auto-injected into every dispatcher-spawned worker run). - `skills/devops/kanban-worker/SKILL.md` `## Do NOT` section (the bundled worker skill). Both point at the right pattern: `kanban_comment` (context) + `kanban_block` (decision needed) — the task surfaces on the board as blocked, the operator sees it, unblocks with their answer in a comment, and the worker respawns with the thread. Co-authored-by: kweiner <17778+kweiner@users.noreply.github.com>	2026-05-28 23:57:20 -07:00
Teknium	86a389fee2	fix(credential-pool): STATUS_DEAD for terminal OAuth failures (#32849 ) (#34412 ) When OpenAI Codex returns 401 token_invalidated or token_revoked, the credential is broken upstream — retrying after a TTL cooldown cannot fix it. The existing code treated every 401/429 the same way: STATUS_EXHAUSTED with a TTL cooldown (5 min for 401, 1 hour for 429). After the TTL elapsed, the broken credential re-entered rotation and immediately failed again with the same 401, surfacing as 'Failed to generate context summary' on every context-compression cycle. Reporter observed 7 separate 401 token_invalidated failures from the same revoked credential in a single day; the only workaround was removing it manually via 'hermes auth'. Add a STATUS_DEAD terminal state. Only 401 responses whose error.code/reason matches a known terminal OAuth state (token_invalidated, token_revoked, invalid_token, invalid_grant, unauthorized_client, refresh_token_reused) transition to DEAD. Everything else keeps the existing TTL semantics — 429 rate limits are transient and should recover. DEAD entries are excluded from rotation unconditionally. They only clear when an explicit write-side re-auth sync rewrites the tokens (the existing _sync_codex_pool_entries / _sync__entry_from_auth_store paths already clear last_status to None). The read-side auth.json-sync paths also now fire on DEAD so an in-flight pool entry can adopt fresh tokens written by another process without needing explicit re-auth. After 24 hours, DEAD manual entries (source='manual:') are pruned from the pool automatically so dead state doesn't accumulate forever. Singleton-seeded DEAD entries (source='device_code' etc.) are kept because _seed_from_singletons would recreate them on the next load with the same stale tokens — pruning would be pointless. The audit trail stays visible (label, last_error_reason, timestamps). Closes #32849.	2026-05-28 23:45:42 -07:00
AhmetArif0	4126da65ae	fix(security): add bws_cache.json to file_safety read guard The Bitwarden Secrets Manager disk cache introduced in #31968 stores plaintext secret values at <hermes_home>/cache/bws_cache.json to avoid re-fetching across back-to-back CLI invocations. The file was not added to get_read_block_error()'s credential_file_names list, leaving the agent able to read it directly via the read_file tool. Add os.path.join("cache", "bws_cache.json") to credential_file_names so both HERMES_HOME and the global root are covered, matching the existing pattern used for auth.json, .anthropic_oauth.json, etc. Other files under cache/ (images, documents, audio) are unaffected — the check is an exact-file match, not a prefix match. Verified: 11/11 exploit/regression scenarios pass; 38/38 existing file_safety tests pass.	2026-05-28 23:31:20 -07:00
Gabor Barany	1386a7e478	fix(xai-sanitize): deepcopy tools_for_api before in-place mutation (#27907 ) The xAI tool-schema sanitizers (strip_slash_enum, strip_pattern_and_format) mutate their input in place — that's their documented contract. The two call sites (chat_completion_helpers.build_api_kwargs and the auxiliary client) were passing agent.tools straight through, so the first xAI request would permanently strip slash-containing enum constraints and pattern/format keywords from the per-agent tool registry. Effect: any subsequent non-xAI call from the same agent (auxiliary task routed to Anthropic, OpenRouter fallback, mid-session model switch) saw the already-stripped schema with no way for the user to notice from their config. Fix: deepcopy tools_for_api before sanitizing at both call sites. The slash-enum bug itself (xAI 400ing on enums with '/') was fixed earlier by #32443 (Nami4D) — that PR landed the strip but used the sanitizers directly without copying. This salvages #27907's correctness contribution (the deepcopy) while skipping its redundant parallel sanitizer (strip_xai_incompatible_enum_values is functionally equivalent to the existing strip_slash_enum) and its preflight- neutrality argument (we chose model-gated preflight in #32443). 3 new tests in tests/run_agent/test_run_agent_codex_responses.py: - strips_slash_enum_from_outgoing_request — outgoing kwargs has no slash-containing enum values (functional contract preserved). - does_not_mutate_agent_tools — headline #27907 regression. Snapshot agent.tools before build_api_kwargs, assert it survives intact after. Pre-fix this assertion would have caught the mutation. - is_idempotent_across_repeated_calls — three xAI requests in a row each strip cleanly AND don't progressively erode the source schema. 344/344 across tests/agent/test_auxiliary_client.py, tests/agent/transports/test_codex_transport.py, tests/run_agent/test_run_agent_codex_responses.py, and tests/tools/test_schema_sanitizer.py. Co-authored-by: Gabor Barany <barany.gabor@gmail.com>	2026-05-28 23:29:59 -07:00
kshitijk4poor	66827f8947	chore: prune unused imports and duplicate import redefinitions Remove unused imports (F401) and duplicate/shadowed import redefinitions (F811) across the codebase using ruff's safe autofixes. No behavioral changes -- imports only. - ~1400 safe autofixes applied across 644 files (net -1072 lines) - __init__.py re-exports preserved (excluded from F401 removal so public re-export surfaces stay intact) - Re-exports that are imported or monkeypatched by tests but look unused in their defining module are kept with explicit # noqa: F401 (gateway/run.py load_dotenv; run_agent re-exports from agent.message_sanitization, agent.context_compressor, agent.retry_utils, agent.prompt_builder, agent.process_bootstrap, agent.codex_responses_adapter) - Unsafe F841 (unused-variable) fixes deliberately skipped -- those can change behavior when the RHS has side effects - ruff lints remain disabled in pyproject.toml (only PLW1514 is selected); this is a one-time cleanup, not a config change Verification: - python -m compileall: clean - pytest --collect-only: all 27161 tests collect (zero import errors) - core entry points import clean (run_agent, model_tools, cli, toolsets, hermes_state, batch_runner, gateway) - static scan: every name any test imports directly from an edited module still resolves	2026-05-28 22:26:25 -07:00
Teknium	a4d8f0f62a	feat(prompt): universal task-completion guidance + local Python toolchain probe (#34340 ) * fix(codex): surface error code in Responses 'failed' status errors When a Codex Responses turn ends with status=failed, the response carries the failure details under `response.error` as `{code, message, param, ...}`. The previous extractor pulled only `message`, so users seeing a rate-limit failure got a bare "Slow down" string indistinguishable from a generic stream truncation; an internal_error with empty message degraded to a dict dump ("{'code': 'internal_error', 'message': ''}"). Extract a `_format_responses_error()` helper that: - prefixes `code` when both code and message are present (e.g. 'rate_limit_exceeded: Slow down') - falls back to the bare `code` when message is empty - accepts both dict and attribute-style payloads (SDK and JSON-RPC paths) - preserves the prior status-only fallback when no error payload exists Apply the same helper at the sibling site in `codex_app_server_session.run_turn()` so codex-CLI subprocess turn failures get the same treatment. Tests: - 8 new unit tests for `_format_responses_error` covering both shapes, empty/missing fields, non-string fields, and the status-only fallback. - 2 regression tests on `_normalize_codex_response` for failed status with and without a code, asserting the exact RuntimeError message. - All 3603 tests in tests/agent/ pass. Adapted from anomalyco/opencode#28757. * feat(prompt): universal task-completion guidance + local Python toolchain probe Two cross-model failure modes get a single-line answer in the cached system prompt. Both gated by config (default on), both add zero overhead when not needed, both verified via real AIAgent prompt builds. ## What changed `TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models. Targets two failure modes observed on a real Sarasota real-estate build task: (1) Opus stopped after writing an 85-byte stub and gave a prose response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed through a PEP-668 wall, then returned fabricated listings instead of admitting the blocker. Both behaviors are model-family-agnostic, so the guidance lives outside the existing tool_use_enforcement gate (~192 tokens, paid once per session via prefix cache). `tools/env_probe.py` — local Python toolchain probe. Detects python3/pip/uv/PEP-668 state and emits ONE short line in the system prompt when something is non-default. Emits NOTHING when the env is clean (zero token cost for normal users). Skipped entirely for remote terminal backends (docker/modal/ssh) — they have their own probe. Example output on a broken environment (the actual case): Python toolchain: python3=3.11.15 (no pip module), python=missing (use python3), pip→python3.12 (mismatch), PEP 668=yes (use venv or uv). ## Config Both flags live under `agent.` in config.yaml, default True: agent: task_completion_guidance: true # universal "finish the job" block environment_probe: true # local Python toolchain hints Neither addition required a `_config_version` bump — deep-merge fills defaults in for existing user configs. ## Validation \| Test surface \| Result \| \|---\|---\| \| tests/tools/test_env_probe.py \| 10/10 pass (probe unit) \| \| tests/run_agent/test_run_agent.py — new classes \| 8/8 pass (integration) \| \| TestToolUseEnforcementConfig \| 17/17 pass (no regression) \| \| TestBuildSystemPrompt \| 9/9 pass (no regression) \| \| TestInvalidateSystemPrompt \| 2/2 pass (no regression) \| \| tests/agent/test_prompt_builder.py \| 124/124 pass (no regression) \| \| tests/hermes_cli/ \| 5662/5662 pass (config defaults) \| \| E2E AIAgent build (broken env) \| Both blocks present, 2,178 chars \| \| E2E AIAgent build (clean env) \| 771-char net overhead, env probe silent \|	2026-05-28 22:26:09 -07:00
Teknium	a30480bd2b	fix(compression): prevent session-id fork from concurrent compressions (#34351 ) * fix(compression): prevent session-id fork from concurrent compressions When two AIAgent instances share the same session_id (most commonly the parent-turn agent and its background-review fork, which inherits session_id verbatim via background_review.py L451), both can call compress_context() on overlapping snapshots of the same conversation. Each ends the parent and creates its own NEW child session in state.db, both parented to the same old id. The gateway SessionEntry only catches one rotation; the other becomes an orphan that silently accumulates writes — Damien's incident shape (parent 20260527_234659_e65f0e → two children, only one visible). Adds a state.db-backed per-session compression lock. Acquired before the rotation in conversation_compression.compress_context(); on failure, the caller returns messages unchanged so the auto-compress retry loop stops cleanly. TTL (5min default) reclaims locks abandoned by crashed compressors. Lock holder identity (pid:tid:agent:nonce) is preserved for diagnostics via get_compression_lock_holder(). Schema bumped 13 -> 14 to track the new compression_locks table. Reconciled additively via the existing declarative-column pattern; no data migration needed for existing DBs. Regression test reproduces Damien's shape: two threads racing _compress_context on a shared parent_sid. Without the lock the test deterministically produces 2 child sessions; with the lock, exactly 1. Covers all six compression entry points (preflight in conversation_loop, mid-turn fallback, hygiene compression in gateway, /compact, CLI /compress, TUI /compress). ACP /compress was already protected by nulling out _session_db before its compress call. * ci: trigger rerun (transient GitHub API rate limit on CodeQL workflow)	2026-05-28 21:40:39 -07:00
teknium1	8cf6b3da9d	fix(opencode-go): cap mimo-v2.5-pro max_tokens at 131072 The opencode-go relay defaults max_tokens to 262144 when none is sent, but Xiami mimo-v2.5-pro only supports 131072 completion tokens — every request 400s with "max_tokens is too large: 262144" before the agent can do anything. Add a get_max_tokens(model) hook on ProviderProfile (default returns default_max_tokens) so profiles fronting multiple upstreams can vary the cap per-model. Wire chat_completions transport through the hook. Override on OpenCodeGoProfile with mimo-v2.5-pro=131072. Only mimo-v2.5-pro is capped — other opencode-go models (kimi, glm, qwen, minimax, other mimo variants) unchanged.	2026-05-28 20:49:53 -07:00
hinotoi-agent	042c1d6bb0	test: cover fallback dropped-turn handoff	2026-05-28 20:34:40 -07:00
Hinotoi Agent	6dc068ef04	fix: broaden deterministic compression fallback coverage	2026-05-28 20:34:40 -07:00
Hinotoi Agent	e785c0ad70	fix: preserve context when summary generation fails	2026-05-28 20:34:40 -07:00
Teknium	769ee86cd2	feat(kanban): attach images referenced in task bodies to worker vision (#34210 ) Kanban workers now scan the task body for local image paths and http(s) image URLs and attach them to the worker's first user turn — matching the CLI/gateway behaviour for inbound images. Before, a user pasting `/home/me/screenshot.png` or `https://example.com/img.png` into a kanban task description had it sent to the model as plain text and the pixels were never seen. How it works: * agent/image_routing.py gains extract_image_refs(text) → (paths, urls) that mirrors gateway/platforms/base.py:extract_local_files (absolute / ~-relative paths, image extensions only, ignores fenced/inline code). * build_native_content_parts() accepts an optional image_urls= kwarg and emits passthrough image_url parts for remote URLs alongside the base64 data: URLs used for local paths. * cli.py (single-query/quiet branch — the path every dispatcher-spawned worker takes) detects HERMES_KANBAN_TASK, reads the task body via kanban_db.get_task, runs extract_image_refs, and threads the results into the existing image-routing decision (native vs text). Best-effort: enrichment failures never block worker startup. Tested: * tests/agent/test_image_routing.py — 22 new tests for extract_image_refs and URL pass-through in build_native_content_parts. * tests/hermes_cli/test_kanban_worker_image_extraction.py — 10 new tests driving real kanban_db round-trip (create task → read body → extract refs → build parts). * E2E: created a fake kanban task with a body referencing both a local PNG and an https URL; verified the worker pipeline produces a multimodal user turn with 1 text part + 2 image_url parts (data URL for the local file, passthrough URL for the remote).	2026-05-28 17:50:42 -07:00
Dave Heritage	5a95fb2e14	feat: expose completed-turn message context to memory providers Adds an optional `messages` keyword to the `MemoryProvider.sync_turn` contract so external/community memory plugins can receive the OpenAI-style conversation message list for the completed turn — including assistant tool calls and tool result content — not just the final assistant text. Dispatch uses signature inspection (`_provider_sync_accepts_messages`): only providers that declare a `messages` parameter (or `**kwargs`) receive it; all existing in-tree providers keep their legacy text-only signature and are called unchanged. No structured-trace envelope is added to core — providers reconstruct whatever they need from the standard message list. Also documents Memori as a standalone community memory provider. Salvaged from #28065 — rebased onto current main. Co-authored-by: Dave Heritage <david@memorilabs.ai>	2026-05-29 02:16:43 +05:30
yanghd	7a3c38d0b7	fix: stop probe stepdown without provider context limit	2026-05-28 12:26:53 -07:00
Teknium	5f66c36470	fix(redact): pass web URLs through unchanged (#34029 ) * fix(redact): pass web URLs through unchanged Magic-link checkout URLs, OAuth callbacks the agent is meant to follow, and pre-signed share URLs were getting `?token=*` / `?code=` / `?signature=` blanket-redacted by parameter NAME, which breaks any skill that has to round-trip a URL through history (the model's tool call arguments get sanitized before persistence — the live call fires with the real URL, but the next turn sees ``). Joe Rinaldi Johnson hit this with a checkout-acceleration skill that uses magic links in URLs. Drops three call sites from `redact_sensitive_text`: - `_redact_url_query_params` (was redacting `access_token`, `token`, `api_key`, `code`, `signature`, `key`, `auth`, etc.) - `_redact_url_userinfo` (was redacting `https://user:pass@host`) - `_redact_http_request_target_query_params` (was redacting access-log request targets like `"POST /hook?password=... HTTP/1.1"`) The helpers themselves are kept in the module — still importable by anything that wants to opt in explicitly. Still redacted (unchanged): - Vendor-prefix credential shapes (sk-, ghp_, AKIA, gAAAA, etc.) anywhere they appear, including inside URLs — see the `test_known_prefix_inside_url_still_redacted` case. - JWTs (`eyJ...`) - DB connection-string passwords (`postgres://admin:pw@host`) — these are connection strings, not web URLs the agent navigates to. - Authorization headers, ENV assignments, JSON `apiKey`/`token` fields, Telegram bot tokens, private key blocks, Discord mentions, E.164 phone numbers, and form-urlencoded bodies (request bodies, not URLs). Tests: replaces `TestUrlQueryParamRedaction` + `TestUrlUserinfoRedaction` with `TestWebUrlsNotRedacted`, asserting representative URLs (OAuth callback, magic link, S3 pre-signed, websocket, userinfo, access log) pass through unchanged. Adds positive cases proving the prefix and DB connstr nets still fire. 74 redact tests + 10 browser-exfil + 16 PII redaction tests all pass. test(codex_app_server): drop URL-query assertion from stderr-tail redaction test The test bundled (a) sk-live-* credential-prefix redaction with (b) URL query-param redaction. (a) is still in effect via _PREFIX_RE; (b) was the contract we just removed in the parent commit so the 'querysecret12345' assertion stopped holding. Keep the credential-shape assertion, drop the URL-query one. Send-message tool's local _URL_SECRET_QUERY_RE in tools/send_message_tool.py is independent of agent/redact.py and unchanged — its tests (test_top_level_send_failure_redacts_query_token, test_http_error_redacts_access_token_in_exception_text) still pass.	2026-05-28 11:32:39 -07:00
kshitij	1a74795735	feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003 ) Anthropic released Claude Opus 4.8 on 2026-05-27, available on OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS: - https://openrouter.ai/anthropic/claude-opus-4.8 - https://openrouter.ai/anthropic/claude-opus-4.8-fast The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast) priced at 2x of the base model — a notable improvement over the 6x premium on older Opus generations (4.6/4.7). It is NOT a `speed: "fast"` request parameter like Opus 4.6; Anthropic's native fast-mode beta still only covers Opus 4.6. Changes: hermes_cli/models.py - Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to the OpenRouter fallback snapshot and the Nous Portal curated list (live catalogs surface them automatically when reachable; the fallback list matters when the manifest fetch fails). - Add claude-opus-4-8 to the Anthropic-native picker list. agent/model_metadata.py - Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS with 1M tokens (matches 4.6/4.7). agent/anthropic_adapter.py - Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and _NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the Opus 4.7 API contract: adaptive thinking only, xhigh effort level supported, sampling parameters (temperature/top_p/top_k) return 400. - Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output, same as 4.7). Matches by substring so claude-opus-4-8-fast and date-stamped variants resolve correctly. agent/usage_pricing.py - Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50 cache read, $6.25 cache write (same as 4.6/4.7). - Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00 cache read, $12.50 cache write. Per OpenRouter, the 2x premium is the only differentiator from regular Opus 4.8. - OpenRouter routes still pull pricing from the live /models API, so no static OpenRouter entry is needed. tests/agent/test_model_metadata.py - Extend the Claude 4.6+ context-length tag list with 4.8/4-8. website/static/api/model-catalog.json - Regenerated via `python scripts/build_model_catalog.py` to pick up the new entries in the OpenRouter and Nous Portal fallback lists. E2E verification (isolated sys.path import against the worktree): - _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params all return True for claude-opus-4.8 and claude-opus-4.8-fast. - _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays False for 4.8 — fast mode is a separate model ID on OpenRouter, not a parameter Anthropic accepts on the base model. - DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations. - resolve_billing_route + _lookup_official_docs_pricing resolve the correct $5/$25 (regular) and $10/$50 (fast) pricing for both dot-notation and dash-notation inputs. - 4.7 and 4.6 regression: behavior unchanged. Unit tests: 305 passed across tests/agent/test_usage_pricing.py, test_model_metadata.py, tests/hermes_cli/test_model_catalog.py, test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py.	2026-05-28 10:31:59 -07:00
kshitij	0554ef1aa3	fix(agent): fallback immediately on provider content-policy blocks (#33883 ) * fix(agent): fallback immediately on provider content-policy blocks Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible cybersecurity risk', OpenAI moderation 'violates our usage policies', Anthropic safety-system rejections, Azure content_filter) are deterministic decisions about a specific prompt. Retrying the same prompt up to api_max_retries times just reproduces the same refusal and burns paid attempts before surfacing the generic 'API failed after 3 retries — <provider message>' to Telegram / cron with no indication that the failure came from the model provider rather than Hermes itself. Classify these as a new FailoverReason.content_policy_blocked (non-retryable, should_fallback=True) and route them through the existing is_client_error path so the loop: - skips the 3x retry backoff - activates a configured fallback model immediately - emits a clear provider-safety message to the user (not the generic 'Non-retryable error (HTTP None)') and surfaces actionable guidance when no fallback is configured (rephrase, narrow context, or set fallback_model in hermes config) - returns a final_response that explicitly tells the user this came from the model provider, so gateway delivery is unambiguous and cron last_status reflects the safety block rather than a vague 'agent reported failure' Patterns are intentionally narrow — verbatim refusal phrasings keyed to specific provider safety pipelines, not generic words like 'policy' or 'violation' that would collide with billing / format / auth errors. Regression guards in test_18028_content_policy_blocked.py verify billing 402s, generic 400s, and OpenRouter account-level provider_policy_blocked remain distinct classifications. Salvaged from #18164 onto current main (file restructure: loop logic moved from run_agent.py to agent/conversation_loop.py, _emit_status → _buffer_status), broadened patterns beyond the original OpenAI Codex cybersecurity case to cover OpenAI moderation, Anthropic safety system, and Azure content_filter; added user-actionable guidance and a clear final_response so cron/gateway surfaces the policy block instead of a generic non-retryable error, and added a regression-guard test module mirroring the is_client_error predicate. Addresses #18028. Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com> * chore: add kchuang1015 to AUTHOR_MAP --------- Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>	2026-05-28 07:28:24 -07:00
Teknium	67011cc0d7	feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816 ) Users report that the CLI/gateway floods them with confusing retry chatter during transient failures: a single 429 can produce 10+ "Provider/Endpoint/ Retrying in 5s..." lines before the request eventually succeeds. The same firehose hits Telegram, Discord, Slack, etc. via _emit_status. This patch defers all retry/fallback/compression status messages until we know the outcome: - if the turn ultimately succeeds (any path: primary recovers, fallback activates, compression unsticks the request), the buffer is silently dropped — the user sees nothing. - if every retry and fallback exhausts and the turn fails, the buffer is flushed at the terminal-failure return so the user sees the full retry trace alongside the final error. Backend logging (agent.log) is unchanged — every emission site still writes to logger.warning/info, so post-mortem diagnosis is intact. ## What changed run_agent.py: four new methods on AIAgent: _buffer_status(msg) — defer an _emit_status call _buffer_vprint(msg) — defer a _vprint(force=True) line _clear_status_buffer() — drop pending messages on success _flush_status_buffer() — replay pending messages on terminal failure agent/conversation_loop.py: - converted ~30 mid-process emit/vprint sites in the retry, fallback, compression, empty-response, and stream-watchdog paths to the buffered helpers - added _flush_status_buffer() at every terminal-failure return so users still see the trace when it actually matters - added _clear_status_buffer() at the "non-empty assistant content" point (NOT at "API call returned bytes" — empty responses still loop through the empty-retry path and would otherwise lose their trace between iterations) - silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error, retrying..." spinner final-frame messages — the spinner now stops cleanly so retries leave no visible residue agent/chat_completion_helpers.py: same conversion for codex TTFB / stale- stream / fallback-activation status messages. agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting directly. ## Tests tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op, re-buffer after flush, exception swallowing. Updated 3 existing tests that mocked _emit_status to also mock (or use) _buffer_status: - tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway - tests/run_agent/test_stream_drop_logging.py (2 tests) - tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test) ## Validation Live test: hermes chat -q against an unreachable endpoint with no fallback exhausts retries and prints the full trace at the end. Same flow against a working endpoint prints zero retry chatter.	2026-05-28 04:53:27 -07:00
Teknium	5e1f793430	chore(web): remove web_crawl tool + provider crawl plumbing (#33824 ) The web_crawl_tool() function was an orphan — no model schema registered it, no skill or CLI command called it, and the agent had no way to invoke it. PR #32608 proposed wiring it up as a model-callable tool; we've decided not to expose crawl as a separate capability since web_search + web_extract cover the use cases we want models to have. Removed: - tools/web_tools.py: web_crawl_tool() (~230 LOC) - plugins/web/firecrawl/provider.py: supports_crawl() + crawl() - plugins/web/tavily/provider.py: supports_crawl() + crawl() - plugins/web/xai/provider.py: supports_crawl() override - agent/web_search_provider.py: supports_crawl() + crawl() ABC methods - agent/web_search_registry.py: get_active_crawl_provider() + the 'crawl' branch in _resolve() - agent/display.py: web_crawl tool-progress rendering - hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools - tools/website_policy.py: stale comment reference - Tests: removed TestWebCrawlTavily class, the two website-policy web_crawl tests, the searxng/ddgs/brave-free crawl-error tests, the integration test_web_crawl method, and the test_unconfigured_crawl_emits_top_level_error test. Trimmed the capability-flag parametrize list and the WebSearchProvider ABC conformance tests. - Docs: trimmed the Crawl column from capability tables in both EN and zh-Hans, updated the developer-guide ABC table. Net: 25 files, +115/-1067. Closes #33762 (the schema-text bug only existed if #32608 landed). Supersedes #32608.	2026-05-28 04:52:42 -07:00
Biser Perchinkov	b5495db701	fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers api_messages is built once before the retry loop while the primary provider is active. When a mid-conversation fallback switches to a require-side thinking provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary (e.g. Codex) go out without reasoning_content and the new provider rejects the request with HTTP 400 ("reasoning_content must be passed back"). Re-apply the echo-back pad against the current provider immediately before building the request kwargs. Idempotent and a no-op unless the active provider enforces echo-back, so it covers all fallback paths without affecting normal or reject-side operation. Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 03:21:00 -07:00
teknium1	9b5dae17a5	feat(context-engine): host contract for external context engines Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373 into a minimal generic host contract that external context engine plugins (e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that duplicated existing infrastructure or had marginal value. Five concrete changes: 1. `_transition_context_engine_session()` on AIAgent — generic lifecycle helper that fires on_session_end → on_session_reset → on_session_start → optional carry_over_new_session_context. Engines implement only the hooks they need; missing hooks are skipped. Built-in compressor keeps its existing reset-only behavior because callers default to no metadata. `reset_session_state()` now optionally accepts previous_messages / old_session_id / carry_over_context and delegates to the transition helper when provided. (#16453) 2. `conversation_id` passed to `on_session_start()` — both the agent-init call site and the compression-boundary call site now forward `self._gateway_session_key` so plugin engines have a stable conversation identity that survives session_id rotation (compression splits, /new, resume). The key already existed on AIAgent; it just wasn't reaching engines. (#16453) 3. Canonical cache buckets forwarded to engines — the usage dict passed to `update_from_response()` now includes input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of the legacy prompt/completion/total keys. Engines can make decisions on cache-hit ratios and reasoning costs instead of only aggregates. ABC docstring updated. (#17453) 4. Plugin-registered context engines visible in the picker — `_discover_context_engines()` in plugins_cmd.py now also includes engines registered via `ctx.register_context_engine()` from plugin manifests, deduplicating by name so repo-shipped descriptions win on collision. (#16451) 5. `_EngineCollector.register_command()` — context engines using the standard `register(ctx)` pattern can now expose slash commands (e.g. `/lcm`). Routes to the global plugin command registry with the same conflict-rejection policy regular plugins use (no shadowing built-ins, no clobbering other plugins). Previously these calls hit a no-op and the slash commands silently never appeared. (#17600) Dropped from the original 5 PRs: - Compression boundary signal (`boundary_reason="compression"`) from #16453 — already on main at `agent/conversation_compression.py:412-424`, landed via the bg-review extraction. - `discover_plugins()` before fallback in run_agent.py from #16451 — redundant: `get_plugin_context_engine()` already routes through `_ensure_plugins_discovered()` which is idempotent. - Runtime identity diagnostics method + helpers from #13373 (+251 LOC) — operators can already read engine state via `engine.get_status()`; the diagnostics view added marginal value relative to its surface area. - The 553-LOC slash-command machinery from #17600 — replaced with a 20-LOC `register_command` method on the collector that reuses the existing plugin command registry instead of building a parallel one. Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs ~1,176 LOC across the original 5 PRs. Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com> Closes #16453. Closes #17453. Closes #16451. Closes #17600. Closes #13373. Related: stephenschoettler/hermes-lcm#68.	2026-05-28 01:45:30 -07:00
helix4u	459d7694d3	fix(agent): preload jiter native parser	2026-05-28 00:20:11 -07:00
Robin Fernandes	406901b27d	feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly.	2026-05-28 00:19:31 -07:00
stephenschoettler	4a6f1863ac	test: cover ci-unblocker production regressions Snapshot review_agent._session_messages before teardown so close() can clean per-session state without dropping the user-visible self-improvement summary. Adds two regressions: - bg-review summarizer receives captured review-agent tool messages after review_agent.close() runs - context-compressor protected-head handoff rehydration populates _previous_summary and keeps the old handoff out of newly summarized turns Salvaged from PR #26039 onto current main after agent/background_review.py extraction. Original commit `63eaf6055`; bg-review test updated to patch the module-level summarize_background_review_actions in agent.background_review instead of the now-forwarder AIAgent._summarize_background_review_actions.	2026-05-27 22:14:53 -07:00
wysie	ee80dfdea0	fix: preserve skill packages during curator consolidation	2026-05-27 13:39:58 -07:00
xxxigm	fc47b7285c	fix(codex): omit tools key from Codex Responses kwargs when no tools registered Salvages the transport-side fix from #32911 (@xxxigm). Closes #32892. The openai SDK's responses.stream() / responses.parse() eagerly call _make_tools(tools), which iterates tools without a None guard. Passing tools=None raises TypeError: 'NoneType' object is not iterable before any HTTP request is issued (openai==2.24.0). PR #33042 already removed responses.stream() from our own Codex call paths, so the specific iteration crash inside _make_tools is no longer on the hot path. But the right API contract is to omit tools entirely when there are no functions to expose — passing tools=None to the backend is semantically wrong regardless of the SDK's iteration behavior, and we'd hit it again on any future code path that hasn't migrated off responses.stream(). This applies the transport-level part of @xxxigm's fix: move 'tools': response_tools into the if response_tools: branch so the key is omitted when there are no tools, just like tool_choice and parallel_tool_calls already are. Skips the run_agent.py-side _strip_sdk_none_iterables helper from their PR — that path is now obsolete because the SDK helper that needed defending is gone. Tests - tests/run_agent/test_codex_no_tools_nonetype.py: 6 tests trimmed from @xxxigm's original 13-test file. Drops the obsolete tests for _strip_sdk_none_iterables and _RecordingResponsesStream (helpers that don't exist on main anymore), keeps the transport behavior tests + the SDK contract sanity check that ensures we notice if upstream ever fixes _make_tools(None). - 6/6 passing locally. Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>	2026-05-27 11:46:17 -07:00
Brixyy	dc9d677d59	fix(agent): classify TypeError('NoneType ... not iterable') as retryable provider shape error Salvages the intent of #33136 (@Brixyy) onto current main. The original PR was written against the pre-refactor monolithic run_agent.py and added a top-level _is_nonretryable_local_validation_error() helper. Both target functions have since been extracted to agent/conversation_loop.py:2869, so the salvage applies the equivalent guard inline at that canonical location rather than reintroducing the helper. ## Why After #33042 made our own Codex consumer structurally immune to NoneType crashes, third-party shims, mocked clients, and any future code path that hasn't migrated could still surface TypeError: 'NoneType' object is not iterable as a wire-shape mismatch. The agent loop's classifier currently treats ALL TypeError as a local programming bug and aborts non-retryable — users on stale Telegram/gateway turns saw bare "Non-retryable error (HTTP None)" with no recovery. This is a provider/SDK shape mismatch, not a local programming bug. The retry/fallback path should run, not be short-circuited. ## What agent/conversation_loop.py: extend is_local_validation_error to exclude TypeErrors whose message matches the NoneType-not-iterable shape (case- insensitive, both "NoneType" and "not iterable" must appear). tests/run_agent/test_jsondecodeerror_retryable.py: - update the mirror predicate to match the production check - add TestNoneTypeNotIterableIsRetryable class with 3 tests (the basic shape, message variants, unrelated TypeErrors still abort) - add TestAgentLoopSourceHasNoneTypeCarveOut to enforce the source-level invariant matches the test mirror ## Validation tests/run_agent/test_jsondecodeerror_retryable.py + tests/run_agent/test_31273_402_not_retried.py → 14/14 passing Co-authored-by: Brixyy <subrtt@gmail.com>	2026-05-27 11:30:55 -07:00
Sanghyuk Seo	283bb810e7	fix(agent): tolerate large codex stream prefill	2026-05-27 11:19:55 -07:00
teknium1	486d632cc2	fix(auxiliary): coerce None final.output to empty list in Codex aux adapter Closes #33368. `_CodexCompletionsAdapter.create()` iterates `final.output` from the Codex Responses stream. The event-driven consumer (introduced in #33042) always sets `final.output` to a list, so this shape can't come from our own code path. But: - Mocked clients in tests can return a typed Response with `output=None` - Third-party shims / compatibility layers that bypass the consumer can do the same - A future code path that wraps a different consumer could regress The old code `getattr(final, "output", [])` returns `None` (not the default `[]`) when the attribute EXISTS but is `None`. Iterating `None` then raises `TypeError: 'NoneType' object is not iterable` — the exact error logged by title-generation when this fires. Fix: `getattr(final, "output", None) or []` — single-line defensive coerce. Cheap; zero risk. Regression test asserts the auxiliary path handles a final whose `.output` is `None` (via monkey-patched consumer) without raising and returns the expected chat.completions-shaped response. Reporter: @pavegrid-1 (issue #33368).	2026-05-27 11:08:21 -07:00
mavrickdeveloper	2e3c6627ce	Add Honcho runtime peer mapping (cherry picked from commit `864cdb3d2e`)	2026-05-27 10:49:33 -07:00
zccyman	2e181602a1	fix(agent): isolate credential pool on provider fallback Closes #33163. When _try_activate_fallback() switches from one provider to another (e.g. openai-codex → openrouter), the credential pool still belongs to the primary provider. This causes two compounding bugs: 1. The pool retains the primary's base_url. Downstream pool recovery (rate_limit / billing / auth) calls _swap_credential() with a primary entry which overwrites the agent's base_url back to the primary's endpoint. Every fallback request then 404s against the wrong host. 2. Pool recovery acting on errors from the FALLBACK provider mutates the PRIMARY's pool state (#33088 reported a related corruption pattern), exhausting/rotating entries that have nothing to do with the failure. Two layered fixes: a) try_activate_fallback (agent/chat_completion_helpers.py): on fallback activation, clear agent._credential_pool when the fallback provider doesn't match the pool's provider. Pool is preserved when the fallback shares the pool's provider (e.g. multiple openrouter entries). b) recover_with_credential_pool (agent/agent_runtime_helpers.py): defensive guard rejects any pool mutation when agent.provider doesn't match pool.provider. Defense-in-depth — should never fire after (a) is in place, but covers any future path that attaches a stale pool. Salvaged from @zccyman's PR #33217. The original PR was written against the pre-refactor monolithic run_agent.py; both target functions have since been extracted to module-level helpers. Behavior is identical — the guards live in the canonical extracted locations. Tests - New tests/run_agent/test_fallback_credential_isolation.py (7 tests covering: fallback clears mismatched pool, fallback preserves matching pool, recovery rejects mismatched pool, recovery accepts matching pool, 429-from-z.ai-doesn't-exhaust-codex-pool, _client_kwargs base_url survives pool clear, _swap_credential doesn't restore primary URL after fallback). - Cross-verified: 77/77 passing across fallback isolation tests + agent/test_credential_pool.py — no regression. Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>	2026-05-27 10:45:26 -07:00
Teknium	f0de3cd0a0	fix(agent): roll back switch_model() state when client rebuild fails (#33228 ) Closes #33175. switch_model() in agent/agent_runtime_helpers.py mutated agent.model and agent.provider before rebuilding the client, with no try/except to restore them on failure. If the rebuild raised (bad API key, network error, build_anthropic_client failure, etc.) the agent was left with the new model+provider name paired with the OLD client — producing HTTP 400s like "claude-sonnet-4-6 is not supported on openai-codex" on the next turn. Callers in cli.py, gateway/run.py, and tui_gateway/server.py already catch the exception and warn the user, but the warning was misleading because the swap had partially succeeded; the agent's state was torn. Snapshot every mutated field before the swap, wrap the swap+rebuild block in try/except, and restore the snapshot on failure before re-raising so the caller's warning surfaces. Reported by @amirariff91. Tests cover both branches (chat_completions and anthropic_messages) and the cross-branch case (anthropic -> openai).	2026-05-27 05:43:20 -07:00
Nami4D	a699de83ec	fix(xai-oauth): strip service_tier and add safety-net sanitization for slash enums xAI's /v1/responses endpoint rejects service_tier with HTTP 400 "Argument not supported: service_tier" when users activate /fast mode. Also add a safety-net strip_slash_enum call in _preflight_codex_api_kwargs to catch any tool schemas that might slip through the caller-level sanitization. xAI's Responses API grammar compiler rejects enum values containing forward slashes (e.g. HuggingFace model IDs like "Qwen/Qwen3.5-0.8B") with the opaque "Invalid arguments passed to the model" error. Fixes the root cause of "Invalid arguments passed to the model" errors reported by xAI OAuth (SuperGrok) users.	2026-05-27 05:25:38 -07:00
chaconne67	9c69204d87	fix(codex_responses_adapter): drop foreign-issuer reasoning on replay reasoning.encrypted_content is sealed to the Responses endpoint that minted it. When a session switches model providers mid-conversation — say the user runs /model gpt-5.5 after several turns on grok-4.3, or vice versa — the persisted codex_reasoning_items carry blobs the new endpoint cannot decrypt, and every subsequent turn fails with HTTP 400 invalid_encrypted_content. This is the cross-issuer prevention layer. Pairs with: * PR #33035 — runtime recovery when the HTTP 400 fires anyway * PR #33146 — prevention for transient rs_tmp_* items Stamps each reasoning item with the issuer kind that minted it (codex_backend / xai_responses / github_responses / other:<url>) at normalize time, then drops items at replay time when the active endpoint differs from the stamp. Unstamped (legacy) items pass through for backwards compatibility. Cherry-picked from @chaconne67's PR #31629. Conflict against current main (#33035's replay_encrypted_reasoning parameter) resolved as 'keep both' — the two guards compose: replay_encrypted_reasoning=False is the session-wide kill switch, current_issuer_kind is the per-item filter that runs only when replay is still enabled.	2026-05-27 02:40:03 -07:00

1 2 3 4 5 ...

1089 commits