hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-07 08:02:23 +00:00

Author	SHA1	Message	Date
chaconne67	9c69204d87	fix(codex_responses_adapter): drop foreign-issuer reasoning on replay reasoning.encrypted_content is sealed to the Responses endpoint that minted it. When a session switches model providers mid-conversation — say the user runs /model gpt-5.5 after several turns on grok-4.3, or vice versa — the persisted codex_reasoning_items carry blobs the new endpoint cannot decrypt, and every subsequent turn fails with HTTP 400 invalid_encrypted_content. This is the cross-issuer prevention layer. Pairs with: * PR #33035 — runtime recovery when the HTTP 400 fires anyway * PR #33146 — prevention for transient rs_tmp_* items Stamps each reasoning item with the issuer kind that minted it (codex_backend / xai_responses / github_responses / other:<url>) at normalize time, then drops items at replay time when the active endpoint differs from the stamp. Unstamped (legacy) items pass through for backwards compatibility. Cherry-picked from @chaconne67's PR #31629. Conflict against current main (#33035's replay_encrypted_reasoning parameter) resolved as 'keep both' — the two guards compose: replay_encrypted_reasoning=False is the session-wide kill switch, current_issuer_kind is the per-item filter that runs only when replay is still enabled.	2026-05-27 02:40:03 -07:00
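The stamp-then-filter scheme described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the `issuer_kind` field name and both function names are assumptions, and only the issuer-kind strings and the `replay_encrypted_reasoning` / `current_issuer_kind` parameter names come from the commit message.

```python
from typing import Any, Dict, List

ReasoningItem = Dict[str, Any]


def stamp_reasoning_items(items: List[ReasoningItem], issuer_kind: str) -> List[ReasoningItem]:
    """Normalize-time step: record which endpoint minted each item.

    The "issuer_kind" key is a hypothetical field name chosen for this sketch.
    """
    return [{**item, "issuer_kind": issuer_kind} for item in items]


def filter_reasoning_for_replay(
    items: List[ReasoningItem],
    current_issuer_kind: str,
    replay_encrypted_reasoning: bool = True,
) -> List[ReasoningItem]:
    """Replay-time step: keep only items the active endpoint can decrypt."""
    # Session-wide kill switch: nothing is replayed at all.
    if not replay_encrypted_reasoning:
        return []
    kept = []
    for item in items:
        stamp = item.get("issuer_kind")
        # Unstamped (legacy) items pass through for backwards compatibility.
        if stamp is None or stamp == current_issuer_kind:
            kept.append(item)
    return kept
```

With items stamped `xai_responses`, a later turn against `codex_backend` would replay only the unstamped legacy items, which is the behaviour the commit describes for a mid-conversation provider switch.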
Teknium	cb38ce28cb	refactor(codex): drop SDK responses.stream() helper; consume events directly (#33042) * refactor(codex): drop SDK responses.stream() helper; consume events directly The OpenAI Python SDK's high-level `client.responses.stream(...)` helper does post-hoc typed reconstruction from the terminal `response.completed.response.output` field. The chatgpt.com Codex backend has been observed (today, gpt-5.5) to ship `response.output = null` on terminal frames, which crashes the SDK with `TypeError: 'NoneType' object is not iterable` mid-iteration. Carlton's #32963 patched the symptom by wrapping the helper in try/except and recovering from the same per-event accumulator the SDK was supposed to populate. This PR removes the helper from the call path entirely: we now use `client.responses.create(stream=True)` (raw AsyncIterable of SSE events) and assemble the final response object ourselves from `response.output_item.done` events as they arrive. The terminal event's `output` field is never read for content. Same strategy OpenClaw uses for the same backend. This makes Hermes structurally immune to the bug class, not patched. The next time OpenAI ships a shape change to chatgpt.com's terminal frame, our consumer keeps working because it doesn't read that frame for content — only for usage/status/id. Changes - `agent/codex_runtime.py`: new `_consume_codex_event_stream()` shared consumer; `run_codex_stream()` uses `responses.create(stream=True)`; `run_codex_create_stream_fallback()` collapses into a thin alias since the primary path now does what the fallback used to do. - `agent/auxiliary_client.py`: `_CodexCompletionsAdapter` uses the same consumer; old null-output recovery helpers deleted as unreferenced. - Tests migrated: fixtures that mocked `responses.stream` now mock `responses.create` returning a raw iterable. New regression test asserts the auxiliary path returns streamed items even when the terminal event's `output` is literally `null`.
Validation - Live: tested against fresh OAuth on `chatgpt.com/backend-api/codex` with `gpt-5.5` — response built correctly with `response.output=null` on the terminal frame, all events consumed, usage/reasoning tokens propagated. - `tests/run_agent/test_run_agent_codex_responses.py` + `tests/agent/test_auxiliary_client.py`: 242 passed. * test+fix(codex): migrate streaming tests, raise on truncated streams CI surfaced 10 test failures across tests/run_agent/test_streaming.py and tests/run_agent/test_codex_xai_oauth_recovery.py — both files had their own `responses.stream(...)` mocks I missed in the first sweep. agent/codex_runtime.py: _consume_codex_event_stream() now raises "Codex Responses stream did not emit a terminal response" when the stream ends without any terminal frame AND no usable content. This preserves the signal callers used to get from the SDK's high-level helper, which they distinguished from "completed with empty body" in error handling. Tests migrated: - test_streaming.py: text-delta callback, activity-touch, and remote-protocol-error tests all switch from mocking responses.stream to responses.create returning an iterable of events. - test_codex_xai_oauth_recovery.py: prelude-error tests are recast as wire-error-event tests (the new path raises _StreamErrorEvent directly when the wire emits type=error, which is strictly better than the old two-phase "SDK RuntimeError → retry → fallback"). The retry-on-transport-error test moves from responses.stream side-effect to responses.create side-effect. Verified live against chatgpt.com Codex with gpt-5.5 — AIAgent.chat() through the full codex_responses path returns correctly, 319/319 targeted tests passing.	2026-05-27 00:30:06 -07:00
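The event-consumption strategy from the refactor above can be sketched as below. This is a simplified illustration under stated assumptions: events are modelled as plain dicts (the SDK yields typed objects), the function and exception names are hypothetical stand-ins for `_consume_codex_event_stream()` and `_StreamErrorEvent`, and treating `response.incomplete` and `response.failed` as terminal frames alongside `response.completed` is an assumption of this sketch.

```python
from typing import Any, Dict, Iterable, List, Optional

Event = Dict[str, Any]

# Assumed set of terminal event types for this sketch.
TERMINAL_TYPES = ("response.completed", "response.incomplete", "response.failed")


class StreamErrorEvent(RuntimeError):
    """Hypothetical stand-in for the error raised on a wire-level type=error event."""


def consume_event_stream(events: Iterable[Event]) -> Dict[str, Any]:
    """Assemble a response from per-item events, never from the terminal `output`."""
    output_items: List[Dict[str, Any]] = []
    terminal: Optional[Dict[str, Any]] = None
    for event in events:
        etype = event.get("type")
        if etype == "error":
            raise StreamErrorEvent(event.get("message", "stream error"))
        if etype == "response.output_item.done":
            # Content is accumulated here, as each item finishes.
            output_items.append(event["item"])
        elif etype in TERMINAL_TYPES:
            # Only id/status/usage are read from the terminal frame.
            terminal = event.get("response") or {}
    if terminal is None and not output_items:
        raise RuntimeError("Codex Responses stream did not emit a terminal response")
    terminal = terminal or {}
    return {
        "id": terminal.get("id"),
        "status": terminal.get("status"),
        "usage": terminal.get("usage"),
        "output": output_items,
    }
```

Because the terminal frame's `output` is never consulted, a `null` value there has no effect on the assembled result, while a stream that ends with neither a terminal frame nor any content still raises.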
xxxigm	b5ea6a5c80	test(xai-oauth): regression coverage for the bad-credentials disambiguator (#29344) Eleven new tests pinning the #29344 fix. Layout mirrors the existing "Fix D" entitlement section so the bad-credentials disambiguator sits alongside the entitlement-block tests it complements. Classifier-level coverage: * ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix`` — verbatim shape from the reporter's wire capture (``{code: 'caller does not have permission', error: 'OAuth2 access token could not be validated. [WKE=unauthenticated:bad-credentials]'}``) ↦ classifier must return False so the refresh path runs. * ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape`` — same body after ``_extract_api_error_context`` has rewritten it to ``{reason, message}``. The disambiguator must fire in BOTH shapes; without this guard the production call site at ``_recover_with_credential_pool`` (which goes through the normalised extractor) would still misclassify. * ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant`` — parametrised forward-compat: ``bad-credentials``, ``expired-token``, ``revoked``, ``some-future-reason``. xAI documents the prefix as stable, the suffix after the colon as a reason code that can grow; every variant under ``unauthenticated:`` must route to refresh. * ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone`` — belt-and-braces guard: if a future API revision drops the WKE suffix but keeps "OAuth2 access token could not be validated", we still classify correctly. * ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords`` — defensive: if a body ever carries BOTH the WKE suffix and entitlement language, the WKE signal wins. Auth is recoverable; entitlement isn't, and a refreshed token will resurface the entitlement message on the next request.
* ``test_is_entitlement_failure_case_insensitive_wke_match`` — pins that the classifier lowercases the haystack so a future xAI build that uppercases the prefix doesn't reintroduce the bug. Recovery-path coverage (end-to-end through ``_recover_with_credential_pool``): * ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403`` — the headline test the reporter requested: a bad-credentials 403 with the exact wire body must call ``try_refresh_current()`` exactly once and ``_swap_credential`` once. Pre-fix this returned ``(False, _)`` because the entitlement classifier over-matched and short-circuited the refresh path. * ``test_recover_with_credential_pool_still_blocks_real_entitlement`` — companion regression guard for #26847: a pure unsubscribed- account body (no WKE suffix, no OAuth2-validation phrase) must still surface as entitlement and skip refresh. The new disambiguator must not weaken the original loop-protection it was added to preserve. The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the existing ``MagicMock`` patterns from the surrounding tests so the new section reads as a natural extension of "Fix D" rather than a separate test file.	2026-05-23 02:48:13 -07:00
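The classifier behaviour these tests pin can be illustrated with a rough sketch. The function name and the flat-dict body shape are assumptions made for illustration; the matched phrases are taken from the commit messages in this log (#26664 and #29344), and the real `_is_entitlement_failure(error_context, status_code)` may differ in structure.

```python
from typing import Any, Dict


def looks_like_entitlement_failure(body: Dict[str, Any]) -> bool:
    """Hypothetical sketch: True only for subscription-shaped 403 bodies."""
    # Lowercase everything so an uppercased WKE prefix still matches.
    haystack = " ".join(str(value) for value in body.values()).lower()

    # Disambiguator first: an unauthenticated signal means the token is bad,
    # which is recoverable via refresh, so it overrides entitlement keywords.
    if "[wke=unauthenticated:" in haystack:
        return False
    if "oauth2 access token could not be validated" in haystack:
        return False

    # Entitlement phrases listed in the #26664 commit message.
    if "do not have an active grok subscription" in haystack:
        return True
    if "grok" in haystack and (
        "out of available resources" in haystack
        or "does not have permission" in haystack
    ):
        return True
    return False
```

Checking the unauthenticated signal before the entitlement keywords is what lets both the raw `{code, error}` shape and the normalised `{reason, message}` shape route to the refresh path.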
Teknium	b4afc6546e	fix(xai): restore encrypted reasoning replay across turns xAI partner integration requires Hermes to thread `encrypted_content` reasoning items back to the Responses API on every turn so Grok can maintain cross-turn reasoning coherence. PR #26644 (May 15) gated this off for `is_xai_responses` on the theory that the OAuth/SuperGrok surface rejected replayed encrypted blobs and produced the multi-turn "Expected to have received \`response.created\` before \`error\`" failure. That diagnosis was wrong — the prelude-SSE fallback added in the same PR is what actually fixed that failure mode. Suppressing the replay was an unnecessary side-effect that broke the whole point of xAI's partnership integration. Changes: - agent/codex_responses_adapter.py — drop the `is_xai_responses` gate in `_chat_messages_to_responses_input`. Keep the kwarg in the signature for transport compatibility; update the docstring to document the May 2026 reversal. - agent/transports/codex.py — restore `kwargs["include"] = ["reasoning.encrypted_content"]` on the xAI Responses path so xAI echoes encrypted reasoning back to us. - tests/run_agent/test_codex_xai_oauth_recovery.py — flip the three xAI assertions (now: xAI MUST receive replayed reasoning AND we MUST include encrypted_content in the request). - tests/agent/transports/test_codex_transport.py — flip the `include` assertions on `test_xai_reasoning_effort_passed` and `test_xai_grok_4_omits_reasoning_effort`; update the allowlist block comment. The prelude-SSE fallback and the entitlement-403 surfacing fixes from #26644 are untouched — they were independent fixes that happened to ride along with the reasoning-replay gate. 
Validation: - Targeted: tests/run_agent/test_codex_xai_oauth_recovery.py + tests/agent/transports/test_codex_transport.py → 65/65 pass - Broader: tests/agent/transports/ + tests/run_agent/ → 1674 passed, 3 skipped, 0 failures - E2E (real imports, isolated HERMES_HOME, ResponsesApiTransport build_kwargs): turn-1 request carries `include: ["reasoning.encrypted_content"]`; turn-2 input replays the encrypted_content blob from turn-1's `codex_reasoning_items`; native Codex unchanged.	2026-05-20 23:12:45 -07:00
xxxigm	34f34ba322	test(xai-oauth): pin tier-denied 403 behavior + docs warning for #26847 Tests: * ``test_refresh_xai_oauth_pure_403_marked_tier_denied_not_relogin`` — refresh-403 raises ``xai_oauth_tier_denied`` with ``relogin_required=False`` and the API-key fallback hint in body. * ``test_format_auth_error_tier_denied_does_not_suggest_relogin`` — the renderer does not append "Run ``hermes model``" for the new code. * ``test_recover_with_credential_pool_skips_refresh_on_bare_403_for_xai_oauth`` — bare ``{"reason":"forbidden","message":"Forbidden"}`` body (which does not match the existing keyword heuristic) still short-circuits ``try_refresh_current`` on xai-oauth. Docs: * Drop the "(any active tier)" claim from the xai-grok-oauth guide, add a top-of-page warning callout, and a Troubleshooting section for the 403-after-login case pointing at ``XAI_API_KEY`` + ``provider: xai`` as the documented fallback.	2026-05-18 20:08:09 -07:00
EloquentBrush0x	1fabd6e100	fix(error_classifier): classify xAI Grok entitlement SSE errors as auth When xAI returns a subscription/entitlement error through an SSE ``type=error`` frame, ``_StreamErrorEvent`` is raised with ``status_code=None``. This caused ``_classify_by_status`` (step 2 of ``classify_api_error``) to be skipped entirely, and the Grok-specific phrases ("do not have an active Grok subscription", "out of available resources") appeared in none of the message-pattern lists. The error fell through to ``FailoverReason.unknown (retryable=True)``, burning ``max_retries`` on every affected X Premium+ / SuperGrok user before the agent stopped — and ``_is_entitlement_failure`` was never called because it only fires under ``FailoverReason.auth``. The HTTP 403 path already handled this correctly (``_classify_by_status`` returns ``auth/non-retryable`` for 403). Add an explicit pattern block at step 1 (highest priority, before the ``status_code`` guard) so both code paths route to ``FailoverReason.auth, retryable=False, should_fallback=True`` — matching the 403 path exactly. Add three regression tests in ``Fix D`` section of ``test_codex_xai_oauth_recovery.py``: - primary "do not have an active Grok subscription" phrase - "out of available resources" + "grok" variant - unrelated ``_StreamErrorEvent`` must not be reclassified	2026-05-18 10:24:13 -07:00
Teknium	dffb602f37	fix(xai): drop stale X Premium+ hint from entitlement 403 surfacing (#27110) xAI announced on 2026-05-16 (https://x.ai/news/grok-hermes) that X Premium subscriptions now work in Hermes Agent. The hint we shipped in PR #26644 asserted the opposite ("X Premium+ does NOT include xAI API access — only standalone SuperGrok subscribers can use this provider"), which would now misdirect Premium+ users who hit any other 403 (no Grok sub at all, wrong tier, exhausted quota) into thinking they need to switch subscriptions when their sub is in fact valid. Remove _decorate_xai_entitlement_error and its two call sites in _summarize_api_error. xAI's own body text already says "Manage subscriptions at https://grok.com/?_s=usage" — surface that verbatim and let xAI's wording do the diagnosis. The _is_entitlement_failure guard (which prevents credential-pool refresh loops on entitlement 403s) and the reasoning-replay gating for xai-oauth are unrelated and untouched. Update tests to assert the body still surfaces verbatim and that no Hermes-side editorializing is appended.	2026-05-16 16:00:01 -07:00
Teknium	6784c80794	fix(xai-oauth): lead entitlement-403 hint with X Premium+ gotcha (#26672) The #1 confusing cause of the xAI 403 (per Teknium): X Premium+ subscribers see Grok inside the X app and assume API access is included. It is NOT — only standalone SuperGrok subscribers can use xai-oauth with Hermes today. Without calling this out, every Premium+ user hits the 403 with no idea why. PR #26666's neutral 4-cause list was correct but buried the most common cause. Lead with the Premium+ gotcha, then list the other possibilities (no subscription, wrong tier, exhausted quota) as fallbacks. Same neutral framing — does not accuse anyone of being unsubscribed.	2026-05-15 17:23:33 -07:00
Teknium	9818b9a1ac	fix(xai-oauth): rewrite entitlement-403 hint to not accuse subscribers (#26666) PR #26644 confidently told users "xAI OAuth account lacks SuperGrok / X Premium entitlement" on any 403 from xAI's permission-denied surface. But that body is returned for at least four distinct causes that Hermes cannot distinguish from the wire: * Account has no Grok subscription at all * Account has SuperGrok but the tier doesn't include the requested model (e.g. grok-4.3 needs SuperGrok Heavy) * Monthly quota for the subscribed tier is exhausted * SuperGrok is active but the API access add-on isn't enabled Don Piedro pushed back that he IS subscribed yet still hit this. Picking the worst-case interpretation ("you're not subscribed") reads as wrong and insulting to subscribers, and points them at a fix they already did. New wording lists all 4 possibilities and points at https://grok.com/?_s=usage where the user can check which applies. The detection logic and credential-pool short-circuit (PR #26664) are unchanged — only the user-facing wording is rephrased.	2026-05-15 17:15:22 -07:00
Teknium	ce0e189d3e	fix(xai-oauth): break entitlement-403 credential-refresh loop, bump grok-4.3 context to 1M (#26664) Don Piedro's 18-minute hang on grok-4.3 traced to two issues PR #26644 didn't cover: - _recover_with_credential_pool classifies 403 as FailoverReason.auth and calls pool.try_refresh_current(). For xAI OAuth on an unsubscribed account, refresh succeeds (mints a new token from the same account) but the next API call 403s with the same entitlement error. Result: infinite refresh → retry → 403 loop until Ctrl+C (1133s in Don's log). New _is_entitlement_failure(error_context, status_code) detects the subscription-shape body ("do not have an active Grok subscription" / "out of available resources" + grok / "does not have permission" + grok) and short-circuits recovery so _summarize_api_error surfaces PR #26644's friendly hint. - grok-4.3 resolved to 256k via the grok-4 catch-all in DEFAULT_CONTEXT_LENGTHS. Per docs.x.ai/developers/models/grok-4.3 the model ships with 1M context. Add explicit grok-4.3 entry before the grok-4 fallback (longest-first substring matching ensures grok-4.3 and grok-4.3-latest both land on the new value). Tests: 8 new (23 total in test_codex_xai_oauth_recovery.py). E2E verified Don's 100-iteration loop bails out with 0 refresh calls while genuine auth failures still refresh once and recover.	2026-05-15 17:11:06 -07:00
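The longest-first substring matching mentioned in the commit above might look something like this. It is only a sketch: the table name, the lookup function, and the exact token counts (the commit only says "256k" and "1M") are illustrative assumptions, not the contents of the real `DEFAULT_CONTEXT_LENGTHS`.

```python
from typing import Dict, Optional

# Illustrative values only; the real table and its exact numbers may differ.
EXAMPLE_CONTEXT_LENGTHS: Dict[str, int] = {
    "grok-4": 256_000,
    "grok-4.3": 1_000_000,
}


def resolve_context_length(
    model: str,
    table: Dict[str, int] = EXAMPLE_CONTEXT_LENGTHS,
    default: Optional[int] = None,
) -> Optional[int]:
    """Return the context length for the longest table key contained in `model`."""
    name = model.lower()
    # Try longer keys first so "grok-4.3" wins over the "grok-4" catch-all.
    for key in sorted(table, key=len, reverse=True):
        if key in name:
            return table[key]
    return default
```

Without the longest-first ordering, `grok-4.3-latest` would match the shorter `grok-4` key and resolve to the smaller window, which is the misresolution the commit fixes.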
Teknium	31ba2b0cbc	fix(xai-oauth): recover from prelude SSE errors, gate reasoning replay, surface entitlement 403s (#26644) Three fixes for the May 2026 xAI OAuth (SuperGrok / X Premium) rollout failures: - _run_codex_stream: when openai SDK raises RuntimeError("Expected to have received `response.created` before `<type>`"), retry once then fall back to responses.create(stream=True) — same path used for missing-response.completed postlude. Fallback surfaces the real provider error with body+status_code intact. Also fixes #8133 (response.in_progress prelude on custom relays) and #14634 (codex.rate_limits prelude on codex-lb). - _summarize_api_error: when error body matches xAI's entitlement shape, append a one-line hint pointing to https://grok.com and /model. Once-only, applies to both auxiliary warnings and main-loop error surfacing. - _chat_messages_to_responses_input: new is_xai_responses kwarg drops replayed codex_reasoning_items (encrypted_content) before they reach xAI. Also drops reasoning.encrypted_content from the xAI include array. Native Codex behavior unchanged. Grok still reasons natively each turn; coherence rides on visible message text alone. Closes #8133, #14634.	2026-05-15 16:35:12 -07:00

11 commits