xAI's consent page renders the authorization code in-page rather than
redirecting through the 127.0.0.1 callback, so on remote/headless setups
(GCP Cloud Shell, Codespaces, container consoles, headless VPS) the only
value the user can paste is the opaque code with no `code=`/`state=`
query parameters. `_parse_pasted_callback` correctly returns
`state=None` for that input, but `_xai_oauth_loopback_login` then
validated state unconditionally and raised `xai_state_mismatch`,
making the documented bare-code paste path unreachable.
PKCE (code_verifier) still binds the token exchange to this client,
so the local state-equality check is redundant when there is no state
to compare. On the manual-paste path only, substitute the locally
generated state when the callback returned none — the rest of the
validation chain (code presence, error field, token exchange) is
unchanged. The loopback HTTP-server path still requires a matching
state (a real browser redirect always carries one).
Also: clarify the manual-paste prompt to mention xAI's in-page code
rendering so users know pasting the bare code on its own is expected.
Root-cause analysis from #26923 comment by @AccursedGalaxy (2026-05-20).
Tests
-----
* test_xai_loopback_login_manual_paste_bare_code_succeeds — positive
end-to-end through the token exchange with state=None.
* test_xai_loopback_login_loopback_path_rejects_missing_state — the
HTTP-server path still rejects state=None as a regression guard
(the bare-code relaxation must NOT widen the loopback path).
* Existing test_xai_loopback_login_manual_paste_state_mismatch_raises
continues to verify wrong (non-None) state is rejected on manual-paste.
Closes#26923.
#33164 made _save_codex_tokens sync the singleton-seeded `device_code`
pool entry on Codex OAuth re-auth. That fixed the #33000 path but missed
`manual:device_code` entries created by `hermes auth add openai-codex`
(the recommended workaround for users who hit #33000 before #33164
landed).
Every subsequent re-auth would refresh the device_code entry but leave
the manual:device_code entry holding the consumed refresh token plus
stale last_error_* markers — immediately recreating the 401
token_invalidated symptom on the next request, exactly as reported in
#33538.
Extend the refreshable source set to include `manual:device_code`.
Completing the device-code OAuth flow proves the user owns the ChatGPT
account, so it is safe to refresh every device-code-backed entry. Keep
`manual:api_key` and other non-device-code manual sources untouched —
those represent independent credentials.
Closes#33538.
In profile mode, _load_provider_state previously returned None when a
provider was absent from the profile's auth.json — even if the user had
authenticated at the global root. This broke runtime credential resolvers
that read state directly (resolve_nous_access_token,
resolve_nous_runtime_credentials), causing profiles without their own
nous login to fail with 'Hermes is not logged into Nous Portal' despite
a valid global session.
Push the existing read-only global fallback (already used by
get_provider_auth_state and read_credential_pool) into _load_provider_state
so every caller benefits, and simplify get_provider_auth_state into a thin
wrapper. Writes still target the profile only — profile state continues to
shadow global state on the next read after a per-profile login. Behavior in
classic (non-profile) mode is unchanged because _load_global_auth_store
returns an empty dict.
Adds 5 tests covering the new contract on _load_provider_state directly.
Existing 770 auth/credential/nous tests still pass.
Closes#32992.
The chat path resolves Codex credentials via `resolve_codex_runtime_credentials`
which only reads `providers.openai-codex.tokens` (the singleton). The auxiliary
path uses `_read_codex_access_token` which checks the credential_pool first.
For users whose tokens live only in the pool — manual seed, partial re-auth,
restore from backup, or any state where the singleton is empty but the pool
is healthy — the chat path raised AuthError or (worse, since OpenAI(api_key='')
silently attaches no header) the wire saw HTTP 401 "Missing Authentication header"
while the auxiliary path worked fine.
This adds a pool fallback to `resolve_codex_runtime_credentials`: when the
singleton has no usable access_token, scan `credential_pool.openai-codex` for
the first entry that has a non-empty access_token and isn't in an exhaustion
cooldown window (`last_error_reset_at` in the future). If found, return that
token with `source="credential_pool"`. If no usable entry exists, the original
AuthError propagates as before.
Regression tests cover:
- Empty singleton + healthy pool entry → pool token returned
- Pool fallback skips entries currently in cooldown
- Empty singleton + empty/wedged pool → AuthError propagates (existing contract preserved)
When the Codex OAuth token endpoint returns 429 (usage-limit / quota
exhaustion), refresh_codex_oauth_pure raised a generic auth error that the
gateway surfaced as 'Primary provider auth failed: No Codex credentials
stored. Run hermes auth', prompting re-auth that cannot lift a quota cap.
Classify 429 distinctly (codex_rate_limited, relogin_required=False) with a
non-alarming quota message that honors Retry-After, log it as
'Primary provider rate-limited (429)', and stop format_auth_error from
appending the re-authenticate remediation. Also log the fallback provider's
literal config key instead of the resolved runtime category.
Refs #32790
Codex re-auth via `hermes setup` / `hermes model` wrote fresh OAuth
tokens to providers.openai-codex.tokens but left the credential_pool
device_code entry holding the consumed refresh token and stale error
markers. Since the runtime selects from the pool, the next request
spent a dead token and got a 401 token_invalidated. Update the
singleton-seeded pool entries in lockstep and clear their error state.
Fixes#33000
* remove Vercel AI Gateway provider and Vercel Sandbox terminal backend
Both Vercel-hosted integrations are removed end-to-end. Users on the AI
Gateway should switch to OpenRouter or one of the other aggregators
(Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should
switch to Docker, Modal, Daytona, or SSH.
What's removed:
- `plugins/model-providers/ai-gateway/` provider plugin
- `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper
- `tools/environments/vercel_sandbox.py` terminal backend
- `ai-gateway` provider wiring across auth, doctor, setup, models,
config, status, providers, main, web_server, model_normalize, dump
- `vercel_sandbox` backend wiring across terminal_tool, file_tools,
code_execution_tool, file_operations, approval, skills_tool,
environments/local, credential_files, lazy_deps, prompt_builder,
cli, gateway/run
- `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client
header set, run_agent base-URL header/reasoning special-cases
- `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock
- env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`,
`VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`,
`TERMINAL_VERCEL_RUNTIME`
- Tests: deletes test_ai_gateway_models.py and
test_vercel_sandbox_environment.py; scrubs references across 23
surviving test files (no entire tests deleted unless they were
dedicated to AI Gateway / Sandbox)
- Docs: provider tables, env-var reference, setup guides, security
notes, tool config, terminal-backend tables — English plus zh-Hans
i18n parity
- `hermes-agent` skill: provider table entry and remote-backend list
What stays (intentional):
- `popular-web-designs/templates/vercel.md` — CSS design reference,
unrelated to Vercel-the-AI-product
- `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN
response header, useful diag signal on any Vercel-hosted endpoint
- `vercel-labs/agent-browser` URL in browser config — lightpanda
browser project, different OSS effort
- `userStories.json` historical contributor entry mentioning Vercel
Sandbox — archive, not active docs
Validation:
- 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`)
- Full repo `py_compile` clean
- Live import of every touched module + invariant check (no
`ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no
`vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`)
* test: convert profile-count check from change-detector to invariant
The hardcoded "== 34" assertion broke when ai-gateway was removed.
Per AGENTS.md change-detector-test guidance, assert the relationship
(registry count >= number of plugin dirs) instead of a literal count.
Counts shift when providers are added/removed; that's expected.
qwen3.7-max on OpenCode Go rejects the OpenAI-compatible (oa-compat)
format with HTTP 401 but works correctly via the Anthropic Messages
endpoint (/v1/messages with x-api-key auth). Route it the same way
MiniMax models are routed: anthropic_messages api_mode.
Changes:
- hermes_cli/models.py: add qwen3.7-max routing + curated list
- hermes_cli/setup.py: add to setup wizard model list
- hermes_cli/auth.py: update provider comment
- tests: add assertions for qwen3.7-max api_mode routing
#27385 reports that on macOS the browser sees the xAI 'authorization
received' success page but Hermes still raises xai_callback_timeout.
The loopback HTTP handler was silent — no log line on receipt, no log
line on wait timeout — so triaging the gap between 'browser saw
success' and 'CLI saw timeout' required either a code change or
guesswork.
Adds two INFO log lines:
- Per callback hit (handler): path, has_code, has_state, has_error,
truncated User-Agent. Booleans / fingerprints only — no actual
code/state strings leak.
- On wait timeout: report whether result.code or result.error was
populated at deadline. Distinguishes three failure modes:
1. No hit log + timeout log w/ has_code=False has_error=False
→ xAI's IDP never reached the loopback (firewall, port-binding,
IPv6/IPv4 mismatch, browser blocked private-network access).
2. Hit log w/ has_code=False has_error=False + timeout log
→ xAI hit the loopback without OAuth params (the bare-URL
case the handler already 400s on).
3. Hit log w/ has_code=True + timeout log w/ has_code=False
→ result_lock contention or race; would indicate a real bug.
133/133 in tests/hermes_cli/test_auth_xai_oauth_provider.py,
tests/hermes_cli/test_xai_oauth_pkce_token_exchange.py, and
tests/run_agent/test_codex_xai_oauth_recovery.py.
X Premium+ also grants Grok OAuth access — the 'SuperGrok Subscription'
wording suggested SuperGrok was the only entitlement path. Updated to
'SuperGrok / Premium+' across the picker label, setup wizard, auth flows,
and docs so Premium+ subscribers know the row applies to them too.
* fix(minimax-oauth): refresh short-lived access tokens per request
MiniMax OAuth issues ~15-minute access tokens. The Anthropic SDK caches
api_key as a static string at client construction, so a session that
resolves credentials once at startup keeps sending the same bearer until
MiniMax returns 401 mid-session.
Swap the static string for a callable token provider, reusing the existing
Entra-ID bearer-hook infrastructure in build_anthropic_client. The callable
re-reads auth.json on each invocation and calls _refresh_minimax_oauth_state,
which is a no-op when the token still has more than 60s of life left and
refreshes proactively otherwise. Refreshes persist to auth.json so other
processes (gateway, cron) see them immediately.
The wire-up lives at the agent-init / model-switch boundary rather than in
resolve_runtime_provider, so aux client paths that hand the api_key string
to OpenAI(api_key=...) are unaffected.
* docs: add infographic for minimax-oauth token refresh
@memosr's PR #27612 put the inference_base_url allowlist check only at the
Nous proxy adapter forward boundary. The poisoned URL, however, lands in
``auth.json`` upstream of that — at five refresh / agent-key-mint payload
read sites inside ``resolve_nous_runtime_credentials`` and
``_extend_state_from_refresh``. Without gating those sites, a single MITM
on a refresh response persists the attacker's URL across restarts, even
if the proxy adapter's defense-in-depth check would later catch it on
the way out.
Replace ``_optional_base_url`` with ``_validate_nous_inference_url_from_network``
at all five Portal-network reads:
- hermes_cli/auth.py L4840 (refresh-only access-token path)
- hermes_cli/auth.py L4876 (mint payload path)
- hermes_cli/auth.py L5154 (terminal-runtime access-token refresh)
- hermes_cli/auth.py L5262 (cross-process serialized refresh)
- hermes_cli/auth.py L5317 (terminal-runtime mint payload)
The state-read path at L5025 (``state.get("inference_base_url")``) is
deliberately NOT gated — pre-existing state in ``auth.json`` is either
already validated (it came from one of the five network sites above) or
set by a trusted local actor (manual edit, ``_setup_nous_auth`` test
fixture, ``hermes login nous`` against a staging endpoint via the
documented ``NOUS_INFERENCE_BASE_URL`` env override). Direct write_file /
patch tampering with auth.json is independently blocked by PR #14157.
Adds tests/hermes_cli/test_nous_inference_url_validation.py covering:
- validator https + host + edge-case rules (12 cases)
- all 5 network call sites grep contracts (no _optional_base_url
regression possible without test failure)
- proxy adapter defense-in-depth check still present
- env override path NOT gated (documented dev/staging behaviour)
18 new tests, all 119 Nous-auth tests green.
The Nous Portal proxy adapter forwards minted ``agent_key`` bearer tokens
to whatever ``base_url`` ``resolve_nous_runtime_credentials()`` returns,
which is read directly from the refresh / agent-key-mint response and
persisted to ``~/.hermes/auth.json``. With no validation beyond a
trailing-slash strip, a poisoned URL (Portal-side MITM, or local write
to auth.json) gets forwarded the legitimate bearer on every subsequent
proxy request — exfiltrating the user's inference budget and opening a
response-injection channel back into the IDE / chat client.
Add ``_validate_nous_inference_url_from_network()`` in ``hermes_cli.auth``:
an https + host-allowlist check that returns None for anything outside
``inference-api.nousresearch.com``, so callers fall back to the
documented default rather than ship the bearer to an attacker.
This commit wires the validator into the proxy adapter at
``nous_portal.py``. A follow-up commit wires it into the four refresh /
mint sites in ``auth.py`` so the poisoned URL never lands in auth.json
in the first place.
The env-var override path (``NOUS_INFERENCE_BASE_URL``) bypasses
validation by design — that's the documented staging/dev escape hatch
and the env source is already trusted (the user set it themselves).
Co-authored-by: memosr <mehmet.sr35@gmail.com>
Five call sites do os.chmod(path.parent, 0o700) without checking that
the parent resolves to a safe directory. If HERMES_HOME or another
path env var resolves to /, the chmod strips traversal permission from
the root inode and bricks the entire host.
Add secure_parent_dir() to hermes_constants.py that refuses to chmod
/ or any top-level directory (depth < 2). Replace all 5 call sites
with this helper.
Fixes#25821
XAI_BASE_URL / HERMES_XAI_BASE_URL let users repoint the OAuth-authenticated
inference endpoint, but the env override was an unguarded credential-leak
vector: a tampered .env or hostile shell init setting
XAI_BASE_URL=https://attacker.example/v1 would silently ship the SuperGrok
OAuth bearer to a third party on every request.
Add _xai_validate_inference_base_url() that pins the host to x.ai or a
*.x.ai subdomain and rejects non-HTTPS. On rejection, fall back to the
default with a warning rather than raise — a bad env var should not
deadlock auth, but should never leak the bearer either.
Apply at all three sites that read the env override for xai-oauth:
- hermes_cli/auth.py resolve_xai_oauth_runtime_credentials (main path)
- hermes_cli/auth.py _xai_oauth_loopback_login (initial login)
- agent/auxiliary_client.py _resolve_xai_oauth_for_aux (aux client)
E2E validated against four scenarios: attacker.example, lookalike
api.x.ai.evil.com, http:// downgrade on api.x.ai, and legit custom.x.ai
subdomain (which still resolves correctly).
Discovered while comparing against the opencode-grok-auth plugin
(github.com/ysnock404/opencode-grok-auth), which highlighted the same
guard on the OpenCode side.
1. trajectory_compressor.py: yaml.safe_load() returns None on empty
files, crashing with TypeError on `if 'tokenizer' in data`. Fix by
adding `or {}` fallback. (HIGH — blocks startup with empty config)
2. 6 files with fcntl.flock(LOCK_UN) in finally blocks without
try/except: cron/scheduler.py, hermes_cli/auth.py,
agent/shell_hooks.py, tools/skill_usage.py,
tools/environments/file_sync.py, tools/memory_tool.py. If unlock
raises OSError, fd.close() is skipped and the lock is held forever.
The msvcrt branches already had try/except; the fcntl branches did
not. Fix by wrapping in try/except (OSError, IOError): pass.
3. agent/copilot_acp_client.py line 639: TOCTOU race — path.exists()
followed by path.read_text() with no try/except. If file is deleted
between the check and the read, FileNotFoundError propagates. Fix
by using try/except FileNotFoundError.
4. gateway/sticker_cache.py: non-atomic write via Path.write_text()
can leave truncated JSON on crash, causing JSONDecodeError on next
load. Fix by writing to tempfile + fsync + os.replace (atomic).
xAI Grok OAuth (and Spotify) use a loopback redirect to
``http://127.0.0.1:<port>/callback`` to capture the authorization
code. That works when the browser and Hermes run on the same
machine, and the SSH tunnel recipe handles the regular remote
case. It breaks completely on **browser-only remote consoles**
(GCP Cloud Shell, GitHub Codespaces, AWS EC2 Instance Connect,
Gitpod, Replit, …) where the user has a browser but no real SSH
client to forward a port — the redirect to 127.0.0.1 on the
remote VM simply isn't reachable from the laptop, and there's
nothing the existing flow can do about it (#26923).
This commit adds the foundation for a manual-paste fallback:
* ``_is_remote_session`` now also recognises Cloud Shell,
Codespaces, Gitpod, Replit, StackBlitz (in addition to SSH),
so the existing tunnel hint at least fires in those
environments.
* ``_parse_pasted_callback`` accepts any of: a full
``http(s)://...?code=...&state=...`` URL, a bare ``?code=...``
query string, a bare ``code=...&state=...`` fragment, or a
bare opaque code value. Returns the same dict shape the HTTP
callback handler produces, so the caller's state / error
validation works unchanged (no CSRF bypass).
* ``_prompt_manual_callback_paste`` reads stdin with a clear
multi-line explanation of what's happening and what to paste.
* ``_xai_oauth_loopback_login`` gains a ``manual_paste`` kwarg
that skips the HTTP listener entirely. The redirect_uri,
PKCE verifier, state, and nonce are byte-identical to the
loopback path so xAI's token endpoint can't tell the
difference at the protocol level.
* ``_print_loopback_ssh_hint`` now also mentions
``--manual-paste`` so users without a real SSH client see a
path forward instead of a dead-end tunnel recipe.
* ``_login_xai_oauth`` threads ``args.manual_paste`` into the
loopback helper.
xAI's token endpoint returns HTTP 403 to the OAuth grant when the
account isn't on the allowlist for API access (e.g. standard
SuperGrok subscribers — see #26847). Treating it like a stale-token
400/401 made ``format_auth_error`` append "Run ``hermes model`` to
re-authenticate", which is misleading because re-login can't change
xAI's tier decision.
Split 403 off in both ``refresh_xai_oauth_pure`` and the loopback
login token exchange:
* New error code ``xai_oauth_tier_denied`` with ``relogin_required=False``
* Message explains the entitlement gate and points at the
``XAI_API_KEY`` + ``provider: xai`` fallback
* 400/401 still set ``relogin_required=True`` as before
* 5xx still set ``relogin_required=False`` as before
resolve_xai_oauth_runtime_credentials() called _refresh_xai_oauth_tokens()
with no try/except. A terminal refresh failure (HTTP 400/401/403 —
invalid_grant, token revoked) propagated without clearing the dead
access_token / refresh_token from auth.json, causing every subsequent
session to retry the same doomed network request.
Add a try/except around the refresh call that mirrors the existing
credential_pool.py quarantine: when _is_terminal_xai_oauth_refresh_error
identifies a non-retryable failure, clear the dead token fields from
auth.json and write a last_auth_error diagnostic marker so future calls
fail fast with a clear relogin_required error instead of hitting the
network.
active_provider is preserved (set_active=False) so multi-provider users
whose chosen provider is not xai-oauth are unaffected.
Tests: two new cases in test_auth_xai_oauth_provider.py cover terminal
quarantine and transient pass-through.
resolve_minimax_oauth_runtime_credentials called _refresh_minimax_oauth_state
without a try/except, so a terminal failure (invalid_grant,
refresh_token_reused, invalid_refresh_token) raised AuthError but left
the dead refresh_token in auth.json. Every subsequent API call retried
the same token via a network round-trip, failing identically each time.
Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_*) and write a last_auth_error quarantine marker
to auth.json before re-raising. The next call sees no access_token and
fails fast with 'not_logged_in' — no network retry — and the user is
prompted to re-authenticate.
Mirrors the existing quarantine pattern for Nous (_quarantine_nous_oauth_state),
xAI-OAuth (#28116), and Codex-OAuth (#28118). Persist failure is
best-effort (logged at DEBUG, error still re-raised).
Salvaged from #28003 by @EloquentBrush0x — contributor's branch was
severely stale (would have reverted ~5000 LOC across azure/kanban/i18n
subsystems); fix re-applied surgically with their pattern preserved and
added two regression tests (terminal-quarantines + transient-does-not-quarantine).
When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403,
token revoked or reused), _mark_exhausted was called but auth.json was left with
the dead credentials. On the next session, _seed_from_singletons re-read
auth.json and re-seeded the pool with the same revoked token, triggering the
same terminal failure in a loop.
Add _is_terminal_codex_oauth_refresh_error to auth.py and a matching quarantine
block in _refresh_entry: when a terminal error is detected and auth.json holds
no newer tokens, clear access_token/refresh_token from auth.json and remove all
device_code-sourced pool entries from memory. Mirrors the Nous quarantine added
in c90556262 and the xAI quarantine in #28116.
Also add a pre-refresh sync from auth.json before calling refresh_codex_oauth_pure,
matching the xAI and Nous patterns, to avoid refresh_token_reused races when
multiple Hermes processes share the same auth.json singleton.
Salvaged from #27911 by @EloquentBrush0x — contributor's branch was severely
stale (would have reverted ~5000 LOC across azure/kanban/i18n subsystems);
fix re-applied surgically on current main with their predicate and tests preserved.
When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403,
i.e. revoked or reused refresh token), _refresh_entry's existing race-
recovery path re-syncs from auth.json and returns if another process has
already rotated the tokens. If auth.json still holds the same stale
token pair, the function fell through to _mark_exhausted — leaving the
dead credentials in auth.json. On the next Hermes startup _seed_from_singletons
re-seeded the pool from those stale tokens, causing the same failure loop
on every session.
Fix: after the auth.json re-sync check in the xAI-oauth error handler,
detect terminal errors with the new _is_terminal_xai_oauth_refresh_error
helper and apply a quarantine:
- Clear access_token and refresh_token from providers["xai-oauth"]["tokens"]
in auth.json so they are not re-seeded.
- Write a last_auth_error entry for hermes doctor / auth status diagnostics.
- Remove all loopback_pkce entries from the in-memory pool so the current
session stops retrying with the dead credentials.
Mirrors the identical quarantine already in place for Nous OAuth
(c90556262).
Closes the parity gap introduced when c90556262 added Nous-only terminal
error handling without a corresponding xAI-oauth path.
When xAI's auth backend fails to redirect (e.g. the German "We couldn't reach
your app" fallback shown in #27385), users sometimes navigate manually to the
bare loopback callback URL — `http://127.0.0.1:<port>/callback` with no query
string. The handler used to return 200 "xAI authorization received" for any
GET that hit the expected path, because `parse_qs("")` yields no `code` and no
`error`, leaving `result` untouched while the success page was still served.
The CLI's wait loop, of course, still saw no code and timed out with
`AuthError: xAI authorization timed out waiting for the local callback.`
The user is left looking at a browser tab that claims success and a terminal
that says failure — exactly the contradiction in #27385.
This change makes the empty-callback case return 400 with an explicit
"not received" page and a hint to retry `hermes auth add xai-oauth`. The
wait-loop semantics are unchanged: `result["code"]` and `result["error"]`
both stay None, so the CLI still raises a real timeout rather than treating
the bare hit as a successful callback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Quarantine Nous OAuth state when refresh fails with terminal invalid_grant/invalid_token errors. Clear local and shared refresh material across runtime, managed access-token, proxy, and credential-pool paths so Hermes stops retrying revoked refresh sessions.
xAI's OAuth implementation at ``auth.x.ai`` validates the PKCE
``code_challenge`` at the **token** endpoint, not just at the
authorize step. When Hermes sends the standards-compliant token
POST with ``code_verifier`` alone — exactly what RFC 7636 §4.5
prescribes — xAI rejects the exchange with ``code_challenge is
required`` and the user is stuck with no working OAuth login.
The fix:
* Extract the token POST into ``_xai_oauth_exchange_code_for_tokens``
so the wire format is unit-testable in isolation.
* Send the original ``code_challenge`` and ``code_challenge_method``
in the form body alongside ``code_verifier``. Strict RFC-compliant
servers ignore the extras at the token endpoint, and xAI's
permissive implementation accepts the exchange. This is the
standard "defensive echo" workaround used by every OAuth client
that targets a server with this quirk.
* Refuse to fire the POST when ``code_verifier`` is empty — leaking
the authorization code to a server that can't redeem it is worse
than failing locally with an actionable error. The new error
code is ``xai_pkce_verifier_missing`` and the message points at
this issue for context.
* Surface the HTTP status code prominently in the 4xx error message
(``xAI token exchange failed (HTTP 400). Response: …``) so users
and maintainers can tell a 400 (bad request / PKCE problem) from
a 403 (tier denied, see #26847) at a glance instead of parsing
the JSON body by eye.
Closes#26990
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.
All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.
Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
tests/tools/: 18187 passed, 59 pre-existing failures (verified against
origin/main with the same shape — identical failure count, identical
category — all xdist test-order flakes unrelated to this change)
Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
Two loopback-redirect OAuth flows (xAI Grok, Spotify) silently fail when
Hermes runs on a remote host: the auth server redirects to
127.0.0.1:<port> on the user's laptop, not on the remote box. The
--no-browser flag only suppresses webbrowser.open() — it doesn't change
the bind address. Symptom xAI surfaces is 'Could not establish
connection. We couldn't reach your app.', followed by a 'xAI
authorization timed out waiting for the local callback' on the CLI side.
Changes
- hermes_cli/auth.py: new _print_loopback_ssh_hint() helper, called from
_xai_oauth_loopback_login() and _spotify_login() right after they
print the redirect URI. Silent off SSH; on SSH prints the exact
'ssh -N -L <port>:127.0.0.1:<port>' command using the actually-bound
port (not the hardcoded constant — the listener auto-bumps when the
preferred port is busy), a provider-specific docs URL, and a link to
the new shared guide.
- website/docs/guides/oauth-over-ssh.md (new): single source of truth
for the tunnel pattern — TL;DR command, jump-box / ProxyJump variant,
mosh+tmux+ControlMaster gotchas, troubleshooting.
- website/docs/guides/xai-grok-oauth.md: fix the two sections that
claimed --no-browser alone was enough; link to the shared guide.
- website/docs/user-guide/features/spotify.md: expand the existing
one-liner; link to the shared guide.
- website/sidebars.ts: register the new page.
- tests/hermes_cli/test_auth_loopback_ssh_hint.py: 7 unit tests
covering SSH-vs-not, loopback-vs-not, malformed URIs, port echo,
with and without provider docs URL.
Drop accounts.mouseion.dev and localhost:20000 / 127.0.0.1:20000 from
the loopback callback CORS allowlist — leftover dev origins. The
redirect_uri is bound to 127.0.0.1 and gated by PKCE + state, so only
xAI's own auth origins are needed.
Co-Authored-By: Jaaneek <Jaaneek@users.noreply.github.com>
Adds a new authentication provider that lets SuperGrok subscribers sign
in to Hermes with their xAI account via the standard OAuth 2.0 PKCE
loopback flow, instead of pasting a raw API key from console.x.ai.
Highlights
----------
* OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery,
state/nonce, and a strict CORS-origin allowlist on the callback.
* Authorize URL carries `plan=generic` (required for non-allowlisted
loopback clients) and `referrer=hermes-agent` for best-effort
attribution in xAI's OAuth server logs.
* Token storage in `auth.json` with file-locked atomic writes; JWT
`exp`-based expiry detection with skew; refresh-token rotation
synced both ways between the singleton store and the credential
pool so multi-process / multi-profile setups don't tear each other's
refresh tokens.
* Reactive 401 retry: on a 401 from the xAI Responses API, the agent
refreshes the token, swaps it back into `self.api_key`, and retries
the call once. Guarded against silent account swaps when the active
key was sourced from a different (manual) pool entry.
* Auxiliary tasks (curator, vision, embeddings, etc.) route through a
dedicated xAI Responses-mode auxiliary client instead of falling back
to OpenRouter billing.
* Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen
plugin) resolve credentials through a unified runtime → singleton →
env-var fallback chain so xai-oauth users get them for free.
* `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are
wired through the standard auth-commands surface; remove cleans up
the singleton loopback_pkce entry so it doesn't silently reinstate.
* `hermes model` provider picker shows
"xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls
back to pool credentials when the singleton is missing.
Hardening
---------
* Discovery and refresh responses validate the returned
`token_endpoint` host against the same `*.x.ai` allowlist as the
authorization endpoint, blocking MITM persistence of a hostile
endpoint.
* Discovery / refresh / token-exchange `response.json()` calls are
wrapped to raise typed `AuthError` on malformed bodies (captive
portals, proxy error pages) instead of leaking JSONDecodeError
tracebacks.
* `prompt_cache_key` is routed through `extra_body` on the codex
transport (sending it as a top-level kwarg trips xAI's SDK with a
TypeError).
* Credential-pool sync-back preserves `active_provider` so refreshing
an OAuth entry doesn't silently flip the active provider out from
under the running agent.
Testing
-------
* New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests)
covers JWT expiry, OAuth URL params (plan + referrer), CORS origins,
redirect URI validation, singleton↔pool sync, concurrency races,
refresh error paths, runtime resolution, and malformed-JSON guards.
* Extended `test_credential_pool.py`, `test_codex_transport.py`, and
`test_run_agent_codex_responses.py` cover the pool sync-back,
`extra_body` routing, and 401 reactive refresh paths.
* 165 tests passing on this branch via `scripts/run_tests.sh`.
`hermes tools` -> "All Platforms" took ~14s to render the checklist
because building the toolset labels called `get_nous_auth_status()` ~31x
transitively (`_toolset_has_keys` -> `_visible_providers` ->
`get_nous_subscription_features` -> `managed_nous_tools_enabled`).
Each call did a synchronous OAuth refresh POST to
portal.nousresearch.com (~350ms even on the failure path), so one menu
paint burned >13s of HTTP and 31 single-use Nous refresh tokens.
Secondary hot spot: every `get_env_value()` re-read and re-sanitised
the entire .env file. 116 reads with O(lines x known-keys) scanning
added ~300ms of CPU per render.
Fix is two process-level caches, both mtime-keyed so login/logout/edit
invalidate naturally:
* `hermes_cli/auth.py`: memoise `get_nous_auth_status()` for 15s keyed
on auth.json mtime. Splits `_compute_nous_auth_status()` as the
uncached impl. Adds `invalidate_nous_auth_status_cache()`.
* `hermes_cli/config.py`: memoise `load_env()` keyed on .env
(path, mtime, size). Adds `invalidate_env_cache()`, wired into
`save_env_value`, `remove_env_value`, and the sanitize-on-load
writer so writers don't return stale dicts on same-second writes.
Before/after on Teknium's box (real HERMES_HOME, no Nous login):
* "All Platforms" cold path: ~13,874ms -> ~691ms label-build
* Warm re-open within the same process: ~122ms -> ~17ms
Side benefit: stops burning a Nous refresh token on every menu paint,
which was risking the portal's reuse-detection revocation logic.
- Rename 'Alibaba Cloud (DashScope)' display label to 'Qwen Cloud'
in CANONICAL_PROVIDERS (model picker, /model, hermes model TUI) and
PROVIDER_REGISTRY (setup wizard prompts, status output).
- Move Qwen Cloud (alibaba) up to position 6 — directly below
OpenAI Codex and above Xiaomi MiMo.
- Move Qwen OAuth (Portal) (qwen-oauth) to the bottom of the
canonical provider list.
Provider slug 'alibaba' is unchanged — only the display label
moved. DashScope env var (DASHSCOPE_API_KEY) and base URL are
unchanged. The separate 'alibaba-coding-plan' plugin provider is
not affected.
Handle MiniMax OAuth expiry values consistently across CLI and dashboard
flows, fix CLI status/add behavior, and force pooled OAuth runtime
requests through Anthropic Messages.
- web_server._minimax_poller: parse expired_in via the shared resolver
so unix-ms absolute timestamps stop landing as TTL seconds and crashing
with 'year 583911 is out of range' when a user connects MiniMax OAuth
from the dashboard.
- auth._minimax_oauth_login / _refresh_minimax_oauth_state: same fix on
the CLI login + refresh paths.
- auth.get_auth_status: dispatch minimax-oauth to its dedicated status
function instead of falling through.
- auth_commands.auth_add_command: 'hermes auth add minimax-oauth' now
starts the device-code login flow and persists a pool entry with the
access + refresh tokens, instead of requiring credentials to already
exist.
- runtime_provider._resolve_runtime_from_pool_entry: pin pooled
minimax-oauth credentials to anthropic_messages so a stale
model.api_mode: chat_completions can't send requests to
/anthropic/chat/completions and trigger MiniMax nginx 404s.
Co-authored-by: Cursor <cursoragent@cursor.com>
Free-tier users were seeing 'No free models currently available.' in the
`hermes model` and post-login pickers even though qwen/qwen3.6-plus is
free on the Portal right now. Three independent breakages compounded:
1. The docs-hosted catalog manifest at website/static/api/model-catalog.json
was not regenerated when _PROVIDER_MODELS['nous'] was updated, so users
fetching the manifest got a list that didn't include qwen/qwen3.6-plus.
2. _resolve_nous_pricing_credentials() returned ('', '') on any auth blip,
collapsing get_pricing_for_provider('nous') to {} and making every
curated model fall through the free-tier filter as 'paid'.
3. Even with healthy pricing, the picker only ever showed models from the
in-repo curated list intersected with live pricing — a Portal-flagged
free model not yet in the curated list could never appear.
Changes:
- hermes_cli/models.py: new union_with_portal_free_recommendations() that
augments the curated list with Portal freeRecommendedModels entries
(with synthetic free pricing so partition keeps them). The Portal's
/api/nous/recommended-models endpoint is now the source of truth for
free-tier surfacing — old Hermes builds will see new free models
without a CLI release.
- hermes_cli/models.py: _resolve_nous_pricing_credentials() falls back to
the public inference base URL when runtime cred resolution fails.
The /v1/models endpoint exposes pricing without auth, so silently
returning {} just because a refresh token expired was wrong.
- hermes_cli/auth.py + hermes_cli/main.py: both free-tier picker call
sites call union_with_portal_free_recommendations() before partition.
- tests/hermes_cli/test_models.py: 7 tests covering union behaviour
(prepend, dedup, end-to-end with stale pricing, empty/missing/error
payloads, invalid entries).
- tests/hermes_cli/test_model_catalog.py: drift guard
TestManifestMatchesInRepoLists fails CI when _PROVIDER_MODELS['nous']
or OPENROUTER_MODELS is edited without re-running
scripts/build_model_catalog.py. Verified empirically that removing a
manifest entry triggers an assertion with an actionable error message.
Validation:
- 133/133 targeted tests pass (test_models, test_model_catalog,
test_auth_nous_provider).
- Live E2E against the real Portal:
- Stale curated list ['claude-opus','claude-sonnet','gpt-5.4'] (no
qwen) → after union: ['qwen/qwen3.6-plus', ...] →
partition(free_tier=True): selectable=['qwen/qwen3.6-plus'].
- Simulated expired refresh token → anon fetch returns 403 pricing
entries including qwen/qwen3.6-plus -> {prompt:0, completion:0}.
- ruff: clean.
Replace with for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.
608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).
## Why
Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:
- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.
This commit does three things:
1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.
## What changed
### 1. psutil as a core dependency (pyproject.toml)
Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.
### 2. `gateway/status.py::_pid_exists`
Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.
### 3. `os.killpg` migration to psutil (7 callsites, 5 files)
- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
`# windows-footgun: ok` — the pgid semantics psutil can't replicate,
and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`
### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)
Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:
1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode
Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
`shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds
### 5. CI integration
A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:
```yaml
name: Windows footgun check
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: {python-version: "3.11"}
- run: python scripts/check-windows-footguns.py --all
```
### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion
Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.
### 7. Baseline cleanup (91 → 0 findings)
- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
`encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
tool subprocess management → annotated with
`# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)
## Verification
```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).
$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```
Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.