The system prompt's 'Conversation started:' line carried minute precision
(%I:%M %p), making it byte-unstable across every rebuild path. Within a
CLI session the in-memory cache held, but on the gateway path (fresh
AIAgent per turn → restore from session DB), any silent failure in the
read or write path dropped the cache stem and forced a full re-prefill
on every subsequent turn. Local prefix-caching backends (llama.cpp /
vLLM) saw this as KV-cache invalidation; remote prefix-caching providers
saw it as an Anthropic-style cache miss.
Three changes:
1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM').
System prompt now byte-stable for the full day. The model can still
query exact time via tools when it actually needs it. Credit:
@iamfoz (PR #20451).
2. Loud logging on session DB write failures. The update_system_prompt
call used to log at DEBUG, hiding disk-full / locked-database / schema
drift behind a silent fall-through that forced fresh rebuilds on
every subsequent turn. Now WARN with the session id and exception so
persistent issues show up in agent.log without verbose mode.
3. Three-way stored-state distinction on read. The previous
'session_row.get("system_prompt") or None' collapsed three states
into one (missing row / null column / empty string). Now we tell them
apart and WARN when a continuing session lands on null/empty (which
means the previous turn's write never persisted — every subsequent
turn rebuilds and the prefix cache misses every time).
The restore block is extracted into _restore_or_build_system_prompt()
so the prefix-cache path can be unit-tested in isolation.
E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary
sleep restores byte-identical bytes from the session DB. NULL stored
prompt fires the new warning. Date-only timestamp survives the rebuild
path. All on real SessionDB, no mocks.
Tests:
- tests/agent/test_system_prompt_restore.py (10 new tests)
- tests/run_agent/test_run_agent.py::TestBuildSystemPrompt::
test_datetime_is_date_only_not_minute_precision
Closes#20451 (date-only), #18547 (prefix stabilization),
#8689 (stabilize timestamp across compression), #15866 (timestamp
caching question), #8687 (compression timestamp), #27339
(claim #3: live timestamp in cached system prompt).
Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>
Grok models hit the same failure modes that OPENAI_MODEL_EXECUTION_GUIDANCE
addresses for GPT/Codex: claiming completion without tool calls
('to be honest, I didn't create the file yet'), suggesting workarounds
instead of using existing tools (proposing a folder-based memory system
when the memory tool exists), replying with plans instead of executing.
TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for any model whose
name contains 'grok' (TOOL_USE_ENFORCEMENT_MODELS). This extends the
follow-on family-specific block — OPENAI_MODEL_EXECUTION_GUIDANCE
(tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks
/ verification / missing_context) — to grok-named models too.
The OPENAI_ prefix is retained for backwards compat with imports/tests;
docstring + inline comment now note that the body is family-agnostic and
the prefix reflects origin, not exclusivity.
Tests cover the OpenRouter slug (x-ai/grok-4.3) and the xai-oauth bare
name (grok-4.3), plus a negative control on claude.
E2E verified against a real AIAgent build of the system prompt for both
xai-oauth and openrouter grok models.
Adds a new 'Auxiliary Capacity-Error Fallback' section to
website/docs/user-guide/features/fallback-providers.md covering:
- The 4-step ladder (primary → fallback_chain → main agent → warn)
- Which errors trigger fallback (402, 429 quota, connection) vs
which respect explicit provider choice (transient 429 rate limits)
- Optional fallback_chain config schema with vision + compression examples
- Recognized quota-error phrases (Bedrock, Vertex AI, generic)
Updates the bottom summary table — every auxiliary task now shows
'Layered (see above)' instead of 'Auto-detection chain' since
explicit-provider users also get the main-agent safety net.
7 new tests:
TestAuxiliaryFallbackLayering (3):
- configured_chain succeeds → main agent fallback NOT consulted
- chain returns nothing → main agent fallback runs and succeeds
- both exhausted → user-visible 'all fallbacks exhausted' warning
fires before the original error is re-raised
TestTryMainAgentModelFallback (4):
- returns (None, None, "") when main provider is 'auto'
- returns (None, None, "") when failed provider == main provider
(no point retrying the same backend)
- resolves the main provider's client when configured correctly
- skips when main provider is marked unhealthy
Layered fallback for auxiliary tasks (compression, vision, tts, web_extract,
session_search, etc.):
1. Primary aux provider (existing)
2. User-configured auxiliary.<task>.fallback_chain (new)
3. Main agent provider + model (new — last-resort safety net)
4. Warn user + re-raise original error (new)
For users on 'auto' (no explicit aux provider), the existing
_try_payment_fallback auto-detection chain runs instead — its Step 1
already IS the main agent model, so they get the same behaviour without
configuration.
The configured fallback_chain config schema comes from #26882 / @zccyman;
the main-agent safety net + exhaustion warning were added on top.
Closes#26882. Builds on the capacity-error gate fix in the previous
commit (#26803 / @Bartok9).
The two TestAuxiliaryClientPoisonedCacheEviction tests were written
when explicit-provider users got no fallback at all on connection
errors — they asserted ConnectionError propagated after eviction
because the fallback gate blocked the auto chain.
After the #26803 fix in the previous commit, capacity errors
(payment/quota/connection) now DO trigger fallback even on explicit
providers. The tests still verify cache eviction (their actual
contract) but now stub _try_payment_fallback so the fallback
machinery does not attempt a real network call.
Closes#26803
Root causes:
1. _is_payment_error() checked for billing keywords (credits, insufficient
funds, billing, payment required) but missed daily token quota exhaustion
phrases used by Bedrock, Vertex AI, and LiteLLM proxies — e.g.
'Too many tokens per day', 'quota exceeded', 'resource exhausted',
'daily limit'. These are functionally identical to credit exhaustion
(provider cannot serve the request) but don't trigger fallback.
2. The call_llm() fallback chain was gated on resolved_provider == 'auto'.
When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM
proxy, or 'openrouter'), capacity failures (payment/quota/connection)
silently raise instead of trying alternatives. This is overly conservative:
capacity errors mean the provider *cannot* serve the request regardless of
user intent, so alternatives should always be tried.
Fixes:
- Add quota-related keywords to _is_payment_error(): quota_exceeded,
too many tokens per day, daily limit, tokens per day, daily quota,
resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when
resolved_provider is not 'auto'. Rate-limit fallback stays gated on
is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.
Quarantine Nous OAuth state when refresh fails with terminal invalid_grant/invalid_token errors. Clear local and shared refresh material across runtime, managed access-token, proxy, and credential-pool paths so Hermes stops retrying revoked refresh sessions.
Restructures the security section so the admin/user distinction is a
first-class concept rather than buried under 'Slash Command Access
Control'. The new section makes explicit that:
- Slash commands are the first capability gated by the tier split today
- Future gating (tools, model switching, etc.) will hang off the same
admin/user distinction, so configuring it now is forward-compatible
- Allowlists vs the admin/user split solve different problems and are
contrasted up front
Heading renamed: 'Slash Command Access Control' -> 'Admins vs Regular
Users'. The platform-specific pages (telegram.md, discord.md) keep the
old heading since slash gating IS the only thing they currently gate.
* feat(kanban): orchestrator-driven auto-decomposition on triage
Closes the core gap in the kanban system: dropping a one-liner into Triage
now decomposes it into a graph of child tasks routed to specialist
profiles by description, matching teknium's original vision ("main
orchestrator splits/creates actual tasks, doles them out to each agent").
The build
---------
- hermes_cli/profiles.py: new `description` + `description_auto` fields
on ProfileInfo, persisted in <profile_dir>/profile.yaml. Helpers
read_profile_meta / write_profile_meta. `create_profile` accepts
optional description.
- hermes_cli/profile_describer.py: new module — auto-generate a 1-2
sentence description from a profile's skills + model + name via the
auxiliary LLM (`auxiliary.profile_describer`).
- hermes_cli/main.py: new `hermes profile create --description ...`
flag; new `hermes profile describe [name] [--text ... | --auto |
--all --auto]` subcommand.
- hermes_cli/kanban_db.py: new `decompose_triage_task` atomic helper —
creates N child tasks, links the root as a child of every leaf
(root waits for the whole graph), flips root `triage -> todo` with
orchestrator assignee, records an audit comment + `decomposed` event
in a single write_txn.
- hermes_cli/kanban_decompose.py: new module — calls the auxiliary LLM
(`auxiliary.kanban_decomposer`) with the profile roster + descriptions
to produce a JSON task graph, then invokes the DB helper. Rewrites
unknown assignees to the configured `kanban.default_assignee` (or
the active default profile) so a task NEVER lands with assignee=None.
Falls back to specify-style single-task promotion when the LLM
returns `fanout: false`.
- hermes_cli/kanban.py: new `hermes kanban decompose [task_id | --all]`
CLI verb.
- hermes_cli/config.py: new DEFAULT_CONFIG keys —
kanban.orchestrator_profile, kanban.default_assignee,
kanban.auto_decompose (default True), kanban.auto_decompose_per_tick
(default 3), auxiliary.kanban_decomposer, auxiliary.profile_describer.
- gateway/run.py: kanban dispatcher watcher now runs auto-decompose
before each `_tick_once`, capped by `auto_decompose_per_tick` so a
bulk-load of triage tasks doesn't burst-spend the aux LLM.
- plugins/kanban/dashboard/plugin_api.py: new endpoints —
GET /profiles (list roster + descriptions),
PATCH /profiles/<name> (set description, user-authored),
POST /profiles/<name>/describe-auto (LLM-generate),
POST /tasks/<id>/decompose (run decomposer),
GET/PUT /orchestration (orchestrator/default-assignee/auto-decompose
pickers, with resolved fallbacks echoed back).
- plugins/kanban/dashboard/dist/index.js: new OrchestrationPanel
collapsible — dropdowns for orchestrator profile and default
assignee, auto-decompose toggle, per-profile description editor with
Save and Auto-generate buttons. New ⚗ Decompose button next to
✨ Specify on triage-column task drawers.
Behavior
--------
- A task in Triage gets fanned out into a small DAG of child tasks.
Children with no internal parents flip to `ready` immediately
(parallel dispatch). Children with sibling parents wait. The root
stays alive as a parent of every child — when the whole graph
finishes, it promotes to `ready` and the orchestrator profile wakes
back up to judge completion (the "adds more tasks until done" part
of the original vision).
- `kanban.orchestrator_profile` unset -> falls back to the default
profile (whichever `hermes` launches with no -p flag).
- `kanban.default_assignee` unset -> same fallback. Tasks NEVER end
up unassigned.
- `kanban.auto_decompose=true` (default) runs the decomposer
automatically on dispatcher ticks; manual `hermes kanban decompose`
is always available.
Tests
-----
- tests/hermes_cli/test_kanban_decompose_db.py — 7 tests for the
atomic DB helper (status transitions, dep graph, audit trail,
validation errors).
- tests/hermes_cli/test_kanban_decompose.py — 6 tests for the
decomposer module (fanout, no-fanout fallback, unknown-assignee
rewrite, malformed-JSON resilience, no-aux-client path).
- tests/hermes_cli/test_profile_describer.py — 10 tests for
profile.yaml r/w + the LLM auto-describer (yaml corrupt tolerance,
user-vs-auto description protection, --overwrite, fallback parsing).
E2E
---
- CLI end-to-end: created profiles with descriptions, dropped a triage
task, mocked the aux LLM with a 3-task graph -> verified all three
children were created with the right assignees, the dependency
edges matched the LLM's graph, root flipped to todo gated by every
child, audit comment + `decomposed` event recorded.
- Dashboard end-to-end: started the dashboard against an isolated
HERMES_HOME, verified all four new endpoints via curl (profile
listing, PATCH for description, PUT for orchestration settings,
POST for decompose). Opened the UI in the browser, confirmed the
OrchestrationPanel renders with all three pickers + the per-profile
description editor, typed a description, clicked Save, verified
~/.hermes/profile.yaml was written. Clicked Decompose on the triage
card and confirmed the inline error message surfaced as designed
("no auxiliary client configured").
* feat(kanban): surface decompose mode (Auto/Manual) as a one-click pill
The auto/manual toggle already existed as kanban.auto_decompose (default
true), but it was buried inside the collapsed Orchestration settings
panel — users couldn't tell at a glance which mode they were in. This
hoists it to a pill at the top of the kanban page so the state is always
visible and one click flips it.
UX
- New "⚗ Decompose: AUTO|MANUAL" pill in the kanban header. Emerald
styling when Auto is on (the default), muted/gray when Manual.
- Pill is visible both in the collapsed AND expanded Orchestration
settings views so context is preserved when the user opens the panel.
- Tooltip explains both states + what clicking does.
- Renamed the in-panel "Auto-decompose on triage / Enabled" checkbox
to "Decompose mode / Auto (default) | Manual" for language parity
with the pill.
Behavior preserved
- Default remains Auto (kanban.auto_decompose=true).
- Manual mode restores pre-PR behavior: triage tasks stay in triage
until the user clicks ⚗ Decompose on each card (or runs
`hermes kanban decompose <id>`).
Implementation
- plugins/kanban/dashboard/dist/index.js: load /orchestration on mount
(not just on expand) so the collapsed pill reflects real state.
Render mode pill in both collapsed and expanded headers. Reuses the
existing PUT /api/plugins/kanban/orchestration endpoint — no new
backend, no new tests required.
E2E verified
- Pill renders as "⚗ Decompose: AUTO" on page load (default).
- One click flips to "⚗ Decompose: MANUAL" with muted styling.
- config.yaml on disk shows auto_decompose: false after the flip.
- Second click round-trips back to Auto; config.yaml flips to true.
* feat(kanban): rename mode pill to "Orchestration: Auto/Manual"
Per Teknium feedback — "Decompose" was too implementation-specific.
"Orchestration" is the user-facing concept (the whole pitch is the
orchestrator profile routing work), and the pill is the front door to it.
- Pill text: "Orchestration: Auto" / "Orchestration: Manual" (title case,
no ⚗ prefix, no SHOUTY-CAPS for the mode value)
- In-panel checkbox label: "Orchestration mode" (was "Decompose mode")
- Tooltips updated to match
- No behavior change
* docs(kanban): document decompose, profile descriptions, orchestration mode
Brings the docs site up to parity with the PR. English build verified
locally (npx docusaurus build --locale en) — clean, no new broken links
or anchors. Pre-existing broken-link warnings (rl-training, llms.txt,
step-by-step-checklist, fallback-model) untouched.
- website/docs/reference/cli-commands.md
+ `hermes kanban decompose` action row in the action table, with
pointer to the Auto vs Manual orchestration section.
- website/docs/reference/profile-commands.md
+ `--description "<text>"` flag on `hermes profile create`.
+ Full `hermes profile describe` section: read, --text, --auto,
--overwrite, --all flags with examples.
- website/docs/user-guide/features/kanban.md (the big one)
+ Triage column intro rewritten around the Auto-decompose default
behavior, with pointer to the new Auto vs Manual section.
+ Status action row updated to mention both ⚗ Decompose and
✨ Specify on triage cards.
+ New "Auto vs Manual orchestration" section explaining the two
modes, how to flip them (pill, config), how routing-by-description
works, the no-None-assignee guarantee, plus a config knob table
(auto_decompose, auto_decompose_per_tick, orchestrator_profile,
default_assignee) and the two new auxiliary slots
(kanban_decomposer, profile_describer).
+ REST surface table gains 6 new endpoint rows: /tasks/:id/decompose,
/profiles (GET), /profiles/:name (PATCH), /profiles/:name/describe-auto,
/orchestration (GET + PUT).
- website/docs/user-guide/features/kanban-tutorial.md
+ Triage column blurb updated for Auto by default + Manual via the
pill, with cross-link to the Auto vs Manual orchestration section.
- website/docs/user-guide/profiles.md
+ Blank-profile flow now mentions --description and points to the
kanban routing model for context.
- website/docs/user-guide/configuration.md
+ `kanban_decomposer` and `profile_describer` added to the
`hermes model -> Configure auxiliary models` menu listing.
Port of the run_agent.py changes from #27219 to current main: the
_build_api_kwargs body was extracted into agent/chat_completion_helpers.
build_api_kwargs, so wire the xAI tool-schema sanitization there
(provider in {'xai', 'xai-oauth'} or base_url=api.x.ai). Logs a warning
instead of silently swallowing exceptions, matching the contributor's
review-followup fix.
Co-authored-by: zccyman <zccyman@163.com>
xAI's /responses endpoint rejects pattern and format JSON Schema keywords
in tool schemas with HTTP 400 'Invalid arguments passed to the model'.
The existing strip_pattern_and_format() only walked OpenAI-format tools
({'function': {'parameters': ...}}), missing Responses-format shapes
({'name': ..., 'parameters': ...}) used by codex_responses API mode.
This shows up most often with MCP-derived tools that carry validation
keywords (e.g. domain pattern regex in firecrawl, format: date-time)
through to the wire.
Extends the walk to handle both shapes. Auto-strip wiring is applied
separately in chat_completion_helpers (post-refactor location).
Closes#27197
14 focused tests on the extracted helper
``_xai_oauth_exchange_code_for_tokens`` cover:
Core contract:
* ``code_verifier`` is on the wire (RFC 7636 §4.5).
* ``code_challenge`` + ``code_challenge_method=S256`` are echoed
(the #26990 defense-in-depth that makes xAI's token endpoint
stop rejecting valid exchanges).
* ``grant_type=authorization_code``, ``code``, ``redirect_uri``,
and ``client_id`` are all locked.
* Content-Type is ``application/x-www-form-urlencoded`` (xAI
rejects ``application/json`` on this endpoint).
* The supplied ``token_endpoint`` URL is used verbatim — no
hard-coded constant sneaks in via a future refactor.
* ``timeout_seconds`` is forwarded; floored at 20s.
Sanity guard:
* Empty ``code_verifier`` raises ``xai_pkce_verifier_missing``
with a link to #26990 — and NOTHING is sent. Leaking the auth
code to a server that can't redeem it is the wrong failure mode.
* Empty ``code_challenge`` omits only the defensive echo; the
standards-compliant ``code_verifier`` request still goes out so
RFC-compliant servers keep working.
Error surfacing:
* Non-200 responses include both ``HTTP <status>`` and the body
verbatim — disambiguates 400 (PKCE / bad request) from 403
(tier denied, see #26847).
* Transport errors are wrapped as ``AuthError`` with the
``xai_token_exchange_failed`` code, so the surrounding
``format_auth_error`` UI mapping still fires.
* Non-dict JSON payloads raise ``xai_token_exchange_invalid``.
* 200 happy path returns the parsed payload dict verbatim.
End-to-end wire-format guard:
* A real ``httpx.Client`` with a stub transport captures the bytes
on the wire and asserts every PKCE field round-trips through
``urlencode``. Catches a future refactor that swaps
``data=`` for ``json=`` (which xAI would silently reject).
xAI's OAuth implementation at ``auth.x.ai`` validates the PKCE
``code_challenge`` at the **token** endpoint, not just at the
authorize step. When Hermes sends the standards-compliant token
POST with ``code_verifier`` alone — exactly what RFC 7636 §4.5
prescribes — xAI rejects the exchange with ``code_challenge is
required`` and the user is stuck with no working OAuth login.
The fix:
* Extract the token POST into ``_xai_oauth_exchange_code_for_tokens``
so the wire format is unit-testable in isolation.
* Send the original ``code_challenge`` and ``code_challenge_method``
in the form body alongside ``code_verifier``. Strict RFC-compliant
servers ignore the extras at the token endpoint, and xAI's
permissive implementation accepts the exchange. This is the
standard "defensive echo" workaround used by every OAuth client
that targets a server with this quirk.
* Refuse to fire the POST when ``code_verifier`` is empty — leaking
the authorization code to a server that can't redeem it is worse
than failing locally with an actionable error. The new error
code is ``xai_pkce_verifier_missing`` and the message points at
this issue for context.
* Surface the HTTP status code prominently in the 4xx error message
(``xAI token exchange failed (HTTP 400). Response: …``) so users
and maintainers can tell a 400 (bad request / PKCE problem) from
a 403 (tier denied, see #26847) at a glance instead of parsing
the JSON body by eye.
Closes#26990
Addresses reviewer feedback: when resolve_runtime_provider returns a dict
without the 'provider' key, the result must be None regardless of
configured_provider. This guards against malformed runtime responses.
Test: test_runtime_missing_provider_key_returns_none
Named custom providers (e.g. crof.ai) resolve to provider='custom' at the
runtime level, causing subagents to lose their intended provider identity.
On retry/fallback, resolve_provider_client('custom', model=...) searches all
providers advertising that model and picks non-deterministically, routing to
Z.AI or Bailian instead of the configured target.
The fix preserves configured_provider when runtime['provider'] == 'custom',
restoring the original provider name so routing stays correct through retries.
Adds a named constant _RUNTIME_PROVIDER_CUSTOM instead of a magic string.
Adds three regression tests:
- test_named_custom_provider_preserves_provider_name: the #26954 case
- test_standard_provider_not_overwritten_by_configured_name: openrouter/nous
must still return their own identity, not the configured name
- test_custom_provider_with_empty_configured_provider_falls_back_to_runtime:
empty provider triggers the early-return None path as before
When the dashboard is reverse-proxied under a path prefix
(`X-Forwarded-Prefix: /dashboard`), the SPA already routes its
`/api/...` REST traffic through `HERMES_BASE_PATH` via
`web/src/lib/api.ts`. Three WebSocket URLs constructed elsewhere
were still hardcoded to root `/api/...` and so opened
`wss://host/api/...` instead of `wss://host/dashboard/api/...`,
forcing operators to forward selected root API/WS paths through the
reverse proxy as a workaround (see issue #25547).
Add `HERMES_BASE_PATH` between `host` and `/api/...` in the
three constructed WebSocket URLs:
- `web/src/pages/ChatPage.tsx` — PTY WebSocket
- `web/src/components/ChatSidebar.tsx` — events subscriber
- `web/src/lib/gatewayClient.ts` — JSON-RPC gateway WebSocket
When the dashboard is served at root, `HERMES_BASE_PATH === """
and the URLs are bit-for-bit identical to before. Under a prefix,
the WebSocket connections now go through the same proxy path the
REST calls already use.
Note: bundled dashboard plugins (kanban, hermes-achievements) embed
`"/api/plugins/..."` in their compiled `dist/index.js` and
remain out of scope here — those need source-side fixes per plugin.
Fixes#25547.
`_ws_client_is_allowed()` enforces a loopback-only client check on every
dashboard WebSocket upgrade (`/api/ws`, `/api/events`, `/api/pty`,
`/api/pub`):
def _ws_client_is_allowed(ws):
if _is_public_bind():
return True
client_host = ws.client.host if ws.client else ""
if not client_host:
return True
return client_host in _LOOPBACK_HOSTS
The intent is: when bound to 127.0.0.1, only accept WS upgrades from
loopback peers. Public bind (--insecure) trades that for token-only.
However, `uvicorn.run(app, host=host, port=port, log_level="warning")`
omits `proxy_headers`. In modern uvicorn (>= 0.20) `proxy_headers`
defaults to True and `forwarded_allow_ips` defaults to "127.0.0.1".
With those defaults, any reverse proxy connecting from loopback (nginx,
in-cluster proxy, Cloudflare Tunnel sidecar in HTTP mode, K8s
ingress-nginx) causes uvicorn to rewrite `ws.client.host` from the
request's `X-Forwarded-For` header. So the gate sees the original
client's IP (a public address) instead of the loopback peer, returns
False, and closes every browser WS with code=4403 (surfaces as HTTP
403 to the proxy).
Passing `proxy_headers=False` keeps the loopback gate's view of
`ws.client.host` at the immediate transport peer (the proxy on
127.0.0.1), which is exactly what the gate is designed to check.
The bug is invisible in dev (no proxy → no XFF → ws.client.host stays
loopback). It surfaces in proxied production: dashboard chat tab opens,
events feed banner shows "disconnected — tool calls may not appear",
all WS endpoints return 403. Reproduces with:
curl -i -H "Connection: Upgrade" -H "Upgrade: websocket" \
-H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: ..." \
-H "X-Forwarded-For: 1.2.3.4" \
"http://127.0.0.1:9119/api/ws?token=\$TOKEN"
# Before: HTTP/1.1 403 Forbidden
# After: HTTP/1.1 101 Switching Protocols
Without the XFF header, both behave the same (101) — confirming the
single-variable trigger.
Discovered while diagnosing why the Hermes dashboard at
mandy.loadmagic.ai (behind nginx + Cloudflare Tunnel + CF Access)
refused all browser WS upgrades despite Access app config matching a
known-working sibling deployment (Simone, which doesn't have nginx in
the path).
The /restart command used a detached subprocess approach to restart
the gateway. In Docker, when the gateway process exits, tini (PID 1)
also exits, causing Docker to stop the container and kill the detached
helper before it can restart the gateway. This made /restart effectively
a /shutdown in containerized deployments.
Detect Docker (/.dockerenv) and Podman (/run/.containerenv) containers
and use the service restart path (exit code 75) instead, letting the
container restart policy handle the actual restart.
Note: requires restart policy that restarts on non-zero exit (e.g.
unless-stopped or on-failure).
Closes#25249 (and supersedes PR #25260) in spirit.
Two bugs in the streaming chat-completions path caused provider timeout
configuration to be silently ignored:
1. Hardcoded connect/pool timeout. The httpx.Timeout for streaming
calls used hardcoded connect=30.0 and pool=30.0 regardless of the
user's providers.<id>.request_timeout_seconds config. If the custom
provider (e.g. Ollama) was unreachable, the call always waited
exactly 30s before failing, ignoring any configured timeout.
Fix: use min(_base_timeout, 60.0) for connect and pool when a
provider timeout is configured, falling back to 30.0 otherwise.
The 60s cap addresses review feedback (TCP handshake shouldn't
wait the inference timeout — connect/pool cover the connection
layer, not model latency).
2. Streaming stale-stream detector ignored provider config. The
stale detector read only HERMES_STREAM_STALE_TIMEOUT (env default
180s). The providers.<id>.stale_timeout_seconds key (correctly
used in the non-streaming path) was never consulted.
Fix: check get_provider_stale_timeout(provider, model) first,
then fall back to the env var. Aligns the streaming path with
the non-streaming path's priority chain (config > env > default).
Salvage shape diverged from PR #25260: the function moved to
agent/chat_completion_helpers.py and the contributor's two commits
(initial fix + 60s-cap review follow-up) are squashed into one final
commit applied at the new location.
Original diagnosis, fix shape, AND the 60s-cap review response from
@zccyman in PR #25260; credited via Co-authored-by.
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
`hermes doctor` displayed OAuth status for Nous, Codex, Gemini, and MiniMax
but silently omitted xAI OAuth, even though `get_xai_oauth_auth_status()`
exists and the same information is already surfaced in `hermes status`.
Add xAI OAuth as a *separate* try/except block so an import failure cannot
silence the already-printed provider rows above it — consistent with the
per-provider isolation introduced in the doctor fallback fix.
Tests:
- 9 new tests in TestDoctorXaiOAuthStatus covering: logged-in ok, not-logged-in
warn, error line present/absent, import failure isolation, runtime exception
and None-return safety.
- 9 existing run_doctor helpers updated to mock get_xai_oauth_auth_status for
deterministic output.
hermes status listed Nous Portal, OpenAI Codex, Qwen OAuth, and MiniMax
OAuth in the Auth Providers section but omitted xAI OAuth entirely.
Users who authenticated via `hermes auth add xai-oauth` had no way to
verify their session state from the status output.
Add xAI OAuth display using the same field shape as OpenAI Codex:
auth_store (Auth file:), last_refresh (Refreshed:), and error when
not logged in. The import is isolated in its own try/except so an
import failure cannot affect the already-printed rows above it.
Tests cover:
- logged in: check mark, auth_store, last_refresh, error suppressed
- not logged in: login command hint, error shown, error absent = no line
- resilience: import failure, status function raises, returns None
- isolation: xAI import failure does not break Nous/MiniMax display