Commit graph

13153 commits

Author SHA1 Message Date
HiddenPuppy
b34771fc06 fix(cli): disable prompt_toolkit CPR queries to stop escape-sequence leak (#13870)
prompt_toolkit's renderer sends ESC[6n cursor-position queries before
painting in non-fullscreen mode; the terminal replies ESC[<row>;<col>R.
Over SSH/cloudflared tunnels and slow PTYs these replies race past the
input parser and land in the display as raw '20;1R21;1R' text, and the
pending-CPR future can stall the renderer so the prompt freezes after the
agent's final answer.

Build the prompt_toolkit output with enable_cpr=False so CPR is marked
NOT_SUPPORTED up front and ESC[6n is never sent. This is the root-cause
counterpart to the existing input-side _strip_leaked_terminal_responses
scrubbing. Vt100_Output.from_pty() does not expose enable_cpr in
prompt_toolkit 3.x, so _build_cpr_disabled_output() reproduces its
get_size setup and calls the constructor directly; it returns None on any
failure so startup falls back to the default output.

Verified in a real PTY: baseline emits 1 ESC[6n query, the fix emits 0,
banner/UI render identically. Layout is unaffected — with CPR off the
renderer sizes the prompt to its preferred height (the same fallback
prompt_toolkit uses on any terminal that doesn't answer CPR).

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-27 04:15:20 -07:00
LeonSGP43
e7c013494d fix(agent): preserve nested API error bodies 2026-06-27 04:13:53 -07:00
Teknium
5ab4136631
fix(webui): switch provider when Config-page model field changes (#53583)
The dashboard Config tab's Model field is a flat string with no provider
info. _denormalize_config_from_web only updated model.default and kept the
stale provider, so picking an OpenRouter model while the default provider was
ollama-local left provider=ollama-local and every call 404'd.

When the model string actually changes, infer the serving provider — curated
catalog first, then a vendor/model-slug heuristic for non-aggregator providers
— and route the switch through the existing _normalize_main_model_assignment /
_apply_main_model_assignment chokepoints so stale base_url/api_mode/api_key are
cleared on a provider change and preserved on a same-provider re-pick. Saving
an unchanged model never re-detects, so unrelated config saves keep an explicit
provider.

Closes #14058
2026-06-27 04:13:44 -07:00
teknium1
7ee0b68973 fix(gateway,feishu): refuse executor resurrection during real shutdown
Add an explicit _closing guard to both owned executors so the
recreate-on-shutdown path only recovers from an *external* teardown of
the loop default — never resurrects a pool the gateway/adapter itself
stopped. _shutdown_*executor() sets the flag; _get_*executor() raises if
closing; feishu connect() re-arms on reconnect. Updates the gateway
recreate test to assert the refusal contract and adds feishu coverage.
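
The guard described above can be sketched as follows. This is a minimal illustration, not the gateway or feishu code: the class name `OwnedExecutor` and its method names are hypothetical, and the "was it torn down externally?" probe is an assumption about how such a check could be done with the public `ThreadPoolExecutor` API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class OwnedExecutor:
    """Hypothetical owner of a thread pool with an explicit closing guard."""

    def __init__(self, max_workers: int = 4) -> None:
        self._max_workers = max_workers
        self._pool: Optional[ThreadPoolExecutor] = None
        self._closing = False

    @staticmethod
    def _is_shut_down(pool: ThreadPoolExecutor) -> bool:
        # submit() raises RuntimeError once shutdown() has been called.
        try:
            pool.submit(lambda: None)
            return False
        except RuntimeError:
            return True

    def get(self) -> ThreadPoolExecutor:
        # Refuse to resurrect a pool that this owner stopped on purpose.
        if self._closing:
            raise RuntimeError("executor owner is shutting down")
        # Recover only from a teardown that happened behind our back.
        if self._pool is None or self._is_shut_down(self._pool):
            self._pool = ThreadPoolExecutor(max_workers=self._max_workers)
        return self._pool

    def shutdown(self) -> None:
        self._closing = True
        if self._pool is not None:
            self._pool.shutdown(wait=False)

    def rearm(self) -> None:
        # Called on reconnect so a fresh pool may be created again.
        self._closing = False
```

The point of the flag is that "pool is shut down" alone cannot distinguish an external teardown from an intentional stop; only the owner knows which one happened.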
2026-06-27 04:13:09 -07:00
teknium1
b296915c82 fix(feishu): route blocking SDK calls through an adapter-owned executor
Feishu SDK calls ran on asyncio's shared default executor, so a torn-down
default executor wedged every send with 'Executor shutdown has been called'
and left the gateway a zombie (#10849). The adapter now owns a
ThreadPoolExecutor recreated on demand if shut down, mirroring the
gateway-owned executor change. Routes all 17 self._client SDK calls through
_run_blocking; shuts the pool down on disconnect.
2026-06-27 04:13:09 -07:00
konsisumer
1011c07966 fix(gateway): use owned executor for agent work 2026-06-27 04:13:09 -07:00
LeonSGP43
52a09d8faf fix(byterover): honor auto extract config 2026-06-27 04:04:15 -07:00
teknium1
f062cf076b fix(agent): also treat provider=ollama as an Ollama GLM backend
Follow-up to the #13971 fix: a genuine native Ollama provider reached
through a reverse proxy carries no ollama/:11434 URL signature, so the
restricted detection would miss it. Add provider=="ollama" as an
explicit True case (idea from #14789, @Tranquil-Flow) and cover both it
and the #13971 LiteLLM-proxy-to-zai false-positive with E2E tests.
2026-06-27 04:03:07 -07:00
YuShu
266521b55f refactor(agent): trim docstring per review feedback
Remove commentary about the previous is_local_endpoint() approach
from _is_ollama_glm_backend() — git history suffices.
2026-06-27 04:03:07 -07:00
YuShu
00a8252b7d fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only
The _is_ollama_glm_backend() function was too broad: any local endpoint
running a GLM model was treated as Ollama, triggering the stop->length
misreport heuristic introduced in 8011aa3. This caused false truncation
detection on sglang, vLLM, LM Studio, and other non-Ollama servers that
correctly report finish_reason.

When a GLM model on sglang/vLLM returned finish_reason='stop', the agent
mistakenly reclassified it as 'length' if the response didn't end with
a whitelisted punctuation character (ASCII or CJK). This particularly
affected Chinese-language responses and Markdown-formatted text.

Root cause: the is_local_endpoint() fallback assumed any local GLM
endpoint = Ollama. But many non-Ollama servers also run on localhost.

Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via
its distinctive signatures (port 11434, 'ollama' in URL). All other
local servers are assumed to report finish_reason correctly.
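
A rough sketch of signature-only detection, assuming the check works on the endpoint's base URL. The function name `looks_like_ollama` is hypothetical and the real `_is_ollama_glm_backend()` may inspect other inputs; only the two signatures named above (port 11434, 'ollama' in the URL) are taken from this commit.

```python
from urllib.parse import urlparse


def looks_like_ollama(base_url: str) -> bool:
    """Return True only for URLs carrying a distinctive Ollama signature."""
    try:
        port = urlparse(base_url).port
    except ValueError:
        # Malformed port in the URL: treat as "no port signature".
        port = None
    if port == 11434:
        return True
    # No catch-all for "any local endpoint": localhost alone proves nothing.
    return "ollama" in base_url.lower()
```

With this shape, `http://localhost:30000/v1` (for example an sglang server) is not treated as Ollama even though it is local.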

This is the correct tradeoff because:
- False negatives (Ollama at custom port, heuristic not triggered) only
  mean the user sees a truncated response — same as having no heuristic
- False positives (non-Ollama server, heuristic wrongly triggered) inject
  spurious continuation messages into the conversation — strictly worse

Adds two tests:
- sglang GLM response is NOT reclassified as truncated
- Ollama GLM on port 11434 still triggers the heuristic as before

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 04:03:07 -07:00
teknium1
ab1f9b94c5 fix(telegram): accept @username chat_id in delivery paths (#13206)
TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed
all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid
literal for int()'. The Telegram Bot API accepts both a numeric chat_id and
an @username string; Hermes was force-coercing every chat_id with int().

Add normalize_telegram_chat_id() (returns int for numeric values, passes
@username strings through) and apply it at the Bot API send/edit sites in
the Telegram adapter and the send_message tool. Username targets are now
recognized as explicit targets in _parse_target_ref.
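
The normalization described above might look roughly like this. It is a sketch under the stated behavior (int for numeric values, @username passed through); the name `normalize_chat_id_sketch` is hypothetical and the real helper may handle more cases.

```python
from typing import Union


def normalize_chat_id_sketch(chat_id: Union[int, str]) -> Union[int, str]:
    """Return an int for numeric chat IDs and pass @username through."""
    if isinstance(chat_id, int):
        return chat_id
    text = str(chat_id).strip()
    if text.startswith("@"):
        # The Bot API accepts @username targets as strings.
        return text
    # Numeric strings (including negative group IDs) become ints.
    return int(text)
```

A value that is neither numeric nor an @username still raises `ValueError` here, which keeps genuinely malformed configuration visible.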

Reapplies the approach from #13274 (season179), whose branch predated the
gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py
relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah).

Co-authored-by: season179 <season.saw@gmail.com>
2026-06-27 04:01:58 -07:00
teknium1
f2ca3e3d84 fix(gateway): hold _run_restart on _restart_task + explicit cancel-loop skip
Follow-up on the cherry-picked #13173 fix. Holds the _run_restart task in
self._restart_task (a bare asyncio.create_task keeps only a weak reference,
so a still-pending task can be GC'd mid-flight) and explicitly skips it in
the _stop_impl cancel loop alongside _stop_task. Adds AUTHOR_MAP entry for
the contributor and a regression test that fails when the task is cancellable.
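
The two ideas here (hold a strong reference to the task, and skip it when cancelling) can be illustrated with a small stand-alone sketch. The class `GatewaySketch` and its methods are hypothetical and only mirror the attribute names mentioned above.

```python
import asyncio


class GatewaySketch:
    """Hypothetical stand-in showing the task-lifetime pattern."""

    def __init__(self) -> None:
        self._background_tasks = set()
        self._restart_task = None
        self.restarted = False

    def spawn_background(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
        return task

    async def _run_restart(self) -> None:
        await asyncio.sleep(0.01)
        self.restarted = True

    def request_restart(self) -> None:
        # Keep a strong reference: the event loop holds tasks only weakly.
        # Deliberately NOT added to _background_tasks.
        self._restart_task = asyncio.create_task(self._run_restart())

    async def stop(self) -> None:
        to_cancel = [
            t for t in list(self._background_tasks)
            # Defensive skip, in case the restart task ever ends up in the set.
            if t is not self._restart_task
        ]
        for task in to_cancel:
            task.cancel()
        await asyncio.gather(*to_cancel, return_exceptions=True)


async def demo():
    gw = GatewaySketch()
    worker = gw.spawn_background(asyncio.sleep(60))
    gw.request_restart()
    await gw.stop()
    await gw._restart_task
    return worker.cancelled(), gw.restarted
```

Running `asyncio.run(demo())` shows the ordinary background task being cancelled while the restart task runs to completion.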

Refs #12875
2026-06-27 03:57:31 -07:00
zeapsu
1ce5d6d974 fix(gateway): exclude _run_restart from _background_tasks to prevent zombie on /restart
When request_restart() adds _run_restart to _background_tasks, _stop_impl
later cancels all entries in that set.  Since _run_restart is awaiting
_stop_task at that point, the CancelledError propagates into _stop_impl,
interrupting cleanup before _shutdown_event.set() and _exit_code = 75
execute.  This leaves the gateway as a zombie (alive but disconnected) or
exiting with code 0 instead of 75, preventing systemd Restart=on-failure
from restarting the service.

Fix: don't add _run_restart to _background_tasks — it self-terminates in
~50ms and needs no lifecycle management.

Fixes #12875
2026-06-27 03:57:31 -07:00
teknium1
08e131f77c test(telegram): cover bot self-message ingestion guard (#11905)
Regression tests for the self-author guard added in the salvaged fix:
- bot-authored DM-topic watcher echo is dropped (the exact #11905 symptom)
- bot self-messages dropped in groups/supergroups too
- other bots in the same chat are still processed (self-id, not is_bot)
- observe-unmentioned sibling path also rejects self-messages
- missing from_user does not crash

Test scaffolding ported from @cola-runner's PR #12817 and adapted to the
current plugins/platforms/telegram/adapter.py and _is_own_message().
2026-06-27 03:56:52 -07:00
Sahil-SS9
6fb25f86ac fix(telegram): filter out bot's own messages from inbound processing (#52363) 2026-06-27 03:56:52 -07:00
Teknium
68a65ed7a1
fix(agent_init): correct misleading sub-64K context_length error message (#53569)
The error raised when a model's context window is below the 64K minimum
advertised "or set model.context_length in config.yaml to override" — but
the guard intentionally has no sub-64K escape hatch. Sub-64K models are
rejected by design (tool schemas + system prompt need the headroom).

The misleading clause invited a cluster of dup PRs (#11097, #11110, #8962,
#9142, #37548) all trying to wire an override that we don't want. Reword to
state the real options: pick a >=64K model, or — if your local server
under-reports its true window — declare the real value (which must itself
be >=64K). Guard behavior is unchanged.
2026-06-27 03:56:25 -07:00
Teknium
d73078e7b0
fix(cron): make per-profile cron isolation intentional and tested (#4707) (#53570)
A profile's cron jobs now provably live in AND execute under that profile's
HERMES_HOME. A job authored under profile `coder` is stored at
`~/.hermes/profiles/coder/cron/jobs.json` and runs with coder's .env,
config.yaml, scripts and skills — never the default root's.

This was the de-facto behavior on main but only by accident: PR #50112 had
re-anchored cron storage at the shared default root, and a later stale-branch
squash merge (#52147) silently reverted it back to the profile home. Neither
direction was guarded by a test, so it could flip again on the next stale merge.

Changes:
- cron/jobs.py: document the per-profile storage anchor (get_hermes_home, NOT
  get_default_hermes_root) and why anchoring at the root leaks
  config/credentials/skills across profiles — the #4707 security boundary.
- cron/scheduler.py, cron/suggestions.py: same intent documented at the
  dynamic resolution helper and the suggestions store.
- tests/cron/test_cron_profile_isolation.py: pin storage, lock-path, and
  execution-home resolution to the active profile so a re-anchor can't regress.

Verified E2E: jobs created under two profiles land in separate per-profile
stores with zero cross-profile leakage and no shared-root store; scheduler
execution-home follows the active profile. Full cron suite: 576/576.
2026-06-27 03:55:01 -07:00
Bartok
864d5521ad test(curator): join straggler curator-review thread on fixture teardown
The curator_env fixture left async review threads (synchronous=False spawns
a daemon 'curator-review' thread that calls save_state() on completion)
running past test teardown. save_state() resolves the state path from
HERMES_HOME at write time, so a straggler could write into the next test's
tmp home, corrupting test_state_file_survives_corrupt_read (and others)
under CI load. Join the thread on teardown while HERMES_HOME is still
pinned to this test's home.
2026-06-27 03:52:52 -07:00
Bartok9
45ce35ed72 fix(agent): classify message-only 'overloaded' as server overload
Salvage of #14261 by @ms-alan — rebased onto current main, scoped to the
overloaded-classification fix, with a regression test that fails without it.
2026-06-27 03:52:52 -07:00
teknium1
151ae1e937 test(api-server): cover SSE failure finish_reason for both failure modes
Lock the contract that a clean stream-queue termination followed by an
agent failure never reports finish_reason: "stop". Covers the raised-
exception case (#12422 repro), the flagged failed-result case, truncation
(length), and the success happy path.

Follow-up to the salvaged #12504 fix from @flobo3.
2026-06-27 03:52:44 -07:00
flobo3
b8b695e2cd fix(api): surface agent crash in SSE chat completions stream 2026-06-27 03:52:44 -07:00
Teknium
f67c0b3e60
docs(hermes-agent skill): cover v0.13–v0.17 features, fix stale claims, tighten (#53566)
Refresh the hermes-agent skill against the last 5 major releases and the
current codebase, and cut verbose prose.

Coverage added (v0.13.0–v0.17.0):
- New gateway platforms: iMessage (Photon), Teams, LINE, SimpleX, ntfy,
  Google Chat, Raft, official WhatsApp Business Cloud API (now 20+).
- New surfaces section: desktop app, web dashboard admin panel,
  hermes proxy (OpenAI-compatible OAuth proxy), Automation Blueprints.
- delegate_task(background=true) async subagents; memory-tool atomic
  batch operations; session_search three-mode shape; x_search/video_analyze
  toolsets; image_gen image-to-image; xAI Grok via SuperGrok OAuth.
- display.interface (cli/tui), curator.consolidate opt-in, PyPI install.

Accuracy fixes:
- Adding-a-Tool is two files (auto-discovery), not three.
- Testing uses scripts/run_tests.sh (canonical runner), not bare pytest.
- Dropped change-detector test count and a dangling references/ pointer.
- Refreshed overview (Windows-native, 20+ providers, many surfaces).

Conciseness: trimmed over-explained Windows keybinding/sandbox/test prose
and deep prompt-builder internals to pointers.
2026-06-27 03:51:25 -07:00
Teknium
d3db73210c chore(release): map blaryx@gmail.com → Blaryxoff for PR #32602 salvage 2026-06-27 03:48:18 -07:00
blaryx
76af2456a2 fix(dashboard): merge PUT /api/config with existing on-disk config
The dashboard form is built from CONFIG_SCHEMA, which doesn't enumerate
every root-level key the YAML supports. Most visibly, `custom_providers`
is in `_KNOWN_ROOT_KEYS` but is absent from the schema — so the frontend
never sends it in the PUT body. The previous full-replace save() then
silently wiped the key from disk every time the user clicked anything
that triggered a save. Other casualties (less visible because defaults
re-mask them on load) include `agent.personalities`,
`agent.reasoning_effort`, `terminal.lifetime_seconds`, etc.

Fix: read the raw on-disk config and deep-merge the incoming PUT body
on top of it before saving. The frontend can only overwrite what it
explicitly sends; everything else is preserved verbatim.

Reuses the existing `_deep_merge` helper from `hermes_cli.config`.
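
For illustration, a deep merge of this kind could be sketched as below. This is not the `_deep_merge` helper from `hermes_cli.config`, whose exact semantics are not shown here; the function name `deep_merge_sketch` and the `max_turns` key are assumptions made for the example.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge_sketch(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return base with override applied; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge_sketch(merged[key], value)
        else:
            # Scalars and lists from the override replace the base value.
            merged[key] = deepcopy(value)
    return merged


on_disk = {
    "model": {"default": "old-model"},
    "custom_providers": [{"name": "my-provider"}],
    "agent": {"reasoning_effort": "high", "max_turns": 10},
}
put_body = {"model": {"default": "new-model"}, "agent": {"max_turns": 20}}
saved = deep_merge_sketch(on_disk, put_body)
```

In this example `custom_providers` and `agent.reasoning_effort` survive the save because the PUT body never mentions them, while a full replace would have dropped both.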

Tests:
- `test_round_trip_preserves_custom_providers` exercises the exact bug:
  seed config with custom_providers, GET → drop the key → PUT,
  assert it's still on disk.
- `test_round_trip_preserves_schema_invisible_nested_keys` covers the
  shallow-vs-deep-merge case for nested dicts under `agent` etc.
Both fail on current main; both pass with this patch.
2026-06-27 03:48:18 -07:00
Teknium
ec769e49d2
fix(gateway): WhatsApp/Signal hints affirm markdown instead of forbidding it (#53564)
The 'whatsapp' and 'signal' PLATFORM_HINTS told the agent 'Please do not
use markdown as it does not render' — factually wrong. Both adapters
actively convert markdown to native formatting:

- whatsapp_common.format_message(): **bold**, ~~strike~~, # headers,
  links, code blocks -> WhatsApp native syntax
- signal_format.markdown_to_signal(): same conversions via bodyRanges,
  plus '- item' / '* item' bullets -> '• ' Unicode bullets

The wrong hint made the agent strip bullets and bold the adapter would
have rendered (#12224). Rewrote both hints to mirror whatsapp_cloud:
markdown is auto-converted, bullet lists work, tables are not supported.
Added a contract test asserting markdown-converting platforms never
forbid markdown in their hint.
2026-06-27 03:46:41 -07:00
teknium1
a5d1f68c74 refactor(moa): share one virtual-provider row builder across pickers
Follow-up on the gateway-picker salvage: the cherry-picked change added a
second copy of the MoA virtual-provider row in model_switch.py, duplicating
inventory._moa_provider_row (same slug/name/preset-models, identical extra
fields). Make _moa_provider_row take a bare current_provider string and reuse
it from the gateway picker path so the row shape lives in one place and the
two surfaces can't drift.
2026-06-27 03:43:38 -07:00
dodo-reach
ed54469d06 fix(gateway): show MoA presets in model picker 2026-06-27 03:43:38 -07:00
Teknium
789f8b7dc2
docs(webhook): clarify authenticated != trusted-content trust model (#53562)
HMAC validation authenticates the webhook sender, not the business
fields inside the payload (PR titles, commit messages, issue bodies),
which are authored by untrusted third parties. Expand the prompt-
injection section to make the trust boundary explicit: the agent's
capability surface, not the input channel. Document the hardening
levers (sandbox the runtime, scope the toolset, keep approvals on,
template narrowly) instead of pretending to sanitize untrusted text.

Refs #8820.
2026-06-27 03:43:33 -07:00
teknium1
4e0788783b refactor(gateway): extract MoA one-shot restore helper; restore #28686 comment; real-method tests
Follow-up on the salvaged MoA restore fix:
- Extract the finally-block restore into _restore_moa_one_shot() so the
  behavior is unit-testable without re-implementing it, and so the gateway
  /moa handler and the finally block share one implementation.
- Restore the load-bearing #28686 zombie-eviction comment above
  _release_running_agent_state that the original diff dropped.
- Rewrite the tests to call the real _restore_moa_one_shot helper (the
  originals re-implemented the restore logic inline, so they passed
  regardless of the production code).
2026-06-27 03:43:28 -07:00
srojk34
2f29e3cfc5 fix(gateway): restore MoA one-shot model override on failed turns
The MoA one-shot restore ran inside the try block after
_handle_message_with_agent returned. When that call raised an
exception (agent init failure, interpreter shutdown, OOM), the
restore was skipped and the MoA model override stayed permanently
on _session_model_overrides — silently routing all subsequent
messages through the MoA reference fan-out with no user-visible
indication.

Move the restore to the finally block so it fires on every exit
path (success, exception, interrupt). The restore data lives on
the per-turn event object and would be lost if not consumed here.
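
The shape of the fix, reduced to a stand-alone sketch: the function `run_one_shot_turn` and its parameters are hypothetical and stand in for the gateway's per-session override map and agent call.

```python
from typing import Callable, Dict, Optional


def run_one_shot_turn(
    overrides: Dict[str, str],
    session_key: str,
    one_shot_model: str,
    run_agent: Callable[[], str],
) -> str:
    """Apply a temporary model override and always restore the prior state."""
    previous: Optional[str] = overrides.get(session_key)
    overrides[session_key] = one_shot_model
    try:
        return run_agent()
    finally:
        # Runs on success, exception, and interrupt alike.
        if previous is None:
            overrides.pop(session_key, None)
        else:
            overrides[session_key] = previous
```

If the restore sat after `run_agent()` inside the `try` body instead, any exception from the agent would leave the one-shot override in place for later turns.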
2026-06-27 03:43:28 -07:00
briandevans
17cb829991 test(moa): cover non-list/bare-dict reference_models normalization 2026-06-27 03:43:16 -07:00
briandevans
8dd4e576d0 fix(moa): tolerate non-list reference_models in hand-edited MoA preset config 2026-06-27 03:43:16 -07:00
Teknium
60f58a2b95
feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552)
The verify-on-stop guard fired too eagerly — including on doc/markdown/skill
edits with nothing to verify, where it pushed a pointless /tmp verification
script. Three changes:

1. Default OFF for new installs: agent.verify_on_stop defaults to false
   (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31.
2. One-time migration (v30 -> v31): existing installs are switched off once,
   but only when the value is missing or still the "auto" sentinel — an
   explicit true/false the user set is preserved.
3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose
   paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly
   enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on
   the code paths.
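
A path filter along these lines could be sketched as follows. Only the suffixes and names spelled out above are included; the full list in `build_verify_on_stop_nudge()` is elided in this message and is not reproduced here. The function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Only the entries named in the commit message; the real list is longer.
DOC_SUFFIXES = {".md", ".mdx", ".rst", ".txt"}
DOC_BASENAMES = {"LICENSE", "CHANGELOG"}


def is_doc_path(path: str) -> bool:
    p = PurePosixPath(path)
    return p.suffix.lower() in DOC_SUFFIXES or p.name.upper() in DOC_BASENAMES


def paths_to_verify(edited: Iterable[str]) -> List[str]:
    """Drop documentation/prose paths; what remains may warrant a nudge."""
    return [p for p in edited if not is_doc_path(p)]
```

An empty result corresponds to a doc-only turn (no nudge), while a mixed turn keeps just its code paths.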

The legacy "auto" sentinel is still honored when set explicitly (ON for
interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env
override unchanged.
2026-06-27 03:23:22 -07:00
teknium1
29ee4bbff6 refactor(dashboard): tighten cron-job form helpers
Collapse the three near-identical optional-text helpers
(optionalText/optionalBaseUrl/listToText) into one optionalText with a
strip-trailing-slash flag, route listToText + toolsets through the
existing splitCronList, and replace the repeated
typeof x === 'string' ? x : '' ladders with a single asString helper.
Behavior-identical; all 16 vitest cases pass.
2026-06-27 03:20:32 -07:00
Versun
c655cdf2c1 feat(dashboard): expose cron job execution fields 2026-06-27 03:20:32 -07:00
teknium1
50f6855217 feat(moa): make /moa one-shot only; route preset switching through the model picker
/moa no longer does a sticky model switch. It now always runs a single
prompt through the default MoA preset and restores the prior model
afterward; the whole argument is the prompt (no preset-name matching).
To switch to a MoA preset for the session, select it from the model
picker, where presets already surface under a virtual Mixture of Agents
provider on every model-selection surface.

Also fixes #53444: the TUI one-shot only set session[model_override],
which the already-built cached agent ignored, so MoA silently never ran
and the turn used the original model. The TUI now does a real in-place
agent.switch_model() via _apply_model_switch() when a live agent exists
(with a proper restore after the turn), and falls back to a model_override
for lazy/unbuilt sessions.

Removes the redundant sticky-switch branch from the CLI, gateway, and TUI
/moa handlers; updates the command description, usage string, and docs.
2026-06-27 03:09:09 -07:00
teknium1
3cd4693494 chore: add DiamondEyesFox to AUTHOR_MAP for PR #53351 salvage 2026-06-27 03:04:26 -07:00
diamondeyesfox
8df231c941 fix(agent): rebaseline in-place compression flushes 2026-06-27 03:04:26 -07:00
Mahesh Sanikommu
1b75b3fd90 feat(memory): add Supermemory setup connection summary
Add post_setup() and get_status_config() to the Supermemory memory
provider so `hermes memory setup` and `hermes memory status` print a
one-line connection summary (container, profile fact count,
auto_recall/auto_capture). Point API-key onboarding at the Hermes
connect URL (app.supermemory.ai/integrations?connect=hermes).

Salvage of #52988. Two fixes folded in:

- Test isolation: the new probe/status tests mocked _SupermemoryClient
  but not the __import__("supermemory") guard inside
  _probe_supermemory_connection, so they passed only where the optional
  supermemory package was installed and failed on a clean checkout / CI
  (the PR shipped with red CI). Added _stub_supermemory_importable()
  mirroring the existing test_is_available_false_when_import_missing
  pattern; the suite now passes with supermemory absent.

- post_setup: `if api_key and api_key not in os.environ` checked whether
  the key's *value* named an env var (always false in practice). Fixed to
  compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`.
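
The difference between the two checks is easy to see in isolation. The sketch below uses a plain dict in place of `os.environ`, and the function names are hypothetical; whether the fixed code keeps the leading `api_key and` is an assumption.

```python
from typing import Mapping


def needs_update_buggy(api_key: str, environ: Mapping[str, str]) -> bool:
    # Membership tests the mapping's KEYS, so this asks whether an env var
    # is *named* after the key's value, which is almost never the case.
    return bool(api_key) and api_key not in environ


def needs_update_fixed(api_key: str, environ: Mapping[str, str]) -> bool:
    # Compare the stored value against the key the user just entered.
    return bool(api_key) and environ.get("SUPERMEMORY_API_KEY") != api_key


env = {"SUPERMEMORY_API_KEY": "sm_example"}
```

With `env` as above, the buggy check still reports that an update is needed for `"sm_example"`, while the fixed check does not.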

Verified: 38/38 in test_supermemory_provider.py and the full
tests/plugins/memory/ suite green with supermemory not installed.

Closes #52988
2026-06-27 15:07:34 +05:30
underthestars-zhy
8827300267 fix(photon): correlate tapbacks to bot message context
Populate `reply_to_message_id`, `reply_to_text`, and
`reply_to_is_own_message` on reaction events so the gateway injects
`[Replying to your previous message: "..."]` when the agent receives
a tapback.

The sidecar now extracts a capped text preview from the hydrated
reaction target (plain text and mixed group messages; null for
attachment/voice-only targets), emitting it as `targetText` in the
NDJSON reaction payload. The Python adapter reads this field and sets
the reply correlation fields on the `MessageEvent`.
2026-06-27 00:51:34 -07:00
underthestars-zhy
4345b3e767 fix(photon): upgrade spectrum-ts sidecar to v8.0.0
v8 made `richlink` outbound-only; inbound rich links now arrive as
plain `text`. Remove the `getBalloonBundleId`/`toRichlinkMessage`
branches from the iMessage mapper patch and update the fixture,
lockfile, and README accordingly.
2026-06-27 00:51:34 -07:00
underthestars-zhy
5636c22828 feat(photon): upgrade spectrum-ts sidecar to v7.0.0
Update the Photon platform plugin's Node.js sidecar from spectrum-ts
3.1.0 to 7.0.0, which splits the SDK into scoped `@spectrum-ts/*`
packages with `spectrum-ts` as the umbrella re-export.

- Bump exact pin in package.json/package-lock.json to 7.0.0
- Update mixed-attachments patch script to target the new
  `@spectrum-ts/imessage/dist/index.js` path and tab-indented output
- Rewrite test fixture to match v7.x mapper shape (tab-indented,
  `const ... = async` declarations, single-line builder calls) and
  point at `@spectrum-ts/imessage/dist/index.js`
- Update README upgrade guide to document the v5 package split and
  the postinstall patch validation step
- Update comments in cli.py and index.mjs to reference v5/v7 changes
2026-06-27 00:51:34 -07:00
Teknium
d712a7fd73
fix(model-picker): surface the current custom/uncurated model in picker rows (#53457)
A model selected via the CLI (e.g. /model openrouter/<uncurated-name>) was
absent from every model picker — the main picker AND the MoA reference/
aggregator slot pickers — because each provider row only carried its curated
catalog. Inject the current model at the front of its provider's row so it is
selectable and shown everywhere.
2026-06-27 00:06:34 -07:00
Ben Barclay
fbf748b282
fix(dashboard-auth): follow redirects on self-hosted OIDC discovery (#53399)
The self-hosted OIDC provider fetched the discovery document with a bare
httpx.get(). httpx defaults to follow_redirects=False (unlike curl -L or
the requests library), so when an IDP answers GET
/.well-known/openid-configuration with a 3xx — Authentik canonicalises the
.well-known path, and any IDP behind a reverse proxy doing an http→https
upgrade redirects too — the bare redirect (empty body) tripped the
status != 200 guard and raised 'OIDC discovery returned 302', which
routes.py maps to the provider_unreachable audit event and a 503. The
browser surfaced 'Auth provider self-hosted unreachable'.

The user's smoking gun (curl -o writing zero bytes from inside the
container) is exactly a redirect with no body — the same wall the code hit.

Add follow_redirects=True to the discovery GET only. It's safe: the
issuer-pin check and _require_https_or_loopback still validate the resolved
document and every endpoint, so a redirect can't smuggle in a bad issuer or
a cleartext endpoint. The token/revocation POSTs deliberately keep the
no-follow default (they carry an auth code / refresh token and the endpoint
is already the canonical absolute URL).

Existing discovery tests mocked httpx.get with a canned 200 and never
exercised a real 3xx. Add a regression test that runs a real loopback
server returning a 302 on the .well-known path — fails without the fix
(ProviderError: discovery returned 302), passes with it.
2026-06-27 14:14:51 +10:00
ethernet
dd0e4ab81a change(ci): slice files in matrix job
avoid duplicating work, avoid file discovery on each job
2026-06-26 19:15:18 -07:00
ethernet
1a75387fa8 change(ci): log json decode error in durations 2026-06-26 19:15:18 -07:00
ethernet
707ae6e623 change(tests): don't count with pytest collect
it's way too slow. just grep files lol
2026-06-26 19:15:18 -07:00
ethernet
bcc3eb3419 fix(ci): rip out some xdist legacy stuff... how did these ever work?? 2026-06-26 19:15:18 -07:00
ethernet
2fa66950e8 change(ci): upload-artifact from v4 -> v7 2026-06-26 19:15:18 -07:00
ethernet
4b0a2040e7 change(ci): use run_tests in docker 2026-06-26 19:15:18 -07:00