`npm ci` / `uv sync` / toolchain header fetches occasionally die on
transient network blips — e.g. node-pty's node-gyp fetching Node headers
(an undici assert) during the typecheck job's `npm ci`, which killed the job
before `tsc` ever ran. "Re-run and it goes green" is exactly what CI should
do itself.
- New reusable `.github/actions/retry` composite action wraps a command and
retries on failure (3x / 10s, command passed via env so it can't inject).
Applied to every PR-path network install: npm ci (typecheck, desktop
build, docs site), uv sync (tests, e2e), uv tool install (lint),
pip install (docs site).
- typecheck now runs `npm ci --ignore-scripts`: `tsc` needs only sources +
type defs, so skipping install scripts drops node-pty's native rebuild
(whose header fetch was the flake) and is faster. Validated locally — tsc
passes for ui-tui, apps/shared, and apps/desktop with scripts skipped.
- ripgrep download uses `curl --retry`.
Docker (main-only) and the release/windows workflows are intentionally left
for a follow-up.
Heavy PR checks run on every PR because the workflows deliberately avoid
`on.paths` filters — a path-gated workflow leaves its required check pending
forever when no matching file changes, blocking merge. So a docs-only PR
still spins up the TypeScript matrix, the full Python suite, and ruff/ty.
Keep every workflow triggering on every PR (checks always report) but gate
the expensive *steps* on what the PR touches. Skipping a step (not the job)
leaves the job green, so required checks never hang — the same idiom already
proven in contributor-check.yml.
A classifier (scripts/ci/classify_changes.py) maps the PR diff to three
lanes — python, frontend, site — surfaced as step outputs by a composite
action (.github/actions/detect-changes). Fail-open: an empty diff or any
.github/ change runs everything; python is a denylist (skipped only when
every file is provably prose or a frontend-only package); skills/**/SKILL.md
counts as python-relevant since the skill-doc tests read that tree. Non-PR
events always run the full pipeline.
Paired with commit e0c03defd (enabled PLW1514 in pyproject.toml) and
commit 3dfb35700 (added scripts/check-windows-footguns.py). Both
commits noted that the corresponding workflow edits were held back
because the authoring token lacked the `workflow` OAuth scope.
New jobs, both separate from `lint-diff` so the advisory diff
comment still posts when enforcement fails:
- ruff-blocking: runs `ruff check .` against the explicit select
list in pyproject.toml (currently PLW1514, which catches bare
open() that defaults to locale encoding — cp1252 on Windows).
No --exit-zero, no `|| true`; exit code propagates to the
required-check gate.
- windows-footguns: runs scripts/check-windows-footguns.py --all
(380 files, stdlib-only, <2s). Covers 11 Windows-unsafe
primitives — os.kill(pid, 0) bpo-14484 footgun, os.killpg,
os.setsid/setpgrp, signal.SIGKILL/SIGHUP/SIGUSR* without
getattr fallback, shebang scripts via subprocess, wmic without
shutil.which guard, hardcoded ~/Desktop OneDrive trap, bare
open() without encoding=, etc.
Both jobs pin actions by SHA to match repo convention.
tests/test_lint_config.py::test_workflow_has_blocking_ruff_step
now finds the blocking step and passes.