Commit graph

925 commits

Author SHA1 Message Date
orcool
f0fdb5e67d feat(catalog): add qwen3.7-max to alibaba + alibaba-coding-plan model lists
Alibaba's latest flagship Qwen model is released but not yet present in the
DashScope (alibaba) or Alibaba Coding Plan curated catalogs.  Add it so it
shows up in the /model picker and setup wizard for those providers.

OpenCode Go routing for qwen3.7-max already landed via #32780 (commit 2fc77c53f).
OpenRouter + Nous catalog entries already landed via #32809 (commit ccd3d04fc).
This salvage picks up the remaining alibaba / alibaba-coding-plan entries from
#32806 — the AI Gateway entry is dropped because Vercel AI Gateway was removed
in #33067.
2026-05-27 02:05:58 -07:00
Teknium
96223265b9 chore(api-server): mark skills_api capability True now that /v1/skills shipped
#33016 added GET /v1/skills + /v1/toolsets on the API server; the
capability flag introduced in this branch was placeholder-False. Flip
to True so capability probers see the truth.
2026-05-27 01:56:55 -07:00
Teknium
f0be32232d chore(release): map EvilHumphrey noreply for #33034 salvage 2026-05-27 01:52:34 -07:00
Teknium
b6ca56f651
fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) (#33035)
* fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144)

When an OpenAI-compatible Responses API surface accepts an initial
request but later rejects the replayed `codex_reasoning_items`
encrypted blob with HTTP 400 `invalid_encrypted_content`, the
session previously got stuck retrying the same poisoned payload.

Recovery: classify the error as a dedicated FailoverReason, and on the
first hit disable encrypted reasoning replay for the rest of the
session, strip cached items from message history, and retry once.

Changes:
* error_classifier: add FailoverReason.invalid_encrypted_content
  branch in _classify_400 (before context_overflow so the messages
  that mention 'encrypted content … could not be verified' don't trip
  context heuristics), in _classify_by_error_code, and extend
  _extract_error_code to peek inside wrapped JSON in error.message and
  ignore the bare '400' as a code.
* agent_init: initialize `_codex_reasoning_replay_enabled = True` on
  every agent.
* run_agent: add AIAgent._disable_codex_reasoning_replay() helper
  that flips the flag and pops cached items.
* codex_responses_adapter: thread a `replay_encrypted_reasoning`
  kwarg through _chat_messages_to_responses_input so that when the
  flag is False we don't replay codex_reasoning_items.
* transports/codex.py: read `replay_encrypted_reasoning` from params,
  thread it into the adapter, and gate the
  `include=['reasoning.encrypted_content']` request hint on it.
* chat_completion_helpers: pass the agent's replay flag through to
  the transport.
* conversation_loop: in the retry loop, add an
  invalid_encrypted_content recovery branch that fires once per
  session, only when api_mode == codex_responses, only when replay is
  still enabled, and only when at least one assistant message in
  history actually carries cached reasoning items (otherwise the 400
  has nothing to do with our cache and the normal retry path handles
  it).

Tests:
* test_error_classifier: new wrapped-JSON _extract_error_code case;
  new TestClassifyApiError cases proving the 400 is retryable with
  no fallback, that the broad message match doesn't catch a generic
  'parsed' message, and that the error code match is
  case-insensitive.
* test_run_agent_codex_responses: end-to-end test of the recovery
  branch firing once and disabling replay, plus a sibling test that
  proves the branch does *not* fire (and the flag stays True) when
  history has no cached reasoning items.

Salvages PR #10144 onto the post-refactor module layout
(error_classifier / codex_responses_adapter / transports/codex /
conversation_loop / agent_init) since the original diff was written
against the pre-refactor monolithic run_agent.py.

* chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage

---------

Co-authored-by: victorGPT <wuxuebin1993@gmail.com>
2026-05-26 22:01:17 -07:00
teknium1
9feadc2734 chore(release): map ticketclosed-wontfix noreply to GitHub login 2026-05-26 20:51:59 -07:00
Teknium
16e86ce6a7
chore(release): map wangpuv contributor email for #32933 (#33005)
Pre-stages the AUTHOR_MAP entry so the contributor-check workflow
passes when Will Falcon's image-gen SSE fix lands.
2026-05-26 20:40:17 -07:00
Teknium
2a8d217417 chore(release): map carltonawong noreply to GitHub login
Added AUTHOR_MAP entry for the cherry-picked fix in the preceding
commit so the release contributor audit can resolve Carlton's noreply
email.
2026-05-26 19:37:37 -07:00
teknium1
f05a47309e fix(gateway): refresh cached agent tools on /reload-mcp
When the gateway processes /reload-mcp, it reconnects MCP servers and
updates the global _servers registry, but cached AIAgent instances in
_agent_cache keep the tools list they were built with. The user had to
also run /new (discarding conversation history) before the agent could
see the new tools — even though /reload-mcp had succeeded.

This patch refreshes each cached agent's .tools and .valid_tool_names
in _execute_mcp_reload after discovery returns, so existing sessions
pick up new MCP tools on their next turn. The slash-confirm gate in
_handle_reload_mcp_command already obtains user consent for the
implied prompt-cache invalidation before this code runs.

Mirrors the equivalent behaviour the CLI already does in cli.py
_reload_mcp. Per-agent enabled_toolsets and disabled_toolsets are
preserved so an agent that was scoped to a subset of toolsets does
not silently gain disabled tools after the reload.

Original diagnosis + initial implementation in #23812 from @fujinice.
The auto-reload watcher half of that PR is intentionally dropped —
users want /reload-mcp to remain explicit.

Co-authored-by: fujinice <45688690+fujinice@users.noreply.github.com>
2026-05-26 14:28:51 -07:00
kshitij
66851dc413
chore: add krislidimo to AUTHOR_MAP for PR #29775 (#32434) 2026-05-25 23:15:56 -07:00
Teknium
d8703e27f5
feat(skills-hub): health checks, freshness badge, and a watchdog cron (#32345)
Layered safety so the Skills Hub at /docs/skills stays in sync without
silent rot. Three pieces:

1. build_skills_index.py — refuses to ship a degenerate index.
   EXPECTED_FLOORS per source (skills.sh ≥100, lobehub ≥100, clawhub ≥50,
   official ≥50, github ≥30, browse-sh ≥50) and MIN_TOTAL=1500. Any source
   collapsing to zero (the silent OpenAI breakage that hid for weeks) now
   fails the workflow loud — broken index never reaches the live site.

2. extract-skills.py + the React page — visible freshness signal.
   Sidecar website/src/data/skills-meta.json carries the index's
   generated_at timestamp, plus per-source counts. Skills Hub renders a
   'Catalog refreshed N hours ago · auto-rebuilt twice daily' line under
   the hero copy. If the cron stalls, users see the staleness immediately.

3. .github/workflows/skills-index-freshness.yml — watchdog cron.
   Every 4 hours, fetches the live /docs/api/skills-index.json, validates
   shape, checks age (>26h is stale), checks the same per-source floors,
   and opens (or appends to) a GitHub issue when anything is off. The
   issue is title-prefixed [skills-index-watchdog] so subsequent failures
   append a comment instead of spamming new issues.

Net effect:
- A silent regression like 'OpenAI tap moved its skills' now fails the
  build instead of shipping a quietly broken catalog.
- A stuck cron (like the landingpage breakage that ran red for weeks) now
  files an issue within 4 hours.
- Users see how fresh the catalog is on the page itself.

Test plan:
- Local: built skills-meta.json from the live index → 'Catalog refreshed
  N minutes ago' rendered correctly in the static HTML.
- Probe logic dry-run against the live index: total=2456, all 6 sources
  above floor, age 0.1h — issues=NONE.
- Triggered skills-index.yml manually; both jobs green, deploy-site.yml
  dispatch fired.
2026-05-25 23:10:45 -07:00
Teknium
cea87d9139
fix(skills-hub): show every catalog source on /docs/skills (skills.sh, ClawHub, browse.sh, OpenAI, …) (#32336)
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker / shell lint / Lint Dockerfile (hadolint) (push) Waiting to run
Docker / shell lint / Lint docker/ shell scripts (shellcheck) (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Docker Build and Publish / move-latest (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
Build Skills Index / build-index (push) Waiting to run
Build Skills Index / trigger-deploy (push) Blocked by required conditions
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
The Skills Hub page was stuck on a stale Feb 25 snapshot, showing only Built-in
+ Optional + Anthropic + LobeHub. The unified index already has 2078 skills
from skills.sh / ClawHub / LobeHub / GitHub taps / Claude Marketplace, and
BrowseShSource adds another ~330 — none of it was reaching the page.

Changes:

- website/scripts/extract-skills.py: read website/static/api/skills-index.json
  (the unified multi-source catalog, rebuilt twice daily) as the canonical
  external source. Keep the legacy skills/index-cache/ fallback for offline
  builds. Add friendly per-source labels (skills.sh, ClawHub, browse.sh,
  OpenAI, HuggingFace, Anthropic, LobeHub, etc.) and per-entry installCmd.
- website/src/pages/skills/index.tsx: add source pills + ordering for the 11
  new sources; render installCmd from the index entry.
- website/scripts/prebuild.mjs: when no local skills-index.json exists, fetch
  the live one from hermes-agent.nousresearch.com so local 'npm run build'
  matches production without burning GitHub API quota.
- scripts/build_skills_index.py: crawl BrowseShSource so browse.sh entries
  land in the unified index. Adjust source_order.
- tools/skills_hub.py: GitHubSource.DEFAULT_TAPS — openai/skills moved its
  skills into skills/.curated/ and skills/.system/, so add both as explicit
  taps (the listing code skips dotted dirs by design). Drop
  VoltAgent/awesome-agent-skills (README-only, no SKILL.md files) and
  MiniMax-AI/cli (singular skill, not a tap directory). Net effect: github
  source jumps from 83 → 143 skills, with OpenAI properly included.
- .github/workflows/deploy-site.yml: build the unified index BEFORE running
  extract-skills.py — previous order meant extract-skills always fell back
  to the legacy cache. Drop the 'skip if file exists' guard; the file is
  gitignored and must be rebuilt every deploy.
- .github/workflows/skills-index.yml: drop the broken 'deploy-with-index'
  job (it cp'd 'landingpage/\*' which no longer exists, failing every cron
  run since the landingpage move). Replace it with a workflow_dispatch
  trigger of deploy-site.yml so the index refresh still reaches production
  on schedule.
- website/docs/user-guide/features/skills.md: drop VoltAgent from the
  default-taps doc list to match the code.

Before: 695 skills (Built-in 90, Optional 84, Anthropic 16, LobeHub 505).
After:  2168 skills across 9 source pills, including the 1212 skills.sh
        entries the user expected to see.
2026-05-25 18:34:54 -07:00
Teknium
fe9744cbee chore(release): map ffr31mr + TheOnlyMika in AUTHOR_MAP
Pre-salvage prep for the must-have security cluster (#32103, #32155).
#32103 author commit uses dearmayo@localhost; PR opener is ffr31mr —
same pattern as the existing holynn-q localhost mapping.
2026-05-25 18:33:02 -07:00
Teknium
b6ce7a451f chore(release): add ronhi for PR #29523 salvage
Maps the machine-local commit email (ronhi@buildabear1.localdomain) to
the GitHub login RonHillDev so the attribution check passes.
2026-05-25 14:51:43 -07:00
teknium1
386f245d9d feat(skills): add optional openhands skill — closes #477
Adds an optional autonomous-ai-agents skill that delegates coding tasks
to the OpenHands CLI (https://github.com/All-Hands-AI/OpenHands). Sits
alongside claude-code / codex / opencode and is the model-agnostic
option in that family — any LiteLLM-supported provider works.

This is a ground-truth rewrite of #19325 by @xzessmedia (Tim Koepsel).
The original PR's SKILL.md was drafted by the OpenHands agent itself and
hallucinated several flags that don't exist in the real CLI (\`--model\`,
\`--max-iterations\`, \`--workspace\`, \`--sandbox docker\`), pointed at
the wrong PyPI package (\`openhands-ai\`, which is the legacy V0 SDK),
and claimed native Windows support that the upstream docs explicitly
disclaim. Rather than cherry-pick and rewrite half the lines under
contributor authorship, the SKILL.md was rebuilt against a verified
install (\`uv tool install openhands --python 3.12\`) and a real
end-to-end \`--headless --json\` run against openrouter/openai/gpt-4o-mini.

Authorship credited via the \`author:\` frontmatter field and an
AUTHOR_MAP entry in scripts/release.py.

Changes:
- optional-skills/autonomous-ai-agents/openhands/SKILL.md (new)
- website/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-openhands.md (auto-gen)
- website/docs/reference/optional-skills-catalog.md (one new row)
- website/sidebars.ts (one new entry under Optional → Autonomous AI Agents)
- scripts/release.py (AUTHOR_MAP entry for xzessmedia)

Pitfalls documented in the SKILL came from running the tool, not from
the upstream README: LiteLLM bedrock/sagemaker stderr noise on every
invocation, banner spam (\`OPENHANDS_SUPPRESS_BANNER=1\` required),
\`--override-with-envs\` mandatory or the CLI ignores LLM_* env vars
entirely, the dashed-vs-undashed Conversation ID footgun for \`--resume\`,
LiteLLM model-slug double-prefix when going through OpenRouter.
2026-05-25 14:49:34 -07:00
Teknium
8f19485f53 chore(release): map kylekahraman email to GitHub login
Required by CI author validation after salvaging PR #29723.
2026-05-25 06:23:18 -07:00
alt-glitch
f5bb595d51 chore(release): map 8bit64k + hclsys in AUTHOR_MAP 2026-05-25 12:48:46 +00:00
teknium1
2b16de0ec3 chore(release): map adam91holt for PR #31984 salvage 2026-05-25 05:34:42 -07:00
teknium1
e9119e0eb8 chore(release): map dsr-restyn + WuKongAI-CMU + codeblackhole1024 for S04 cluster 2026-05-25 05:15:55 -07:00
Teknium
4909dd84c1 chore(release): map 66773372+Tranquil-Flow@users.noreply.github.com to Tranquil-Flow (PR #27518) 2026-05-25 05:07:11 -07:00
Teknium
ee7789e547 chore(release): map simo.kiihamaki@gmail.com to SimoKiihamaki (PR #30773) 2026-05-25 05:06:03 -07:00
Tranquil-Flow
b1adb95038 fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern
The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically
silently dropped certain model requests: the connection is accepted but no
stream events are emitted and no error is raised. PR #31967 lowered the
implicit stale-call default from 300s to 90s so fallbacks kick in faster,
but users still see an opaque "No response from provider for 90s
(non-streaming, ...)" message that gives no path forward.

This patch adds a narrow heuristic — gpt-5.5 family on the Codex backend
via codex_responses api_mode — that substitutes the generic timeout
message with actionable text naming the gpt-5.4-codex workaround and
pointing at #21444 for symptom history.

Changes:

- run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method.
  Returns ``None`` for any request that does not match all three guards
  (codex_responses api_mode, openai-codex provider or chatgpt.com Codex
  base URL, gpt-5.5-family model name with word-boundary regex anchoring
  to avoid false-positives on e.g. ``gpt-5.50``).
- agent/chat_completion_helpers.py — the non-stream stale-call site
  consults the hint via ``getattr(...)`` so the call site stays robust
  if the helper is ever removed or stubbed in tests. Hint is appended to
  both the ``_emit_status`` warning and the ``TimeoutError`` message so
  the user sees it in their terminal AND it lands in any retry-loop
  diagnostics.
- tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests
  covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5,
  gpt-5.5-codex SKU, model=None fallback to self.model) and negative
  cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard,
  non-codex api_mode, non-codex provider, empty/None model, unrelated
  models on Codex).

Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT
problem we cannot patch from here). Only converts an opaque timeout into
text that names the workaround so users do not have to dig through logs
or wait for a forum post to learn what to do.

Closes #22046
2026-05-25 04:49:22 -07:00
teknium1
4c64638897 chore(release): map liuhao1024 for PR #20778 salvage 2026-05-25 03:40:47 -07:00
teknium1
51c913caf7 chore(release): map dusterbloom for PR #25726 salvage 2026-05-25 03:40:47 -07:00
teknium1
c0169496d0 chore(release): map jfuenmayor + Jiahui-Gu + YLChen-007 + AdamPlatin123 + waefrebeorn for S11 cluster salvage 2026-05-25 01:55:59 -07:00
Kasun Athaudahetti
2d422720b5 fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults
Codex / Responses-API requests had three latent timeout bugs that combined
into the long silent hangs reported on #21444:

1. The non-stream stale-call detector estimated context tokens from
   ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry
   their conversational load in ``input`` (with ``instructions`` and
   ``tools``), so every Codex turn logged ``context=~0 tokens`` and the
   detector never applied its >50k / >100k tier bumps.

2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the
   main Codex path. The chat_completions path and the auxiliary Codex
   adapter both forwarded it; the main path skipped it through three
   places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``,
   ``_preflight_codex_api_kwargs``).

3. The streaming stale detector had the same payload-shape bug for
   ``codex_responses`` requests, which route through the non-streaming
   detector (it's the path that emits the user-facing
   "No response from provider for 300s (non-streaming, ...)" warning that
   reporters keep pasting).

This commit:

- Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``,
  used by both the non-stream and stream detectors. Handles ``messages``
  (Chat Completions), ``input + instructions + tools`` (Responses API),
  bare lists, and an unknown-dict fallback.
- Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs``
  and ``_preflight_codex_api_kwargs`` (with guards against
  zero/negative/inf/bool values), and wires
  ``_resolved_api_call_timeout()`` into the Codex branch of
  ``build_api_kwargs``.
- Lowers the implicit non-stream stale defaults so fallback providers
  kick in faster when upstream stalls:
    * base   300s -> 90s
    * >50k   450s -> 150s
    * >100k  600s -> 240s
  These only apply when the user has *not* set
  ``providers.<id>.stale_timeout_seconds`` or
  ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins.
- Adds regression tests for the estimator shapes, the new defaults, the
  context-tier scaling, transport timeout pass-through, and preflight
  timeout pass-through / rejection of invalid values.

Closes #21444
Supersedes #21652 #24126 #31855

Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>
2026-05-25 01:47:55 -07:00
Teknium
11c40d6a42 test+polish(compression): pin anti-thrash gate and gateway session_id persistence
Follow-up to @someaka's fix.

Polish:
- Drop the redundant `_preflight_tokens >= threshold_tokens` clause.
  `should_compress(tokens)` already short-circuits when tokens < threshold,
  so the explicit comparison was dead code on the True branch.

Tests:
- Preflight: pin that should_compress() is called (anti-thrash has a vote).
  Mocks should_compress to return False even with tokens past the raw
  threshold and asserts no compression runs — exact bug shape from #29335.
- Gateway: AST scan of gateway/run.py asserts every
  `session_entry.session_id = ...` assignment is followed by a
  `session_store._save()` call within the same block. Three sites mutate
  the session_id after compression; all three must persist or the next
  turn loads the pre-compression transcript and re-loops. Empirically
  verified the test catches the bug (drops the new _save() line → red).

AUTHOR_MAP:
- Map ed@bebop.crew -> someaka so the salvaged commit resolves to
  @someaka in release notes.
2026-05-25 01:44:46 -07:00
Teknium
2e0ac31a72 chore(release): map claw@openclaw.ai to wanwan2qq (PR #10215) 2026-05-25 01:33:32 -07:00
teknium1
92d91365e7 chore(release): map zapabob for PR #29826 salvage 2026-05-25 01:15:24 -07:00
teknium1
0c3e34e298 chore(release): map Schrotti77 for PR #25786 salvage 2026-05-25 01:09:54 -07:00
teknium1
a6b0414ea0 feat(providers): extend openai-api with live /v1/models fetch + gpt-5.5-pro
Follow-up on top of @jacevys' PR #21437 cherry-pick:
- _provider_model_ids() now also matches normalized == 'openai-api' for
  the live /v1/models fetch path, so users see the full catalog instead
  of just the curated list.
- Add gpt-5.5-pro and gpt-5.3-codex to the curated list for parity with
  the existing 'openai' table (used as fallback when /v1/models fails).
- Add scripts/release.py AUTHOR_MAP entry for jacevys so CI doesn't
  block the salvage PR.
2026-05-25 00:59:53 -07:00
Ben Barclay
7e165e843d
Merge pull request #31760 from NousResearch/hermes/hermes-bf5898da
feat(docker)!: s6-overlay container supervision (salvage of #30136)
2026-05-25 12:57:51 +10:00
Teknium
46f8948bad test+harden(cli): cover parent-chain walk in concurrent-instance detection
Follow-up to @Strontvod's fix.

Tests:
- Five new tests in test_update_concurrent_quarantine.py cover the parent-
  chain exclusion: the .exe launcher is excluded, an unrelated sibling
  hermes.exe is still reported, multi-level ancestry is fully excluded,
  PID cycles in the parent chain don't hang, and a partially-stubbed
  psutil (no Process attribute) degrades gracefully instead of crashing.
- New _fake_psutil_with_parent_chain helper builds a fuller stand-in
  (Process / NoSuchProcess / AccessDenied + process_iter) than the
  process_iter-only SimpleNamespace the older tests use.

Hardening:
- Broaden the except in the parent-walk to bare Exception. The original
  fix listed (NoSuchProcess, AccessDenied, ValueError), but those names
  are evaluated lazily during exception matching — if psutil is a partial
  stub without the attribute, the exception handler itself raises
  AttributeError that escapes. The function is documented as 'never raises'
  (the surrounding update flow depends on it), so the broader catch keeps
  the contract regardless of how the dependency is shaped.

AUTHOR_MAP:
- Map schepers.zander1@gmail.com -> Strontvod so the salvaged commit
  resolves to @Strontvod in the release notes.

All 18 detect_concurrent + quarantine tests pass.
2026-05-24 19:51:46 -07:00
Ben
da8b2e95fd ci(docker): run tests/docker/ in build-amd64 against the freshly-built image
The new tests/docker/ suite (added by this PR) was being picked up by the
sharded pytest matrix in tests.yml, where its session-scoped `built_image`
fixture issued a 3-7min `docker build` under tests/docker/conftest.py's
180s pytest-timeout cap. Every test in the directory failed in fixture
setup across all 6 shards.

Fix the suite so it actually runs (not skips):

1. Wire the docker tests into docker-publish.yml's build-amd64 job, right
   after the existing smoke test. The image is already loaded into the
   local daemon as `nousresearch/hermes-agent:test`; set
   HERMES_TEST_IMAGE to that and the fixture's pre-built-image branch
   short-circuits the rebuild. 21 tests run in ~90s locally against a
   prebuilt image, no rebuild cost on top of the existing build step.

2. Exclude tests/docker/ from scripts/run_tests_parallel.py's default
   discovery so the sharded matrix in tests.yml stops trying to build
   the image. Explicit positional paths (`pytest tests/docker/` or
   `scripts/run_tests.sh tests/docker/`) still pick the suite up — the
   skip rule honors directory-level user intent, matching the existing
   per-file override pattern.

The dedicated docker-tests step runs on every PR that touches docker
code (the existing path filters on docker-publish.yml already cover
`tests/docker/**` via `**/*.py`), so the suite gates real changes.

(cherry picked from commit 4c481860ce)
2026-05-25 12:40:57 +10:00
teknium1
5cbb132c1d
fix(ci): exclude tests/docker/ from regular test shards; pin read_text encoding
Two CI follow-ups to @benbarclay's #30136 salvage:

1. scripts/run_tests_parallel.py — add 'docker' to _SKIP_PARTS so
   the new tests/docker/ harness doesn't run in the regular test (N)
   matrix. The harness builds the real Dockerfile in a session
   fixture, which can exceed pytest-timeout's 180s ceiling on
   ubuntu-latest where Docker IS available — it surfaced as 6
   identical setup-timeout failures across slices 1–6 on the first
   CI run.

   The docker harness has its own dedicated runner via
   .github/actions/hermes-smoke-test (added in #30136) plus the
   docker-lint workflow. Same treatment as tests/integration/ and
   tests/e2e/ — runs separately, not in the main shards.

2. hermes_cli/service_manager.py — pin encoding='utf-8' on the
   /proc/1/comm read_text call. Ruff PLW1514 enforcement rolled in
   between Ben's last push and the salvage; pure ruff-fix, no
   behavior change.
2026-05-24 18:23:13 -07:00
teknium1
bf2f3b2469 chore(release): map vgocoder for PR #24758 salvage 2026-05-24 17:58:25 -07:00
teknium1
55987818b6 chore(release): map kronexoi for PR #30553 salvage 2026-05-24 17:47:24 -07:00
Teknium
87580076fd chore(release): map 490408354@qq.com to daizhonggeng (PR #9020) 2026-05-24 16:22:48 -07:00
teknium1
99a7ecc335 chore(release): map leeseoki0 for PR #31315 salvage 2026-05-24 15:48:58 -07:00
Teknium
27c4ba98c3 chore(release): map zhangsamuel12@gmail.com to SamuelZ12 (PR #7480) 2026-05-24 15:36:37 -07:00
Teknium
eea9553a9c fix(anthropic): skip mcp_ prefix on outgoing tool schemas when already prefixed
Companion to the GH-25255 incoming-strip fix from @hayka-pacha. Without
this, build_anthropic_kwargs unconditionally added 'mcp_' to every tool
name in step 3, so a native MCP server tool registered as
'mcp_composio_X' was sent as 'mcp_mcp_composio_X' on the wire. The
incoming strip only removes ONE prefix, which still worked on first
call, but on subsequent calls the model pattern-matched the
single-prefixed form from message history and produced names that
stripped to 'composio_X' — registry miss, dispatch fail.

The history-rewrite block (#4) already has this guard. Apply the same
guard to the schema-rewrite block (#3) so round-trip is symmetric.

Added 4 outgoing-side tests. Existing 7 incoming-side tests still pass.

Author map: hayka-pacha added for PR #25270 salvage attribution.

Refs GH-25255.
2026-05-24 15:27:45 -07:00
teknium1
88834baf50 chore: map soju06@users.noreply.github.com for PR #26054 salvage 2026-05-24 15:15:37 -07:00
teknium1
5e6749fbf3 chore(release): map m0n3r0 for PR #29629 salvage 2026-05-24 04:47:45 -07:00
teknium1
26088ca669 chore: map kenyon1977@gmail.com for PR #29119 salvage 2026-05-24 04:31:13 -07:00
Teknium
51bb8c0a9e chore: map pnascimento9596@gmail.com for PR #31235 salvage 2026-05-24 01:31:45 -07:00
Teknium
b207dc28b3 feat(kanban): --ids bulk promote + AUTHOR_MAP entry for #29464
Adds an --ids flag to 'hermes kanban promote' mirroring the existing
block/schedule convention, so the marquee use case from issue #28822
(promote all children of a closed organizational parent in one shot)
doesn't require a shell loop. Single-id JSON output stays a flat
object for back-compat; bulk emits a list. Dedupes positional + --ids
so the same id can't be promoted twice in one call. 5 new CLI-level
tests cover bulk happy path, partial-failure exit code, JSON shapes,
and dedup.

Also adds the thedavidmurray noreply-email -> github-login mapping in
scripts/release.py so the salvage cherry-pick passes the AUTHOR_MAP
contributor-credit check.
2026-05-23 23:10:36 -07:00
Teknium
7a4dc8e8d6 chore: map edison@mcclean.codes for PR #29817 salvage 2026-05-23 17:49:47 -07:00
teknium1
87111c7bfe chore: map Glucksberg noreply email in AUTHOR_MAP 2026-05-23 16:26:28 -07:00
teknium1
4f835f7e43 chore(release): map NickLarcombe author email for #30707 salvage 2026-05-23 05:51:33 -07:00
Teknium
11e6dd3c60
chore(release): add AUTHOR_MAP entry for egilewski (PR #30432) (#30833) 2026-05-23 01:41:31 -07:00
Teknium
3f78d8073c fix(skills): make content_hash filename-sensitive too (symmetric with bundle_content_hash)
PR #6656 added rel_path + \x00 prefixing to ``bundle_content_hash`` so a
filename swap between two files in a bundle changes the digest. But it
only patched the in-memory side — ``content_hash`` in ``tools/skills_guard.py``
(the on-disk equivalent) still hashed file contents only.

These two functions need to stay symmetric: ``check_for_skill_updates``
compares the disk hash of an installed skill against the bundle hash
of the upstream copy. With the asymmetric fix, every clean install
showed as drifted because the digests no longer matched
(2 existing tests in ``test_skills_hub.py`` started failing as soon as
the contributor's change landed).

Apply the same ``rel_path + \x00 + content`` shape to the disk-side
function. Both functions now produce the same digest for the same skill
content laid out two ways. Documented the symmetry invariant in the
docstring so a future change to either function knows to touch both.

Also adds tests/tools/test_pr_6656_regressions.py with 10 regression
tests covering all three fixes salvaged in PR #6656:
  - uninstall_skill path traversal (4 cases: parent segments, absolute
    paths, symlink escape, legitimate skill)
  - bundle_content_hash filename swap detection (4 cases: in-memory
    swap, identity, disk-side swap, bundle↔disk symmetry)
  - list_pending lock contract (2 cases: source-grep contract, smoke)

Also fixes AUTHOR_MAP entry for @aaronlab — their commit email
(1115117931@qq.com) maps to "aaronagent" which isn't a real GitHub
login, so changelog @mentions would 404.
2026-05-22 19:59:24 -07:00