Commit graph

9713 commits

Author SHA1 Message Date
wysie
ee80dfdea0 fix: preserve skill packages during curator consolidation 2026-05-27 13:39:58 -07:00
wysie
f040710d04 fix: backfill official optional skill provenance 2026-05-27 13:39:58 -07:00
wysie
a38e283395 fix: preserve nested official skill install paths 2026-05-27 13:39:58 -07:00
kshitijk4poor
53bdef5775 test(cli): regression test for hermes update fork upstream sync (#26172)
Asserts that when hermes update runs on a fork whose local HEAD matches
origin/main but commit_count == 0, the early-return path still consults
_sync_with_upstream_if_needed() before printing "Already up to date!".

Locks in the fix from the parent commit so the upstream-sync call cannot
silently regress out of the commit_count == 0 branch.
2026-05-27 13:10:50 -07:00
Franci Penov
6f2a2f157f fix: check upstream even when origin/main has no new commits
The upstream sync logic only ran after a successful origin pull,
so forks whose origin/main was already in sync with local (but
behind upstream/main) would bail out with "Already up to date!"
without ever checking upstream.
2026-05-27 13:10:50 -07:00
Teknium
e8955f222c
fix(codex): drop dead model slugs that HTTP 400 on ChatGPT Pro (#33424)
DEFAULT_CODEX_MODELS shipped three slugs that the chatgpt.com Codex
backend rejects with HTTP 400 'The <slug> model is not supported when
using Codex with a ChatGPT account.' on every account tested live:

  gpt-5.2-codex
  gpt-5.1-codex-max
  gpt-5.1-codex-mini

Live verified against https://chatgpt.com/backend-api/codex/models
which returns gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex,
gpt-5.3-codex-spark, gpt-5.2 for ChatGPT Pro accounts.

When _fetch_models_from_api fell back to DEFAULT_CODEX_MODELS (offline
first-run, transient API failure) the picker surfaced these dead slugs
and crashed on selection. The forward-compat synthesis table chained
them downstream too.

If OpenAI re-enables them on the OAuth-backed Codex backend, live
discovery will pick them up automatically — the defaults list is only
consulted when live discovery is unavailable.

Test fixture pivoted to use gpt-5.3-codex (templated by 4 entries) as
the synthesis driver so the forward-compat test still exercises the
synthesis path.
2026-05-27 12:16:15 -07:00
teknium1
5deb384b53 chore(release): map donovan-yohan for #33263 salvage 2026-05-27 11:48:23 -07:00
Donovan Yohan
c94ad89818 fix(kanban): retry corrupt-board dispatch after quarantine 2026-05-27 11:48:23 -07:00
xxxigm
fc47b7285c fix(codex): omit tools key from Codex Responses kwargs when no tools registered
Salvages the transport-side fix from #32911 (@xxxigm). Closes #32892.

The openai SDK's responses.stream() / responses.parse() eagerly call
_make_tools(tools), which iterates tools without a None guard. Passing
tools=None raises TypeError: 'NoneType' object is not iterable before
any HTTP request is issued (openai==2.24.0).

PR #33042 already removed responses.stream() from our own Codex call
paths, so the specific iteration crash inside _make_tools is no longer
on the hot path. But the right API contract is to omit tools entirely
when there are no functions to expose — passing tools=None to the
backend is semantically wrong regardless of the SDK's iteration
behavior, and we'd hit it again on any future code path that hasn't
migrated off responses.stream().

This applies the transport-level part of @xxxigm's fix: move
'tools': response_tools into the if response_tools: branch so the
key is omitted when there are no tools, just like tool_choice and
parallel_tool_calls already are. Skips the run_agent.py-side
_strip_sdk_none_iterables helper from their PR — that path is now
obsolete because the SDK helper that needed defending is gone.

Tests
- tests/run_agent/test_codex_no_tools_nonetype.py: 6 tests trimmed
  from @xxxigm's original 13-test file. Drops the obsolete tests for
  _strip_sdk_none_iterables and _RecordingResponsesStream (helpers
  that don't exist on main anymore), keeps the transport behavior
  tests + the SDK contract sanity check that ensures we notice if
  upstream ever fixes _make_tools(None).
- 6/6 passing locally.

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
2026-05-27 11:46:17 -07:00
teknium1
8386f84454 chore(release): map Brixyy for #33136 salvage 2026-05-27 11:30:55 -07:00
Brixyy
dc9d677d59 fix(agent): classify TypeError('NoneType ... not iterable') as retryable provider shape error
Salvages the intent of #33136 (@Brixyy) onto current main. The original PR
was written against the pre-refactor monolithic run_agent.py and added a
top-level _is_nonretryable_local_validation_error() helper. Both target
functions have since been extracted to agent/conversation_loop.py:2869,
so the salvage applies the equivalent guard inline at that canonical
location rather than reintroducing the helper.

## Why

After #33042 made our own Codex consumer structurally immune to NoneType
crashes, third-party shims, mocked clients, and any future code path that
hasn't migrated could still surface TypeError: 'NoneType' object is not
iterable as a wire-shape mismatch. The agent loop's classifier currently
treats ALL TypeError as a local programming bug and aborts non-retryable
— users on stale Telegram/gateway turns saw bare "Non-retryable error
(HTTP None)" with no recovery.

This is a provider/SDK shape mismatch, not a local programming bug. The
retry/fallback path should run, not be short-circuited.

## What

agent/conversation_loop.py: extend is_local_validation_error to exclude
TypeErrors whose message matches the NoneType-not-iterable shape (case-
insensitive, both "NoneType" and "not iterable" must appear).

tests/run_agent/test_jsondecodeerror_retryable.py:
- update the mirror predicate to match the production check
- add TestNoneTypeNotIterableIsRetryable class with 3 tests (the basic
  shape, message variants, unrelated TypeErrors still abort)
- add TestAgentLoopSourceHasNoneTypeCarveOut to enforce the source-level
  invariant matches the test mirror

## Validation

tests/run_agent/test_jsondecodeerror_retryable.py +
tests/run_agent/test_31273_402_not_retried.py → 14/14 passing

Co-authored-by: Brixyy <subrtt@gmail.com>
2026-05-27 11:30:55 -07:00
teknium1
3476509f97 chore(release): map sanghyuk-seo-nexcube for #33383 salvage 2026-05-27 11:19:55 -07:00
Sanghyuk Seo
283bb810e7 fix(agent): tolerate large codex stream prefill 2026-05-27 11:19:55 -07:00
teknium1
486d632cc2 fix(auxiliary): coerce None final.output to empty list in Codex aux adapter
Closes #33368.

`_CodexCompletionsAdapter.create()` iterates `final.output` from the
Codex Responses stream. The event-driven consumer (introduced in #33042)
always sets `final.output` to a list, so this shape can't come from our
own code path. But:

- Mocked clients in tests can return a typed Response with `output=None`
- Third-party shims / compatibility layers that bypass the consumer can
  do the same
- A future code path that wraps a different consumer could regress

The old code `getattr(final, "output", [])` returns `None` (not the
default `[]`) when the attribute EXISTS but is `None`. Iterating
`None` then raises `TypeError: 'NoneType' object is not iterable` —
the exact error logged by title-generation when this fires.

Fix: `getattr(final, "output", None) or []` — single-line defensive
coerce. Cheap; zero risk.

Regression test asserts the auxiliary path handles a final whose
`.output` is `None` (via monkey-patched consumer) without raising and
returns the expected chat.completions-shaped response.

Reporter: @pavegrid-1 (issue #33368).
2026-05-27 11:08:21 -07:00
Teknium
9919caff46
feat(image_gen): add Krea provider plugin (Krea 2 Medium + Large) (#33236)
* feat(image_gen): add Krea provider plugin (Krea 2 Medium + Large)

New built-in image_gen backend wrapping Krea's Krea 2 foundation
image model family. Auto-discovered like the other image_gen plugins
and appears in 'hermes tools' → Image Generation → Krea.

Krea's API is asynchronous — submit returns a job_id, poll /jobs/{id}
until terminal. The provider hides that behind the synchronous
ImageGenProvider.generate() contract: submit, poll every 2s with
light backoff (max 5s), 3-minute ceiling matching Krea's hosted-tool
timeout. Result URL is materialised to $HERMES_HOME/cache/images/
to avoid CDN-expiry 404s downstream (same fix as xAI #26942).

Models:
- krea-2-medium (default — Krea's 'start here' recommendation)
- krea-2-large

Aspect ratios map landscape→16:9, square→1:1, portrait→9:16.
Resolution: 1K (Krea's only current option).

Kwarg passthrough: seed, creativity (raw/low/medium/high), styles,
image_style_references (capped 10), moodboards (capped 1) — matches
Krea's per-request limits. Unknown kwargs are ignored.

Config knobs (config.yaml):
  image_gen.provider: krea
  image_gen.krea.model: krea-2-medium | krea-2-large
  image_gen.krea.creativity: raw | low | medium | high
Env overrides: KREA_API_KEY (required), KREA_IMAGE_MODEL.

KREA_API_KEY is registered in OPTIONAL_ENV_VARS so 'hermes setup'
prompts for it.

31 new tests; image_gen suite + picker + tools_config: 211/211.

* fix(image_gen/krea): address review feedback

- Update KREA_API_KEY setup URL to the canonical token-creation page
  (https://www.krea.ai/app/api/tokens). The previous URL returned 404.

- Fail fast on non-retryable HTTP statuses during poll. The previous
  loop retried every HTTPError for the full 180s deadline, so an auth
  (401), billing (402), forbidden (403), or not-found (404) response
  would make image_generate hang for three minutes. Only retry
  transient statuses (408/409/425/429/5xx); surface everything else
  immediately.

- Add 5 tests covering fail-fast on 401/403/404 and retry on 429/503.

* fix(krea): point users at the real API token dashboard URL

Three call sites linked users to dashboard pages that don't exist:
- hermes_cli/config.py: https://www.krea.ai/app/api/tokens
- plugins/image_gen/krea/__init__.py get_setup_schema: https://www.krea.ai/api-keys
- plugins/image_gen/krea/__init__.py auth_required error: https://www.krea.ai/api-keys

Per Krea's own docs (https://docs.krea.ai/developers/api-keys-and-billing),
the real dashboard URL is https://www.krea.ai/settings/api-tokens. All three
sites now point there.
2026-05-27 11:01:47 -07:00
Erosika
eccbbe4b1b chore(release): map adopted Honcho contributors 2026-05-27 10:49:33 -07:00
Erosika
c89393b711 chore(honcho): trim peer-card fallback comment 2026-05-27 10:49:33 -07:00
Dora (kyra-nest)
bcae3fcc4e fix(honcho): align user context peer perspective
Use the shared observer/target resolver for session context so peer='user' and explicit configured peer IDs query Honcho from the same assistant-observed perspective when allowed. Add regression coverage for user alias, explicit peer, and self-observer fallback.
2026-05-27 10:49:33 -07:00
David Doan
1800a1c796 fix(honcho): align peer-card read and write paths
honcho_profile(peer="user") returned an empty card even when Honcho
held a populated peer card for the user. Two independent bugs combined
to produce the symptom:

1. Read path: get_peer_card() called _fetch_peer_card(observer, target=user),
   which hits GET /peers/{observer}/card?target={user} — the observer's local
   card of the user. On self-hosted Honcho v3 this slot is empty unless writes
   also use it. The peer card lives on the user peer itself
   (GET /peers/{user}/card). Add a fallback: when the observer-target slot is
   empty and a target exists, retry against the target peer's own card.

2. Write path: set_peer_card() resolved only the target peer and called
   user_peer.set_card(card). The read path uses the assistant peer as
   observer, so writes and reads addressed different Honcho card scopes.
   Align set_peer_card() with _resolve_observer_target() so writes go to
   assistant_peer.set_card(card, target=user_peer_id), matching the read.

Both paths now use the same observer/target resolution, and the read
path additionally falls back to the target's own card for compatibility
with deployments where cards were written directly to the peer.

Closes: related to #13375, #17124, #20729
2026-05-27 10:49:33 -07:00
Erosika
1a8e67076a fix(honcho): cover pinUserPeer + aiPeer edge cases in setup, clone, and gateway cache
Three related regressions stemming from the pinUserPeer alias landing:

- Setup wizard read host-only fields when detecting current shape but the
  parser supports root-level config and gives host pinUserPeer higher
  precedence than pinPeerName. Re-running setup could mis-detect shape
  and silently flip routing. Detection now uses the same resolver order
  as HonchoClientConfig, and each shape branch scrubs every peer-mapping
  key before writing so a stale pinUserPeer=false can't outrank a freshly
  written pinPeerName=true. Multi no longer auto-writes
  userPeerAliases={} (was silently masking root-level baselines).

- clone_honcho_for_profile inherited pinPeerName but not pinUserPeer, so
  a default profile configured with the newer key produced cloned
  profiles without the pin.

- Gateway cache-busting signature fingerprinted Honcho user-peer fields
  but not ai_peer. Since HonchoSessionManager freezes cfg.ai_peer at
  init, mid-flight aiPeer edits kept assistant writes on the old peer
  until an unrelated cache eviction. ai_peer is now part of the
  signature.
2026-05-27 10:49:33 -07:00
Erosika
939499beed chore(honcho): trim PR-history narration from docs and tests
Remove "PR #14984 / #27371 / #1969" references and "the original key /
legacy / backwards-compatible / Port #N" narration from the honcho
plugin README, tests, and one stale code comment. These artefacts age
poorly: they describe how a change happened rather than what the code
does today, and they tax readers who weren't around for the original
work.

Also drop a dangling reference to scratch/memory-plugin-ux-specs.md in
__init__.py — the file isn't in the repo or git history.

No behaviour change.
2026-05-27 10:49:33 -07:00
Erosika
6feb2afd50 fix(honcho): plug pinPeerName transition gaps
Three correctness gaps when honcho.json's identity-mapping config changes
mid-flight:

1. The gateway's agent cache signature ignored honcho identity keys, so
   editing peerName / pinPeerName / userPeerAliases / runtimePeerPrefix
   was silently dropped until an unrelated cache eviction. Extend
   _extract_cache_busting_config to fingerprint the resolved honcho
   config so the AIAgent rebuilds on the next message.

2. cmd_setup let single → multi flips orphan the pinned-pool history
   under peerName without warning. Detect the transition, warn that
   runtime users will resolve to fresh empty peers, and auto-steer to
   hybrid (alias the operator's runtime IDs back to peerName) so the
   operator's own continuity survives. yes / no overrides available.

3. README didn't document the orphaning behaviour. Add a "Migrating
   single → multi" callout under Deployment shapes.

Tests:
- TestPinTransition (test_pin_peer_name.py): fresh-manager flip resolves
  to runtime, in-process flip is gated by the per-key session cache
  (documents the gateway-cache-must-bust contract), 3 cache-bust
  signature tests for pin / aliases / prefix.
- TestProfilePeerUniqueness: two profiles pinned to distinct peerNames
  resolve to distinct peers; host-level peerName overrides root when
  pinned.
- test_single_to_multi_steers_to_hybrid_by_default and
  test_single_to_multi_yes_override_keeps_multi (test_cli.py): wizard
  guard end-to-end coverage.
2026-05-27 10:49:33 -07:00
erosika
58987cb8b1 docs(honcho): document identity-mapping config + resolver ladder + deployment shapes
PR #27371 introduced three new identity-mapping config keys
(pinPeerName, userPeerAliases, runtimePeerPrefix), but the README's
'Full Configuration Reference' didn't mention them.  Operators had
to read the source to understand the resolver, leading to predictable
support questions ("why is my user split across two peers?", "what
does pinPeerName actually pin?").

Add a new 'Identity Mapping' subsection that covers:

* The four config keys (pinUserPeer + alias, userPeerAliases,
  runtimePeerPrefix) with concrete examples.

* The 7-step resolver ladder so operators can predict which peer a
  given runtime ID will land on.

* Why there's no symmetric pinAiPeer (the AI peer is already pinned
  by construction; the asymmetry is intentional).

* Host vs root semantics (host-level replaces root for maps, wipes
  with empty value).

* The three deployment shapes ('hermes honcho setup' uses these same
  shape names) with one-line guidance per shape.
2026-05-27 10:49:33 -07:00
erosika
3cf5e8225d refactor(honcho): accept pinUserPeer as backwards-compatible alias for pinPeerName
The original key 'pinPeerName' from #14984 is ambiguous: a fresh
reader can't tell whether it pins the user peer or the AI peer from
the name alone.  The resolver only ever pins the user-side
(_resolve_user_peer_id short-circuits when pin_peer_name is true; the
AI peer is already pinned by construction via aiPeer).

Add 'pinUserPeer' as the canonical alias.  Both keys land on the
same internal pin_peer_name field; precedence is host pinUserPeer →
host pinPeerName → root pinUserPeer → root pinPeerName → default.
Host-level always beats root-level regardless of alias, so a host
block can still explicitly disable a root-level pin even via the new
key.

Make _resolve_bool variadic so it can express the four-value
precedence chain.  All existing callers pass two positional args +
default keyword, which the new signature accepts unchanged.

Internal var name (pin_peer_name) stays the same to keep the
cherry-picked #27371 commits clean and avoid a noisy rename diff.
2026-05-27 10:49:33 -07:00
erosika
0bac880991 feat(honcho-setup): add deployment-shape step to identity-mapping wizard
The PR #27371 resolver introduced three identity-mapping config keys
(pinPeerName, userPeerAliases, runtimePeerPrefix), but operators had
no guided way to set them — they had to read the README, understand
the resolver ladder, and hand-edit honcho.json.  This commit adds an
interactive step to 'hermes honcho setup' that asks one question
('what's your deployment shape?') and writes the right combination
of keys.

Three shapes cover the realistic deployments:

* single -- pinPeerName=true.  All gateway users collapse to your
            peerName.  Recommended for personal/single-operator use.

* multi  -- pinPeerName=false, no aliases.  Each runtime user gets
            their own peer.  Optional runtimePeerPrefix for cross-
            platform namespace isolation.

* hybrid -- pinPeerName=false, with userPeerAliases mapping YOUR
            runtime IDs (Telegram UID, Discord snowflake, Slack
            user, Matrix MXID) to peerName.  Multi-user gateway
            where you are a privileged operator.

A 'skip' option leaves existing identity-mapping config untouched —
critical because re-running setup must not silently wipe operator-
curated aliases.

The wizard detects the current shape from existing config so the
prompt's default matches what the operator already has.
2026-05-27 10:49:33 -07:00
erosika
c03960decd fix(honcho): include user_id in agent cache signature to prevent shared-thread peer contamination
PR #27371 introduced a per-user-peer resolver in HonchoSessionManager,
but the resolved runtime identity is frozen into the manager at first-
message init.  When the gateway session_key intentionally omits the
participant ID (the default for threads via thread_sessions_per_user=
False), a cached AIAgent created by user A is reused for user B's
messages, attributing B's writes to A's resolved Honcho peer and
breaking #27371's per-user-peer contract.

Fix by including user_id and user_id_alt in _agent_config_signature so
the cache key distinguishes participants in shared threads.  Each user
in a shared thread now triggers a fresh AIAgent build (trading prompt-
cache warmth for memory-attribution correctness — the right tradeoff
for an external-memory backend where misattribution is unrecoverable).

The default-None case keeps the signature byte-identical to pre-fix
behavior so this change doesn't invalidate in-flight caches on deploy.
2026-05-27 10:49:33 -07:00
erosika
00e6830204 fix(honcho): inherit identity-mapping config in cloned profile blocks
PR #27371 added host-scoped userPeerAliases, runtimePeerPrefix, and
pinPeerName, but the cloned-profile allowlist in
plugins/memory/honcho/cli.py::clone_honcho_for_profile() omitted them.
A new profile created via 'hermes honcho setup' or similar would
silently drop the operator's identity-mapping config, causing gateway
users to resolve to raw runtime IDs and fragmenting Honcho memory
across an unintended set of peers.

Add the three keys to the allowlist and a regression test class
covering all three plus the unset case.
2026-05-27 10:49:33 -07:00
mavrickdeveloper
30b391ab36 Avoid Honcho runtime peer collisions
(cherry picked from commit 4ae3c1a228)
2026-05-27 10:49:33 -07:00
mavrickdeveloper
382b1fc1b6 Cover Honcho runtime peer edge cases
(cherry picked from commit d89a57ea40)
2026-05-27 10:49:33 -07:00
mavrickdeveloper
2e3c6627ce Add Honcho runtime peer mapping
(cherry picked from commit 864cdb3d2e)
2026-05-27 10:49:33 -07:00
zccyman
2e181602a1 fix(agent): isolate credential pool on provider fallback
Closes #33163.

When _try_activate_fallback() switches from one provider to another (e.g.
openai-codex → openrouter), the credential pool still belongs to the
primary provider. This causes two compounding bugs:

1. The pool retains the primary's base_url. Downstream pool recovery
   (rate_limit / billing / auth) calls _swap_credential() with a primary
   entry which overwrites the agent's base_url back to the primary's
   endpoint. Every fallback request then 404s against the wrong host.

2. Pool recovery acting on errors from the FALLBACK provider mutates the
   PRIMARY's pool state (#33088 reported a related corruption pattern),
   exhausting/rotating entries that have nothing to do with the failure.

Two layered fixes:

a) try_activate_fallback (agent/chat_completion_helpers.py): on fallback
   activation, clear agent._credential_pool when the fallback provider
   doesn't match the pool's provider. Pool is preserved when the fallback
   shares the pool's provider (e.g. multiple openrouter entries).

b) recover_with_credential_pool (agent/agent_runtime_helpers.py):
   defensive guard rejects any pool mutation when agent.provider doesn't
   match pool.provider. Defense-in-depth — should never fire after (a)
   is in place, but covers any future path that attaches a stale pool.

Salvaged from @zccyman's PR #33217. The original PR was written against
the pre-refactor monolithic run_agent.py; both target functions have
since been extracted to module-level helpers. Behavior is identical —
the guards live in the canonical extracted locations.

Tests
- New tests/run_agent/test_fallback_credential_isolation.py (7 tests
  covering: fallback clears mismatched pool, fallback preserves matching
  pool, recovery rejects mismatched pool, recovery accepts matching
  pool, 429-from-z.ai-doesn't-exhaust-codex-pool, _client_kwargs
  base_url survives pool clear, _swap_credential doesn't restore
  primary URL after fallback).
- Cross-verified: 77/77 passing across fallback isolation tests +
  agent/test_credential_pool.py — no regression.

Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
2026-05-27 10:45:26 -07:00
JohnC1009
414a5bc924 fix(auth): fall back to global auth.json in _load_provider_state
In profile mode, _load_provider_state previously returned None when a
provider was absent from the profile's auth.json — even if the user had
authenticated at the global root. This broke runtime credential resolvers
that read state directly (resolve_nous_access_token,
resolve_nous_runtime_credentials), causing profiles without their own
nous login to fail with 'Hermes is not logged into Nous Portal' despite
a valid global session.

Push the existing read-only global fallback (already used by
get_provider_auth_state and read_credential_pool) into _load_provider_state
so every caller benefits, and simplify get_provider_auth_state into a thin
wrapper. Writes still target the profile only — profile state continues to
shadow global state on the next read after a per-profile login. Behavior in
classic (non-profile) mode is unchanged because _load_global_auth_store
returns an empty dict.

Adds 5 tests covering the new contract on _load_provider_state directly.
Existing 770 auth/credential/nous tests still pass.
2026-05-27 09:38:58 -07:00
kshitij
dd0d5d5a82
chore: add JohnC1009 to AUTHOR_MAP (#33351)
Pre-requisite for PR #32020 salvage (auth: global auth.json fallback
in _load_provider_state). Contributor_audit strict mode fails if any
commit author email on main is unmapped.

Co-authored-by: kshitijk4poor <kshitijk4poor@gmail.com>
2026-05-27 09:37:50 -07:00
LeonSGP43
458a94e425 fix(cli): keep destructive slash modal on Linux 2026-05-27 05:57:01 -07:00
Teknium
f0de3cd0a0
fix(agent): roll back switch_model() state when client rebuild fails (#33228)
Closes #33175.

switch_model() in agent/agent_runtime_helpers.py mutated agent.model and
agent.provider before rebuilding the client, with no try/except to restore
them on failure. If the rebuild raised (bad API key, network error,
build_anthropic_client failure, etc.) the agent was left with the new
model+provider name paired with the OLD client — producing HTTP 400s like
"claude-sonnet-4-6 is not supported on openai-codex" on the next turn.

Callers in cli.py, gateway/run.py, and tui_gateway/server.py already catch
the exception and warn the user, but the warning was misleading because
the swap had partially succeeded; the agent's state was torn.

Snapshot every mutated field before the swap, wrap the swap+rebuild block
in try/except, and restore the snapshot on failure before re-raising so
the caller's warning surfaces.

Reported by @amirariff91. Tests cover both branches (chat_completions and
anthropic_messages) and the cross-branch case (anthropic -> openai).
2026-05-27 05:43:20 -07:00
ethernet
825948edab ci(docker): simplify tagging — push both :main and :latest on main push
Remove the ancestor-check gate and the separate move-latest job.
On main pushes, the merge job now tags both :main and :latest in
a single imagetools create call. Releases still get :<tag> only.

Removed:
- move-latest job (ancestor check + retag dance)
- Decide whether to move :main step (ancestor check in merge)
- Compute tag step
- push_main gate on manifest push
- merge job outputs (nothing downstream needs them anymore)
2026-05-27 05:32:19 -07:00
Teknium
b4eea187d5 fix(xai-oauth): gate slash-enum strip on model name + add regression tests (#28490)
Three additions on top of @Nami4D's salvage:

1. Gate the preflight slash-enum strip on the model name pattern
   (grok-* / x-ai/grok-*).  The original PR stripped slash-containing
   enum values from every codex_responses request, but native Codex
   (OpenAI) and GitHub Models DO accept slash enums — stripping them
   there would silently degrade tool-schema constraints.  xAI is the
   only Responses-API surface that rejects the shape.

2. Resolve the merge conflict in agent/transports/codex.py by
   preserving both the timeout-forwarding block that landed on main
   between the PR's branch point and now AND the new service_tier
   strip.  Behavioural intent of both is preserved.

3. Six new tests in tests/agent/transports/test_codex_transport.py
   covering:
   - TestCodexTransportXaiServiceTierStrip (3 tests): xAI strips
     service_tier from request_overrides; non-xAI codex_responses
     and GitHub Models both KEEP service_tier (regression guards
     so the strip stays xAI-only).
   - TestPreflightSlashEnumStrip (3 tests): Grok and aggregator-
     prefixed Grok model names both trigger the safety-net strip;
     non-Grok models preserve slash enums as a regression guard
     against the strip becoming too broad.

51/51 in tests/agent/transports/test_codex_transport.py.

Co-authored-by: Nami4D <hello@nami4d.tech>
2026-05-27 05:25:38 -07:00
Nami4D
a699de83ec fix(xai-oauth): strip service_tier and add safety-net sanitization for slash enums
xAI's /v1/responses endpoint rejects service_tier with HTTP 400
"Argument not supported: service_tier" when users activate /fast mode.

Also add a safety-net strip_slash_enum call in _preflight_codex_api_kwargs
to catch any tool schemas that might slip through the caller-level
sanitization. xAI's Responses API grammar compiler rejects enum values
containing forward slashes (e.g. HuggingFace model IDs like
"Qwen/Qwen3.5-0.8B") with the opaque "Invalid arguments passed to the
model" error.

Fixes the root cause of "Invalid arguments passed to the model" errors
reported by xAI OAuth (SuperGrok) users.
2026-05-27 05:25:38 -07:00
Teknium
0325e18f34
fix(gateway): keep Telegram heartbeat + interim commentary on; edit heartbeat in place (#33187)
#33151 flipped THREE Telegram display defaults to false:
  - tool_progress: new -> off            (kept: per-tool stream is too chatty)
  - interim_assistant_messages: T -> F   (REVERTED here)
  - long_running_notifications: T -> F   (REVERTED here)
  - busy_ack_detail: T -> F              (kept: verbose iteration counter)

The two reverts were wrong. interim_assistant_messages = the model's REAL
words mid-turn ("I'll inspect the repo first.", "Let me check both files
in parallel"). That is signal, not noise. Suppressing it left Telegram
users staring at "typing..." for the entire turn duration with no
feedback. long_running_notifications = the periodic heartbeat. Silent
agent for 30 minutes is worse than one bubble updating every 3 minutes.

Changes:
  - gateway/display_config.py: Telegram tier-1 inbox keeps both defaults
    on (only tool_progress and busy_ack_detail stay off).
  - gateway/run.py _notify_long_running(): edit a single heartbeat
    message in place (where the adapter supports it) instead of posting
    a new "Still working..." bubble each interval. Telegram, Discord,
    Slack, Matrix all qualify. Falls back to send-new when edit fails.
  - gateway/run.py: tighten heartbeat text. " Still working... (12 min
    elapsed — iteration 21/60, running: terminal)" -> " Working — 12
    min, terminal". Verbose iteration detail moves behind busy_ack_detail
    (one knob now controls both busy acks AND heartbeat verbosity).
  - tests/, cli-config.yaml.example, website/docs/user-guide/messaging:
    updated to reflect the corrected story.
2026-05-27 05:21:53 -07:00
Teknium
69dfcdcc15 fix(auth): codex chat path falls back to credential_pool when singleton is empty
Closes #32992.

The chat path resolves Codex credentials via `resolve_codex_runtime_credentials`
which only reads `providers.openai-codex.tokens` (the singleton). The auxiliary
path uses `_read_codex_access_token` which checks the credential_pool first.
For users whose tokens live only in the pool — manual seed, partial re-auth,
restore from backup, or any state where the singleton is empty but the pool
is healthy — the chat path raised AuthError or (worse, since OpenAI(api_key='')
silently attaches no header) the wire saw HTTP 401 "Missing Authentication header"
while the auxiliary path worked fine.

This adds a pool fallback to `resolve_codex_runtime_credentials`: when the
singleton has no usable access_token, scan `credential_pool.openai-codex` for
the first entry that has a non-empty access_token and isn't in an exhaustion
cooldown window (`last_error_reset_at` in the future). If found, return that
token with `source="credential_pool"`. If no usable entry exists, the original
AuthError propagates as before.

Regression tests cover:
- Empty singleton + healthy pool entry → pool token returned
- Pool fallback skips entries currently in cooldown
- Empty singleton + empty/wedged pool → AuthError propagates (existing contract preserved)
2026-05-27 03:43:51 -07:00
Ben
3e33e14335 fix(docker): discover agent-browser Chromium binary at boot
The image's Dockerfile runs npx playwright install chromium, which
populates $PLAYWRIGHT_BROWSERS_PATH (=/opt/hermes/.playwright) with a
`chromium_headless_shell-<build>/chrome-headless-shell-linux64/` tree.
agent-browser (the runtime CLI Hermes spawns for the browser tool)
doesn't recognise this layout in its own cache scan and fails with
`Auto-launch failed: Chrome not found` — even though the binary is
right there.

Reproduction on current main:

    $ docker run --rm <image> sh -c 'npx -y agent-browser snapshot --url about:blank'
    ✗ Auto-launch failed: Chrome not found. Checked:
      - agent-browser cache: /tmp/.../.agent-browser/browsers
      - System Chrome installations
      - Puppeteer browser cache
      - Playwright browser cache
    Run `agent-browser install` to download Chrome, or use --executable-path.

Fix: at boot, locate the binary under $PLAYWRIGHT_BROWSERS_PATH and
export AGENT_BROWSER_EXECUTABLE_PATH via /run/s6/container_environment
so the with-contenv shebang on main-wrapper.sh propagates it into the
supervised `hermes` process and thence to agent-browser subprocesses.

Filename-matched (chrome / chromium / chrome-headless-shell /
chromium-browser), not path-matched: the chromium dir contains many
shared libraries (libGLESv2.so, libEGL.so, ...) which inherit the
executable bit from Playwright's tarball but are NOT browser binaries.
Compare PR #18635's earlier `find | grep -Ei 'chrome|chromium'` which
would match the path .../chrome-headless-shell-linux64/libGLESv2.so
and pick a .so as the browser binary.

User overrides (e.g. `-e AGENT_BROWSER_EXECUTABLE_PATH=/usr/bin/...`)
are respected — the discovery block is skipped when the env var is
already set. Quietly skipped when $PLAYWRIGHT_BROWSERS_PATH doesn't
exist (e.g. custom builds that strip Playwright).

This salvages PR #18635 by @jackey8616, who identified the bug and
proposed the same env-var approach but in the now-deprecated
docker/entrypoint.sh shim and with a path-match find command that
selected .so files instead of the chrome binary. The fix retargets
docker/stage2-hook.sh (the s6-overlay cont-init script where boot-time
env setup belongs) with a corrected filename-match query.

Fixes #15697
Closes #18635

Co-authored-by: Clooooode <12930377+jackey8616@users.noreply.github.com>
2026-05-27 20:43:27 +10:00
helix4u
ea34925002 fix(discord): recover Windows voice opus decoding 2026-05-27 03:35:33 -07:00
Ben Barclay
bb65bebed7
Merge pull request #30504 from ilonagaja509-glitch/fix/30394-docker-anthropic-package
fix(docker): include anthropic, bedrock, azure-identity extras in image

Fixes #30394. Air-gapped/restricted-network Docker containers can't reach
PyPI for lazy-install, so `--extra anthropic --extra bedrock --extra
azure-identity` are now added to the Dockerfile's `uv sync` so these
provider packages are baked into the published image.

The [all] extra deliberately excludes these (per the 2026-05-12
lazy-install policy on [all]) to keep `uv sync --locked` from breaking
when one of their pinned versions gets PyPI-quarantined. The Dockerfile
adds them back via additive --extra flags, mirroring the existing
--extra messaging pattern (issue #24698 / test_dockerfile_pid1_reaping.py).

Follow-up: separate PR will bump pyproject.toml's [anthropic] extra
from 0.86.0 to 0.87.0 to converge with tools/lazy_deps.py's
CVE-patched pin (CVE-2026-34450, CVE-2026-34452).
2026-05-27 20:29:13 +10:00
Teknium
0b6ace6498 test(verbose): align with telegram tier-1 inbox default
Two tests in test_verbose_command.py asserted Telegram's tool_progress
default was "new" and expected /verbose to cycle that to "all". The
default has since been overridden to "off" in gateway/display_config.py
(_PLATFORM_DEFAULTS for telegram — tier-1 inbox preset that keeps mobile
chats final-answer-first), making the first /verbose invocation cycle
off → new, not all → verbose.

The behavioral change was intentional; the tests were stale and missing
from the same commit. Surfaced as a pre-existing failure on origin/main
during CI for the unrelated #33164 / #33168 Codex auth salvages.
2026-05-27 03:13:15 -07:00
konsisumer
f1422ffd77 fix(gateway): classify Codex 429 quota as rate-limit, not missing credentials
When the Codex OAuth token endpoint returns 429 (usage-limit / quota
exhaustion), refresh_codex_oauth_pure raised a generic auth error that the
gateway surfaced as 'Primary provider auth failed: No Codex credentials
stored. Run hermes auth', prompting re-auth that cannot lift a quota cap.

Classify 429 distinctly (codex_rate_limited, relogin_required=False) with a
non-alarming quota message that honors Retry-After, log it as
'Primary provider rate-limited (429)', and stop format_auth_error from
appending the re-authenticate remediation. Also log the fallback provider's
literal config key instead of the resolved runtime category.

Refs #32790
2026-05-27 03:13:15 -07:00
konsisumer
2bbd53493d fix(cli): sync credential_pool on Codex re-auth
Codex re-auth via `hermes setup` / `hermes model` wrote fresh OAuth
tokens to providers.openai-codex.tokens but left the credential_pool
device_code entry holding the consumed refresh token and stale error
markers. Since the runtime selects from the pool, the next request
spent a dead token and got a 401 token_invalidated. Update the
singleton-seeded pool entries in lockstep and clear their error state.

Fixes #33000
2026-05-27 03:02:06 -07:00
Teknium
4feb181eb4 chore(release): map sir-ad + rdasilva1016-ui in AUTHOR_MAP 2026-05-27 02:41:24 -07:00
Teknium
2f7ba51b80 refactor(gateway): drop try/except wrappers around resolve_display_setting
The two new display-resolution sites added by #31034 (busy_ack_detail
and long_running_notifications) wrapped resolve_display_setting() in
try/except Exception. The existing 4 call sites in this file don't —
the function is safe by contract. Match the established pattern and
drop the redundant guards. -16 LOC, no behaviour change.
2026-05-27 02:41:24 -07:00
houenyang-momo
60f84c6c28 gateway: quiet Telegram operational chatter 2026-05-27 02:41:24 -07:00
Robert DaSilva
efa952531b fix: ignore Telegram start pings 2026-05-27 02:41:24 -07:00