Commit graph

8400 commits

Author SHA1 Message Date
Teknium
4ad5fa702f
docs(xai-oauth): add xai-oauth to provider enumeration pages (#26542)
Follow-up to #26534 (xai-oauth provider). The new guide and integrations
page were shipped with the salvage, but four reference/enumeration pages
still listed every other OAuth provider without xai-oauth:

- reference/cli-commands.md     — `--provider` choices list
- reference/environment-variables.md — HERMES_INFERENCE_PROVIDER values
- user-guide/configuration.md   — auxiliary-task provider list, OAuth
                                  tip block (mirrored from MiniMax OAuth),
                                  and provider table row
- user-guide/features/fallback-providers.md — provider table
2026-05-15 12:33:12 -07:00
teknium1
aac6d97a14 chore(xai-oauth): trim CORS allowlist to xAI auth origins
Drop accounts.mouseion.dev and localhost:20000 / 127.0.0.1:20000 from
the loopback callback CORS allowlist — leftover dev origins. The
redirect_uri is bound to 127.0.0.1 and gated by PKCE + state, so only
xAI's own auth origins are needed.

Co-Authored-By: Jaaneek <Jaaneek@users.noreply.github.com>
2026-05-15 12:11:32 -07:00
Jaaneek
7d7cdd48e0 test(xai-oauth): use grok-4.3 instead of retiring grok-code-fast-1
Per @mark-xai's review on PR #26457 and the xAI model retirement on
2026-05-15: grok-code-fast-1 is being retired today and aliases redirect
to grok-4.3 (already pinned to the top of the xAI model list by this
PR). Update the two xAI Responses-API test fixtures Mark flagged plus
the picker fallback default in hermes_cli/main.py that uses the same
literal.
2026-05-15 12:11:32 -07:00
Jaaneek
1e4801b8d0 docs(xai-oauth): correct logout command (was hermes auth remove)
The previous "Logging Out" section showed `hermes auth remove xai-oauth`
with no positional target — argparse rejects that and the command does
not clear the singleton OAuth state anyway. The correct command for the
"clear everything" intent is `hermes auth logout xai-oauth`. Also point
users at `hermes auth remove xai-oauth <target>` for single-pool-row
deletion.
2026-05-15 12:11:32 -07:00
Jaaneek
7fdc16dd4a refactor(transports/codex): trim duplicated cache-key comments
The xAI prompt_cache_key block carried two long comment paragraphs
that either restated setdefault semantics, narrated the SDK
type-validation mechanism, or recapped the historical motivation for
the extra_body indirection — all already covered by the test
docstring at test_xai_responses_sends_cache_key_via_extra_body
(which links to the xAI docs). Also restored the truncated link in
the body-injection comment.

No behavior change.
2026-05-15 12:11:32 -07:00
Jaaneek
e13c1b8060 fix(xai-http): preserve ~/.hermes/.env fallback and XAI_STT_BASE_URL precedence
The new resolve_xai_http_credentials() resolver was using os.getenv()
for the XAI_API_KEY/XAI_BASE_URL fallback path, which dropped the
~/.hermes/.env contract guarded by PR #17140 / #17163. Users with
XAI_API_KEY in dotenv only would see "No xAI credentials found" even
though the key was configured.

Separately, _transcribe_xai started consulting creds["base_url"] (which
always returns at least the default https://api.x.ai/v1) ahead of the
public XAI_STT_BASE_URL env override, so the per-tool override stopped
working.

- tools/xai_http.py: add module-level get_env_value() wrapper that
  reads ~/.hermes/.env first (via hermes_cli.config.get_env_value),
  then os.environ. Resolver uses it for the API-key/base-url fallback.
- tools/transcription_tools.py: restore precedence so XAI_STT_BASE_URL
  wins over creds["base_url"].
- tests/tools/test_transcription_dotenv_fallback.py +
  tests/tools/test_tts_dotenv_fallback.py: repoint the per-call-site
  patches at the new resolution point (tools.xai_http.get_env_value).
  The end-to-end regression-guard test (which patches load_env) is
  unchanged and still passes.
2026-05-15 12:11:32 -07:00
Jaaneek
9eef53b960 chore(release): map Jaaneek@users.noreply.github.com to Jaaneek
The contributor's commit author email is the legacy GitHub noreply
form (no leading numeric "id+"), so it doesn't match the
check-attribution workflow's auto-resolve regex
(\+.*@users\.noreply\.github\.com). Register it explicitly in
AUTHOR_MAP so the PR #26457 attribution check passes.
2026-05-15 12:11:32 -07:00
Jaaneek
e4d7a5dffa fix(tools): video_gen picker reflects active xAI selection and runs xai_grok post_setup
Two bugs in the `hermes tools` reconfigure flow caused picking xAI Grok
Imagine for video_gen (or image_gen) to feel like a no-op:

1. `_is_provider_active()` had a branch for `image_gen_plugin_name` but
   none for `video_gen_plugin_name`, so a row marked as the active xAI
   video provider was never recognized as active. The picker fell through
   to the env-var fallback in `_detect_active_provider_index()`, which
   matched the FAL row (because `FAL_KEY` is set), so the picker visually
   defaulted to FAL even though the user had selected xAI.

2. `_plugin_video_gen_providers()` and `_plugin_image_gen_providers()`
   built picker rows from the plugin's `get_setup_schema()` but only
   copied `name`, `badge`, `tag`, `env_vars`. The xAI plugins declare
   `post_setup: "xai_grok"` so the picker should run the OAuth /
   API-key prompt hook after selection — that key was silently dropped,
   so the hook never fired from the picker rows.

Adds the missing `video_gen_plugin_name` branch (placed before the
`managed_nous_feature` block, mirroring the existing image_gen branch)
and propagates `post_setup` from the plugin schema into both picker-row
builders. Adds focused tests in `test_video_gen_picker.py` and
`test_image_gen_picker.py`.
2026-05-15 12:11:32 -07:00
Jaaneek
b62c997973 feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider
Adds a new authentication provider that lets SuperGrok subscribers sign
in to Hermes with their xAI account via the standard OAuth 2.0 PKCE
loopback flow, instead of pasting a raw API key from console.x.ai.

Highlights
----------
* OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery,
  state/nonce, and a strict CORS-origin allowlist on the callback.
* Authorize URL carries `plan=generic` (required for non-allowlisted
  loopback clients) and `referrer=hermes-agent` for best-effort
  attribution in xAI's OAuth server logs.
* Token storage in `auth.json` with file-locked atomic writes; JWT
  `exp`-based expiry detection with skew; refresh-token rotation
  synced both ways between the singleton store and the credential
  pool so multi-process / multi-profile setups don't tear each other's
  refresh tokens.
* Reactive 401 retry: on a 401 from the xAI Responses API, the agent
  refreshes the token, swaps it back into `self.api_key`, and retries
  the call once. Guarded against silent account swaps when the active
  key was sourced from a different (manual) pool entry.
* Auxiliary tasks (curator, vision, embeddings, etc.) route through a
  dedicated xAI Responses-mode auxiliary client instead of falling back
  to OpenRouter billing.
* Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen
  plugin) resolve credentials through a unified runtime → singleton →
  env-var fallback chain so xai-oauth users get them for free.
* `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are
  wired through the standard auth-commands surface; remove cleans up
  the singleton loopback_pkce entry so it doesn't silently reinstate.
* `hermes model` provider picker shows
  "xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls
  back to pool credentials when the singleton is missing.

Hardening
---------
* Discovery and refresh responses validate the returned
  `token_endpoint` host against the same `*.x.ai` allowlist as the
  authorization endpoint, blocking MITM persistence of a hostile
  endpoint.
* Discovery / refresh / token-exchange `response.json()` calls are
  wrapped to raise typed `AuthError` on malformed bodies (captive
  portals, proxy error pages) instead of leaking JSONDecodeError
  tracebacks.
* `prompt_cache_key` is routed through `extra_body` on the codex
  transport (sending it as a top-level kwarg trips xAI's SDK with a
  TypeError).
* Credential-pool sync-back preserves `active_provider` so refreshing
  an OAuth entry doesn't silently flip the active provider out from
  under the running agent.

Testing
-------
* New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests)
  covers JWT expiry, OAuth URL params (plan + referrer), CORS origins,
  redirect URI validation, singleton↔pool sync, concurrency races,
  refresh error paths, runtime resolution, and malformed-JSON guards.
* Extended `test_credential_pool.py`, `test_codex_transport.py`, and
  `test_run_agent_codex_responses.py` cover the pool sync-back,
  `extra_body` routing, and 401 reactive refresh paths.
* 165 tests passing on this branch via `scripts/run_tests.sh`.
2026-05-15 12:11:32 -07:00
brooklyn!
9fb40e6a3d
fix(tui): restrict fast-echo bypass to ASCII so Vietnamese/CJK/IME input renders correctly (#26011)
* fix(tui): restrict fast-echo bypass to ASCII so Vietnamese/CJK/IME input renders correctly

The composer's fast-echo path (canFastAppend / canFastBackspace) writes
characters straight to stdout to skip an Ink re-render on the hot
typing path. The previous guard only checked
'stringWidth(text) === text.length', which lets a lot of non-ASCII
through:

  - Vietnamese precomposed letters (ề, ắ, ờ, ự, ...) report width 1 and
    length 1, but a Vietnamese Telex / IME stack produces them across
    multiple keystrokes; the intermediate composition state must be
    drawn by Ink so the rendered cell, the stored value, and the
    cursor column stay in lockstep when the final commit replaces the
    preview.
  - NFD combining marks (U+0300..U+036F) are zero-width but length 1,
    so even a passing equality lets them slip and silently desync the
    cell column.
  - CJK/East-Asian wide and emoji rejected only because their length
    differs, but the boundary was shape-shaped, not intent-shaped.

User-visible bug from the original report:
  Example: eê noiói nge neène
  -> the bypass committed the IME preview char before the diacritic
     replaced it, leaving doubled letters on screen.

Fix: gate fast-echo on pure printable ASCII (0x20-0x7e). The
performance-critical English typing path is unchanged; everything else
goes through the normal Ink render path so layout stays accurate.

Also extracts the shape preconditions as pure exported helpers
(canFastAppendShape / canFastBackspaceShape) so the regression matrix
is testable without spinning up a TextInput.

Tests: ui-tui/src/__tests__/textInputFastEcho.test.ts adds 20 cases
covering ASCII still works, Vietnamese precomposed + NFD, CJK, emoji,
NBSP / Latin-1, ANSI / control bytes, multi-line, and end-of-line
preconditions. Verified RED on the previous guard (11 of 20 fail) and
GREEN on the new guard.

Refs: #5221, #7443, #17602, #17603 (similar wide-char rendering bugs).

* docs(tui): clarify Vietnamese char terminology in regression comment

Address Copilot review: 'single byte width' implied UTF-8 byte semantics,
but the relevant property is JS code units (`text.length === 1`) and
display width (`stringWidth === 1`). Reworded to match.
2026-05-15 09:41:50 -05:00
Siddharth Balyan
d5416284f1
fix(tui): autonomous background process completion notifications (#26071) (#26327)
* feat(process-registry): add format_process_notification shared helper

* feat(process-registry): add drain_notifications method

* refactor(cli): use shared drain_notifications and format_process_notification

* feat(tui): add background notification poller for completion_queue

* feat(tui): wire notification poller into session init/finalize

* refactor(tui): add post-turn drain using shared helper as safety net
2026-05-15 19:31:00 +05:30
kshitij
db84a78e61
fix(langfuse): complete observability fix — trace I/O, tool outputs, placeholder credentials (closes #22342, #22763) (#26320)
* fix(langfuse): reject placeholder credentials with one-shot warning

When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY
at a template value like 'placeholder', 'test-key', or 'your-langfuse-key',
the Langfuse SDK silently accepts the credentials at construction time and
drops every trace at flush time. No warning, no error — just an empty
Langfuse dashboard the operator only notices hours later.

Add prefix-based validation in _get_langfuse() against the documented
'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side.
Anything else fires a single warning naming the offending env var(s)
with a log-safe value preview (full string for short placeholders so the
operator knows which template they left in place; truncated for long
values so a real secret pasted into the wrong field never hits the log),
then short-circuits via the existing _INIT_FAILED cache so the warning
fires once per process, not once per hook invocation.

The check sits after the 'Langfuse is None' SDK-installed guard so hosts
without the optional langfuse SDK don't see misleading 'set real keys'
hints when the actionable fix is 'pip install langfuse'. Missing
credentials remains the documented opt-out path and stays silent — no
log noise for unconfigured installs.

Fixes #22763
Fixes #23823

* fix(langfuse): use actual API request messages for generation input

on_pre_llm_request previously used the messages kwarg alone, which
could be None when Hermes passes the payload via request_messages,
conversation_history, or user_message instead. Add _coerce_request_messages
to pick the first available list across all variants, falling back to a
synthetic user message. Generations now show the real outbound payload
rather than an empty input.

* fix(langfuse): record tool call outputs in traces

Tool observations showed input (arguments) but output was always
undefined. Root cause: when tool_call_id is empty, pre_tool_call stored
observations under a unique time-based key that post_tool_call could
never reconstruct, so every tool span was closed without output by the
_finish_trace sweep.

Fix pre/post matching by routing empty-tool_call_id tools through a
per-name FIFO queue (pending_tools_by_name) instead of the time-based
key. Tools with a tool_call_id continue to use the id-keyed dict.

Also:
 - Preserve OpenAI-style nested function shape in serialized tool calls
   so Langfuse renders name/arguments correctly
 - Keep name + tool_call_id on role:tool messages for proper pairing
 - Backfill tool results onto the matching turn_tool_calls entry so the
   generation's tool-call record carries the result alongside arguments
 - Coerce request messages from whichever field the runtime provides
   (request_messages, messages, conversation_history, user_message)

* fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test

Self-review of the combined #22345 + #23831 salvage surfaced three issues
worth fixing in the same PR rather than as follow-ups:

1. Drop is_first_turn from the pre_api_request hook. The boolean expression
   `not bool(conversation_history)` was wrong: conversation_history is
   reassigned to None mid-run after compression (5 sites in run_agent.py),
   so the value flips False -> True mid-conversation on every post-compression
   API call. The langfuse plugin never consumed it, so the kwarg was both
   misleading AND dead.

2. Replace copy.deepcopy(request_messages) with shallow list() copy. The
   pre_api_request hook contract discards return values (invoke_hook never
   writes back to api_kwargs), and the langfuse plugin's _serialize_messages
   already builds its own snapshot dicts via _safe_value. A deepcopy on every
   API call would walk every tool result and base64 image — significant
   overhead for no real isolation benefit. Shallow copy of the outer list
   protects against later mutations of api_messages without paying for the
   inner-dict walk.

3. Rename test_empty_tool_call_id_concurrent_fifo_order ->
   test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a
   real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8
   threads behind a barrier to actually exercise _STATE_LOCK on the
   pending_tools_by_name queue. The original test was sequential and only
   validated Python list semantics; this one validates the lock discipline.

4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED —
   that function does not exist. Tests reload the module via sys.modules.pop
   + importlib.import_module instead.

Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass.

---------

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
2026-05-15 05:04:02 -07:00
kshitij
f199cd9f84
chore(release): map brian@dralth.com to btorresgil for #22345 salvage (#26319)
PR #22345 by @btorresgil authors commits as 'Brian Conklin
<brian@dralth.com>' (git config carries a different name/email than the
GitHub account). GitHub's commit-author mapping correctly attributes these
commits to @btorresgil based on the public-key registration, but Hermes'
release attribution audit reads the raw commit email, not the GitHub
mapping. Without this AUTHOR_MAP entry, salvaging #22345 would fail
`scripts/contributor_audit.py` strict mode at release time.

Prerequisite for the langfuse trace fix salvage that cherry-picks
@btorresgil's commits onto current main.
2026-05-15 05:03:43 -07:00
kshitijk4poor
77276070f5 fix(codex-runtime): de-dup [plugins.X] tables and stop leaking HERMES_HOME into config.toml
Builds on @steezkelly's Bug A fix (#25857, top-level default_permissions
via _insert_managed_block_at_top_level) by addressing the other two
config-corruption bugs described in #26250:

Bug B (duplicate [plugins.X] tables)
  - Codex itself writes [plugins."<name>@<marketplace>"] tables to
    config.toml when the user runs `codex plugins enable` directly,
    before hermes-agent's managed block exists. On the next migrate run,
    _query_codex_plugins() re-discovers the same plugins via plugin/list
    and render_codex_toml_section() re-emits them inside the managed
    block. Codex's strict TOML parser then rejects the duplicate table
    header on startup.
  - Add _strip_unmanaged_plugin_tables() that drops [plugins.*] tables
    from the user-content portion of the file. Only run it when
    plugin/list succeeded — if the RPC failed we can't re-emit and
    must preserve the user's tables. plugin/list is the source of
    truth when it answers.

Bug C (HERMES_HOME pytest-tempdir leak into ~/.codex/config.toml)
  - _build_hermes_tools_mcp_entry() read HERMES_HOME directly from
    os.environ, so a sibling pytest's monkeypatch.setenv("HERMES_HOME",
    tmp_path) silently burned a transient pytest tempdir into the
    user's real ~/.codex/config.toml. After pytest reaped the tempdir,
    every codex-routed hermes-tools tool call failed silently.
  - Derive HERMES_HOME from get_hermes_home() (the canonical resolver
    that goes through the profile-aware path) and refuse to emit
    obvious test-tempdir paths via _looks_like_test_tempdir() as
    belt-and-suspenders for any other callsite that forgets to patch
    migrate().
  - test_enable_succeeds_when_codex_present in test_codex_runtime_switch.py
    invoked the real migrate() (no mock), writing to Path.home() / .codex
    using whatever HERMES_HOME the running pytest session had set. Add
    the same migrate patch the other apply() tests already use, so the
    suite stops touching the user's real ~/.codex/config.toml.

E2E verification (replicating the issue's repro):
  - Pre-state config.toml with user [mcp_servers.omx_team_run] +
    codex-installed [plugins."tasks@openai-curated"],
    HERMES_HOME="/private/var/folders/.../pytest-of-.../..."
  - On origin/main: tomllib refuses to load the result with
    "Cannot declare ('plugins', 'tasks@openai-curated') twice" AND
    the pytest-tempdir HERMES_HOME is burned in.
  - On this branch: file parses cleanly, default_permissions is
    top-level, exactly one [plugins."tasks@openai-curated"] table
    inside the managed block, no HERMES_HOME in the MCP env.

7 new regression tests covering all three bugs + the test-leak guard.
`bash scripts/run_tests.sh tests/hermes_cli/test_codex_runtime_*.py` —
95 passed, 0 failed.

Closes #26250
2026-05-15 02:31:30 -07:00
Steve Kelly
274217316e fix(codex-runtime): keep migrated root keys top-level 2026-05-15 02:31:30 -07:00
nidhi-singh02
13c72fb486 fix(tools): wrap browser provider network calls with error handling
Wrap requests.post() in create_session() for browser_use, browserbase,
and firecrawl providers with requests.RequestException handling.
Connection timeouts and DNS resolution failures now surface as clean
RuntimeError messages instead of raw requests exception tracebacks.

Browser Use managed-gateway mode preserves raw exception propagation
so the existing idempotency-key retry semantics keep working.

Closes #2746

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-15 01:53:06 -07:00
aydnOktay
6af9942327 fix(url-safety): allow only http and https schemes 2026-05-15 01:52:48 -07:00
nidhi-singh02
8373956850 fix(slack): guard split()[0] against whitespace-only command text
When a user sends a Slack message like '/hermes   ' (trailing whitespace
after the slash) the legacy subcommand router hit `text.split()[0]` with
a truthy-but-whitespace-only `text`. `'   '.split()` returns `[]` →
IndexError, blowing up the slash handler before fallthrough to `/help`.

Switch to a two-step guard that materializes the parts list first and
indexes only if non-empty.

Salvaged from PR #2752 by @nidhi-singh02. The PR's other two hunks
(`tools/file_operations.py`, `agent/anthropic_adapter.py`) are
unreachable in current code — `LINTERS` is a hardcoded constant dict
with no empty values, and the anthropic version-detection site is
already guarded by a `result.stdout.strip()` truthy check — so only the
slack hunk is taken.

Closes #2745

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-15 01:50:56 -07:00
teknium1
94bdc63ff5 chore(release): add AUTHOR_MAP entry for nidhi-singh02
PR #2751 salvage. CI requires AUTHOR_MAP coverage for all
contributor commit emails.
2026-05-15 01:50:41 -07:00
Nidhi Singh
eacb398f75 fix(tools): add return_exceptions to asyncio.gather in web_tools
Three asyncio.gather() calls in tools/web_tools.py ran without
return_exceptions=True. A single failing task (e.g. LLM rate limit on
one URL) would raise out of gather() and discard every other
successfully fetched/summarized result.

Pass return_exceptions=True and filter BaseException entries with a
warning log before unpacking. Affects:

- chunk summarization gather (large web_extract pages)
- firecrawl per-result LLM post-processing
- tavily crawl per-result LLM post-processing

Closes #2744
2026-05-15 01:50:41 -07:00
teknium1
5301cc212b chore(release): add AUTHOR_MAP entry for nidhi-singh02 2026-05-15 01:50:07 -07:00
nidhi-singh02
c4a21d7831 fix(cli): log swallowed exception in runtime model auto-detection
Replaces bare `except Exception: pass` with debug-level logging
so failures in local endpoint model discovery are diagnosable
instead of silently hidden.
2026-05-15 01:50:07 -07:00
teknium1
59c7cc64f0 chore(release): add AUTHOR_MAP entry for amethystani 2026-05-15 01:43:54 -07:00
Animesh Mishra
55f3262e78 fix(mcp): pre-compile env-var regex and unify interpolation
Remove redundant inner `import re` and regex recompilation on every call in
_interpolate_env_vars. Add module-level _ENV_VAR_PATTERN compiled once.

Replace the separate _interpolate_value() in mcp_config.py (which used \w+
and would silently fail on env vars containing hyphens or dots) with the
shared _ENV_VAR_PATTERN from mcp_tool.py. Remove now-unused import re.
2026-05-15 01:43:54 -07:00
teknium1
5360b54244 fix(providers): set User-Agent on ProviderProfile.fetch_models
Some catalog endpoints (OpenCode Zen, etc.) sit behind a WAF that
returns 403 for the default Python-urllib/<ver> User-Agent.  The
generic profile-based live fetch in providers/base.py was silently
failing for any such provider — falling through to the static catalog
and missing newly-launched models.

Set a generic 'hermes-cli/<version>' UA on the catalog probe so every
api_key provider profile benefits.  Verified live against opencode-zen:
before this change, profile.fetch_models() raised HTTP 403; after, it
returns 42 models including gpt-5.5, gpt-5.5-pro, kimi-k2.6, glm-5.1
and the *-free variants the static catalog doesn't list.

Also strip the now-stale comment in validate_requested_model() claiming
opencode-zen's /models returns 404 against the HTML marketing site —
the API endpoint at /zen/v1/models returns 200 with valid JSON.

Surfaced by #2651 (@aashizpoudel) — fixes the same user-facing gap
their PR targeted, applied at the right layer so all api_key provider
profiles get live catalogs through the same code path.

Co-authored-by: Aashish Poudel <mr.aashiz@gmail.com>
2026-05-15 01:42:21 -07:00
teknium1
647cc0bb0d chore(release): add AUTHOR_MAP entries for InB4DevOps 2026-05-15 01:42:08 -07:00
InB4DevOps
4f8aaf1046 perf(run_agent): accumulate length-continuation prefix via list+join
Replace O(n²) string concatenation of truncated_response_prefix in the
length-continuation retry loop with a list + ''.join(). Functionally
equivalent: same partial response on early return, same prepend on
final assembly. The legacy retry path is capped at 3 iterations, so
the practical wall-clock win is small, but the new idiom matches the
rest of the codebase and removes a needless repeated allocation.

Salvaged from PR #2717 (the run_conversation portion only — trajectory
refactor dropped because it silently rewrote </tool_response> to </think>).

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-15 01:42:08 -07:00
Mibayy
b6e07417c5 feat(cli): show YOLO mode warning in banner and status bar
When running with --yolo, all dangerous command approvals are bypassed.
Make this state visible so users don't forget:

- Banner: '⚠ YOLO mode — all approval prompts bypassed' line in red, only
  shown when YOLO is active. Default case is silent (no extra line, no
  always-on 'restricted' label).
- Status bar: '⚠ YOLO' fragment appended in red (#FF4444 bold) across all
  three width tiers (<52, <76, ≥76) in both the plain-text fallback and
  the fragments builder.

Closes #2663

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
2026-05-15 01:41:59 -07:00
teknium1
47614dbfca chore: wire simplex docs into sidebar + AUTHOR_MAP
- Adds plugins/platforms/simplex docs page to the messaging sidebar
  between LINE and Open WebUI.
- Maps louismichalot@hotmail.com -> Mibayy in scripts/release.py so the
  attribution check on the salvage PR passes.
2026-05-15 01:41:30 -07:00
Mibayy
09d9724a09 feat(gateway): add SimpleX Chat platform plugin
SimpleX Chat (https://simplex.chat) is a private, decentralised messenger
with no persistent user IDs — every contact is identified by an opaque
internal ID generated at connection time. This adds it as a Hermes
gateway platform via the plugin system.

The adapter connects to a local simplex-chat daemon via WebSocket,
listens for inbound messages, and sends replies. Originally proposed in
PR #2558 as a core-modifying integration; reshaped here as a self-
contained plugin under plugins/platforms/simplex/ with no edits to any
core file. Discovery is filesystem-based (scanned by gateway.config),
and the platform identity is resolved on demand via Platform("simplex").

Plugin contract:
- check_requirements() requires SIMPLEX_WS_URL AND the websockets package
- validate_config() / is_connected() accept env or config.yaml input
- _env_enablement() seeds PlatformConfig.extra (ws_url + home_channel)
- _standalone_send() supports out-of-process cron delivery
- interactive_setup() provides a stdin wizard for hermes gateway setup
- register() wires the adapter into the registry with required_env,
  install_hint, cron_deliver_env_var, allowed_users_env, and a
  platform_hint for the LLM.

Lazy dependency: the websockets Python package is imported inside the
functions that need it. The plugin is importable and discoverable even
when websockets is missing — check_requirements() simply returns False
until `pip install websockets` is run. No new pyproject extras are
introduced.

Environment variables:
  SIMPLEX_WS_URL             WebSocket URL of the daemon (required)
  SIMPLEX_ALLOWED_USERS      Comma-separated allowed contact IDs
  SIMPLEX_ALLOW_ALL_USERS    Set true to allow all contacts
  SIMPLEX_HOME_CHANNEL       Default contact for cron delivery
  SIMPLEX_HOME_CHANNEL_NAME  Human label for the home channel

Closes #2557.
2026-05-15 01:41:30 -07:00
teknium1
85782a4ed7 feat(acp): hermes acp --setup-browser bootstraps browser tools for registry installs
The Zed ACP Registry path (uvx --from 'hermes-agent[acp]==X' hermes-acp)
gets a Python-only install. Browser tools depend on the agent-browser npm
package + Chromium, neither of which are in the wheel. Without an
explicit bootstrap, registry users have no path to working browser tools.

Ship a bundled, idempotent bootstrap script (Linux/macOS bash + Windows
PowerShell) inside acp_adapter/bootstrap/ as wheel package-data. New
entry points:

  hermes acp --setup-browser        # interactive; prompts before Chromium download
  hermes acp --setup-browser --yes  # non-interactive
  hermes-acp --setup-browser

The terminal-auth flow (hermes acp --setup) also offers the browser
bootstrap as a follow-up after model selection, so first-run registry
users get the option without knowing the flag exists.

Key design choices:
- npm install -g --prefix $NODE_PREFIX so we never need sudo. System Node
  on PATH is respected; only the install target is redirected to the
  user-writable Hermes-managed Node prefix.
- tools/browser_tool.py::_browser_candidate_path_dirs() already walks
  $HERMES_HOME/node/bin, so installed binaries are discovered with no
  agent-side code change.
- System Chrome/Chromium detection short-circuits the ~400 MB Playwright
  download when a suitable browser already exists.
- Bash + PowerShell live as ONE copy each under acp_adapter/bootstrap/.
  Not duplicated under scripts/. install.sh and install.ps1 keep their
  inline browser blocks for the source-checkout path.

E2E validated end-to-end:
  bash bootstrap_browser_tools.sh --skip-chromium
    → installs agent-browser into ~/.hermes/node/bin/
  tools.browser_tool._find_agent_browser()
    → returns the installed path
  check_browser_requirements()
    → returns True (browser tools register)

Tests:
- tests/acp/test_entry.py: 11 tests covering --setup-browser dispatch
  (linux + windows + --yes forwarding + failure propagation), the
  terminal-auth follow-up prompt path, and a package-data wheel-shipping
  assertion that catches any future pyproject.toml regression.

Docs: website/docs/user-guide/features/acp.md gains a 'Browser tools
(optional)' subsection with the two-line install + what-it-does.
2026-05-15 01:38:24 -07:00
teknium1
9f57f2286d chore(release): add AUTHOR_MAP entry for buntingszn 2026-05-15 01:36:03 -07:00
buntingszn
6682f91b80 feat(cron): support name-based lookup for job operations
Cron mutation operations (run/pause/resume/remove) and 'hermes cron edit'
now accept a job name in addition to the hex ID, with case-insensitive
matching. Before this, 'hermes cron run my_job_name' died with
'Job with ID my_job_name not found' and forced the user to look up the
hex ID first.

The original PR matched by name but silently picked the first match when
two jobs shared a name. This version refuses to act on an ambiguous name
and surfaces every matching job (id, name, schedule, next_run_at) so the
caller can pick a specific ID.

- cron/jobs.py:
  - get_job() stays ID-only (preserves existing call-site semantics for
    web_server/api_server/curator/scheduler/test code that always passes
    real IDs).
  - resolve_job_ref() is the new name-or-ID resolver, used by pause/
    resume/trigger/remove_job. Exact ID match wins over a name match
    even if a different job's name happens to equal that ID. Ambiguous
    name match raises AmbiguousJobReference with all candidate IDs.
- tools/cronjob_tools.py: dispatch site uses resolve_job_ref, surfaces
  ambiguous matches as a structured error with the matching IDs.
- hermes_cli/cron.py: 'cron edit' uses resolve_job_ref so editing by
  name works and ambiguous names are reported with IDs.
- tests/cron/test_jobs.py: new TestResolveJobRef covering ID match,
  case-insensitive name match, ID-wins-over-name, ambiguous refusal,
  and that pause/resume/trigger/remove all refuse on ambiguity.

Closes #2627
2026-05-15 01:36:03 -07:00
Teknium
05d9f641c0
docs(cron): worked recipes for the wakeAgent pre-run gate (#26229)
Adds three pre-run gate recipes to the cron docs:
- file-change gate (stat + mtime + state file)
- external-flag gate (file presence)
- SQL-count gate (user's own database, not state.db)

These are the use cases @iankar8 proposed adding as a parallel
'trigger' subsystem in #2654. The existing `script` + `wakeAgent`
gate already covers all three at $0 — this lands the patterns as
documentation so users can find them, instead of adding a second
gating mechanism to the cron subsystem.
2026-05-15 01:34:15 -07:00
Teknium
9329e06696
feat(image-gen): actionable setup message when no FAL backend is reachable (#26222)
When the in-tree FAL path has no API key (and no managed gateway), the
handler used to return a bare 'FAL_KEY environment variable not set'
error. Users had no idea where to get a key, that a managed Nous
gateway exists, or that plugin-registered providers are an option.

Now `image_generate_tool` returns a structured multi-line message:
  - signup link (https://fal.ai)
  - managed-gateway status (if Nous tools are enabled)
  - pointer to `hermes tools` / `hermes plugins list` for alternate
    backends, so users on a stale `image_gen.provider` know where to look

The schema is untouched — `check_fn` still gates the tool out of the
schema when no backend is reachable at startup, consistent with every
other conditional tool. This patch fixes the call-time failure modes:
managed-gateway 5xx, plugin provider disappearing mid-session, etc.

Inspired by #2546 / @Mibayy. The PR was ~5700 commits stale against
the new plugin-aware image_gen architecture, so this is a forward port
of the actionable-error idea rather than a cherry-pick.


Closes #2543

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-05-15 01:33:13 -07:00
Siddharth Balyan
04b1fdaecf
security(deps): add upper bounds to 5 loose deps + document supply chain policy (#24226)
After the Mini Shai-Hulud supply chain campaign (May 2026) and the litellm
compromise (March 2026), codify the dependency pinning policy that was
established in PRs #2810 and #9801 but never written down for contributors.

Changes:
- pyproject.toml: Add tight upper bounds to the 5 deps that slipped
  through as review escapes from external contributor PRs:
  - hindsight-client>=0.4.22,<0.5 (was >=0.4.22)
  - aiosqlite>=0.20,<0.23 (was >=0.20)
  - asyncpg>=0.29,<0.32 (was >=0.29)
  - alibabacloud-dingtalk>=2.0.0,<3 (was >=2.0.0)
  - youtube-transcript-api>=1.2.0,<2 (was >=1.2.0)

  Pre-1.0 packages get <0.(current_minor+2) — tight enough to block
  hostile minor releases but loose enough to not require bumps every week.

- CONTRIBUTING.md: Add 'Dependency pinning policy' section under Security
  with the full rationale, table of source types + treatments, and examples.

- AGENTS.md: Add concise 'Dependency Pinning Policy' section for AI coding
  agents with the decision table and step-by-step checklist.

- supply-chain-audit.yml: Add dep-bounds job that fails PRs introducing
  PyPI deps without <ceiling upper bounds. Fires on pyproject.toml changes.
  Posts a PR comment with the specific unbounded specs found.

Refs: #2796 #2810 #9801 #24205
2026-05-15 01:33:08 -07:00
Wysie
681778a0b7 fix(whatsapp): fail fast when Baileys sendMessage hangs
Baileys' sock.sendMessage() can hang indefinitely while uploading
media to WhatsApp servers (and, less often, on text sends), pinning
the bridge's Express handler until the gateway's aiohttp timeout
fires — surfacing to the user as a 120s wait followed by an empty
error from the TTS/voice path.

Wrap every sock.sendMessage() call inside the bridge in a
sendWithTimeout() helper that rejects after WHATSAPP_SEND_TIMEOUT_MS
(default 60s) via Promise.race. The four call sites are /send,
/edit, and /send-media's primary send. Express handlers catch the
rejection in their existing try/catch and return a real 500 to the
gateway, which can then surface a retryable error.

Salvaged from #2608 — wysie diagnosed the hang and the
Promise.race shape; the other two parts of that PR (gateway HTTP
session pooling, base.py metadata kwarg removal) already landed on
main via separate routes and are no longer needed.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-15 01:30:48 -07:00
teknium1
0161d4bb6c chore(release): add AUTHOR_MAP entry for CoinTheHat 2026-05-15 01:29:31 -07:00
CoinTheHat
814c60092b fix: clean stale conversation mappings on response eviction/deletion
ResponseStore.put() and .delete() now remove conversations rows that
reference evicted or deleted response IDs, preventing 404 errors when
a conversation name is reused after its backing response was purged.

Adds regression tests for delete, eviction, and handler-level reuse.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-15 01:27:43 -07:00
KiraKatana
23ac522d37 fix(gateway): isinstance-guard string-form 429 error body
When a non-Anthropic provider (e.g. Morpheus proxy) returns a 429 with
`{"error": "Too Many Requests"}` instead of the expected
`{"error": {"type": ...}}` dict, _err_body.json().get("error", {})
returns the raw string and the next .get("type") line crashes with
AttributeError, taking down the message handler.

Guard with isinstance(_err_json, dict) so non-dict error bodies fall
through to the generic rate-limit hint.

Salvaged from PR #2587 by @KiraKatana. The PR's fallback-config
`base_url`/`api_key_env` fix was already implemented independently
on main (run_agent.py:8759-8780) with additional aliases and Ollama
Cloud host handling, so only the gateway guard is cherry-picked.

Co-authored-by: KiraKatana <kira.ops@proton.me>
2026-05-15 01:26:11 -07:00
teyrebaz33
e0e7397c32 fix(session): persist auto-reset state across gateway restarts
was_auto_reset, auto_reset_reason, and reset_had_activity were not
included in SessionEntry.to_dict() / from_dict(), so a gateway restart
between session expiry and the user's next message would silently drop
the auto-reset notification and context note.

Add the three fields to the serialization roundtrip with safe defaults
(False / None / False) so existing sessions.json files load cleanly.

Add three roundtrip tests to test_session_reset_notify.py.
2026-05-15 01:25:42 -07:00
kshitijk4poor
e0e4856d46 feat(skills-hub): add huggingface/skills as trusted default tap (#2549)
Adds Hugging Face's official skill catalog to the default GitHub taps and
classifies it as a trusted source alongside openai/skills and anthropics/skills.

- tools/skills_guard.py: huggingface/skills -> TRUSTED_REPOS
- tools/skills_hub.py: GitHubSource.DEFAULT_TAPS += huggingface/skills (skills/)
- website/docs: list it under default taps + trusted-source examples

Closes #2549.

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-15 01:25:33 -07:00
libo1106
0086cdaf93 refactor(yuanbao): improve quote media fallback — move to DispatchMiddleware, tighten conditions 2026-05-15 01:17:50 -07:00
libo1106
fc2754dbdf fix(yuanbao): resolve quoted file/image via transcript lookup when quote desc lacks ybres
When a user quotes a file message (type=3) and @bot, the quote's desc field
only contains the filename without a ybres:// resource reference. The existing
QuoteContextMiddleware only extracted media refs from desc using the ybres regex,
which always returned empty for file quotes.

Fix: add a transcript lookup fallback in QuoteContextMiddleware.handle() —
when quote_media_refs is empty but reply_to_message_id is set, search the
session transcript for the quoted message_id and extract ybres anchors from
its content.

Also fix message_type classification: when quote media resolves non-image files,
override message_type to DOCUMENT so gateway/run.py's document injection logic
properly prepends the file path and content for the agent.
2026-05-15 01:17:50 -07:00
libo1106
3df26b925c feat(yuanbao): prioritize quote media refs over history backfill in DispatchMiddleware 2026-05-15 01:17:50 -07:00
libo1106
80efe664ce feat(yuanbao): add quote_media_refs extraction to QuoteContextMiddleware 2026-05-15 01:17:50 -07:00
libo1106
d57a4b3eb5 feat(yuanbao): add _parse_resource_id and update _extract_text for ybres anchors 2026-05-15 01:17:50 -07:00
Siddharth Balyan
6bdad1f3b2
ci: add PyPI publish workflow (salvaged from #25901) (#26148)
* ci(pypi): add publish workflow for automated PyPI releases

Triggered by CalVer tag pushes from scripts/release.py (v20* pattern).
Three jobs: build (uv build) → publish (OIDC trusted publishing) → sign
(Sigstore + attach to existing GitHub Release).

- workflow_dispatch as manual escape hatch
- skip-existing for safe re-runs
- Graceful skip when GitHub Release not found (sign job)
- Top-level permissions: contents: read (CodeQL compliant)

Requires one-time setup: PyPI trusted publisher + GitHub pypi environment.

Co-authored-by: dmahan93 <44207705+dmahan93@users.noreply.github.com>

* fix(release): address review findings

- Stage acp_registry/agent.json in version bump commit (was silently left unstaged)
- Add missing return when no previous tags found without --first-release
- Fix get_pr_number return type annotation (str -> str | None)
- Prefer uv build over python -m build (matches CI workflow), with fallback
- Use unit separator (%x1f) in git log format to handle | in author names
- Add explicit encoding='utf-8' to .release_notes.md write

Workflow hardening:
- Gracefully skip signing when GitHub Release not found (env var gate
  instead of exit 1, so PyPI publish still shows green)

* fix(ci): harden PyPI workflow — SHA-pin actions, guard workflow_dispatch, explicit build flags

- Pin all actions to commit SHAs (supply-chain hardening for id-token:write)
- workflow_dispatch now requires confirm_tag input + checks out that tag
- Both uv build paths explicitly pass --sdist --wheel

---------

Co-authored-by: dmahan93 <44207705+dmahan93@users.noreply.github.com>
2026-05-15 13:21:48 +05:30
teknium1
f9ad7400e3 fix(goals): raise judge max_tokens 200 → 4096, make configurable
The freeform /goal judge was capped at max_tokens=200, which reliably
truncated the JSON verdict on reasoning-heavy models (deepseek-v4-pro,
qwq, etc.) — the model burns tokens on hidden reasoning before emitting
visible content, and the first /goal turn's prompt is larger than later
turns, blowing past 200. Symptom: agent.log shows
`judge reply was not JSON: '{"done": true, "reason": "The agent successfully'`
followed by repeated `judge returned empty response` lines, then the
goal pauses with a misleading 'judge model isn't returning the required
JSON verdict' message.

Diagnosed live by @helix4u — empirically verified that raising the
budget on an unmodified worktree makes the failures go away on the
exact configs users were hitting on Nous Plus subscription paths.

Changes:
- DEFAULT_JUDGE_MAX_TOKENS = 4096 (up from 200)
- New auxiliary.goal_judge.max_tokens config knob for tuning in
  specifically constrained setups
- _goal_judge_max_tokens() resolves the value with fail-open semantics
  (non-int / non-positive / load failure → default). load_config() is
  mtime-cached so per-turn lookup is cheap.

Scoped narrowly to the verified root cause — does not introduce a
submit_verdict tool-call schema (see #26162 / #23671 for that direction;
they can land separately if we want them).

Tests: tests/hermes_cli/test_goals.py + tests/cli/test_cli_goal_interrupt.py
+ tests/gateway/test_goal_verdict_send.py — 62/62 passing.

E2E verified: config override honored (8192), missing/garbage/zero
values fall back to 4096, no-auxiliary-section falls back to 4096.

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>

Credits:
- @helix4u (Gille) — diagnosed the max_tokens=200 truncation via live
  testing on an unmodified worktree, drafted the original fix shape
  in #26162.
- @AhmetArif0 — flagged the freeform judge fragility in #23671 from
  the tool-call angle.
- @0xharryriddle (HarryRiddle.eth) — reported the issue from a Nous
  Plus subscription setup in #23876 with full debug reports.

Closes #23876
Supersedes #26162, #23671, #23881
2026-05-14 23:44:06 -07:00
Teknium
965ae7fa97
revert(cli): drop scrollback box width clamp (#25975), restore full-width borders (#26163)
#25975 (salvaging #24403) clamped decorative scrollback Panels and
streaming box rules to `max(32, min(width, 56))` as a defense against
terminal-emulator reflow when columns shrink. On any modern wide
terminal this made the response/reasoning borders look stubby — 56
cols inside a 200-col viewport.

#26137 (salvaging #25981, by @OutThisLife) landed a more fundamental
fix: prompt_toolkit's `_output_screen_diff` is monkey-patched so its
reserve-vertical-space cursor move no longer pushes chrome into
scrollback at all. With that in place, the clamp is no longer
load-bearing for the chrome-into-scrollback class of bugs — the
remaining risk is purely cosmetic reflow of *already stamped*
Panel borders during an aggressive column shrink, which we now
accept as a tradeoff for restoring proper full-width rendering.

Changes:
- `_scrollback_box_width()` returns `max(32, width)` (just the floor,
  no upper cap). All 10 call sites stay valid.
- Updated `test_scrollback_box_width_caps_to_resize_safe_value` to
  the new `test_scrollback_box_width_returns_viewport_width` asserting
  full-width passthrough above the 32-col floor.

Floor of 32 is kept so `'─' * (w - 2)` math stays positive on tiny
terminals.

Refs #18449 #19280 #22976 (the original reflow class) and #25975
(the clamp this reverts).
2026-05-14 23:30:16 -07:00