On Windows (msvcrt path), _file_lock() first checked if the lock file
existed and wrote it with write_text(), then opened it with open('r+').
Between these two calls, another process could delete the file causing
open('r+') to raise FileNotFoundError — uncaught, leaving memory writes
to proceed without holding the lock, risking data corruption.
Replace the three-line sequence with a single open('a+', ...) call which
atomically creates the file if missing or opens it if it exists, closing
the TOCTOU window entirely. The existing fd.seek(0) before msvcrt.locking()
is preserved and sufficient for correct lock byte positioning.
Root cause: TOCTOU between lock_path.write_text() and open('r+')
Impact: concurrent memory writes on Windows could corrupt MEMORY.md
Pairs with the prior commit (start() now inside the try block). If
threading.Thread.start() itself raises (OS thread exhaustion under
heavy delegation fanout), the finally would call .join() on a
never-started thread, which raises RuntimeError("cannot join thread
before it is started") — trading one rare bug for another.
Thread.ident is None until start() succeeds, so gate the join on it.
_heartbeat_thread.start() was called before the try/finally block that
contains _heartbeat_stop.set(). If _register_subagent() or any code
between .start() and try: raised an exception, the finally block would
never run — leaving the heartbeat thread as an orphan that continues
calling _touch_activity() on the parent agent, incorrectly resetting
gateway timeout counters.
Move _heartbeat_thread.start() to be the first statement inside the
try block so the finally block always reaches _heartbeat_stop.set()
regardless of how the child run completes or fails.
Root cause: heartbeat start outside try/finally scope
Impact: orphan heartbeat thread incorrectly resets parent gateway timeouts
* feat(skills/notion): overhaul for Notion Developer Platform (May 2026)
Notion shipped its Developer Platform on May 13, 2026: ntn CLI, Workers,
Markdown API, bidirectional webhooks, agent tools. The existing skill only
covered curl + integration token CRUD, so it didn't surface any of the new
ergonomics — particularly the /markdown endpoints (much easier for agents
to consume) and the ntn CLI for headless API + Workers management.
This rewrite (v1.0.0 -> v2.0.0):
- Splits setup into Path A (HTTP, cross-platform incl. Windows), Path B
(ntn CLI on macOS/Linux, with NOTION_API_TOKEN env var for headless),
and Path C (Windows fallback — HTTP API or WSL2; native ntn is 'coming
soon').
- Keeps the full curl reference (still the only Windows-compatible path).
- Adds /markdown endpoints — GET and PATCH page-as-markdown, plus POST
/v1/pages with a markdown body param. Agent-friendly, no CLI required.
- Adds ntn CLI cheat sheet for raw API shorthand, file uploads, and
workspace flags.
- Adds Notion Workers section: scaffold, tool/webhook capability shapes,
lifecycle commands. Gated on Business/Enterprise plans + macOS/Linux.
- Adds Notion-flavored Markdown reference (callouts, toggles, columns,
mentions, colors) for the /markdown endpoints.
- Adds a 'choose the right path' decision table at the bottom.
- Notes the new efficient Notion MCP server as an optional wiring path.
Auto-generated docs page regenerated via
website/scripts/generate-skill-docs.py.
* docs(skills-catalog): update notion description for v2.0.0
Catches the failure mode that produced #25045: a contributor PR whose
branch had been disconnected from main's history (likely an accidental
'git checkout --orphan' or '.git/' re-init). GitHub's merge UI does
not refuse merges of unrelated histories, so the PR landed cleanly
with its intended one-file change but its parent-less root commit
(413990c94) got grafted into main as a second root. The merge
resolution itself was correct — main's content won for every
conflicting file — but ~1500 files' worth of git blame collapsed
onto that single commit.
Implementation: 'git merge-base origin/main HEAD' exits non-zero and
prints nothing when the two commits share no ancestor. Check both
conditions and fail with a clear message + recovery steps.
Verified: against the historic state of PR #25045 (base 5d90386ba,
head 1149e75db), 'git merge-base' returns empty with exit 1, so the
new check would have rejected it.
Follow-up to #26592. The new docs/guides/oauth-over-ssh.md page was
linked from the two SSH-specific sections of the xAI Grok OAuth guide
but was missing from the surfaces a user is more likely to hit first:
- guides/xai-grok-oauth.md 'See Also' — add the SSH guide at the top
with a short qualifier so remote users notice it before clicking
through.
- integrations/providers.md xAI Grok OAuth callout — append the SSH
guide link alongside the existing xAI OAuth guide link.
- user-guide/configuration.md xai-oauth tip — same.
Docs build: zero warnings on touched files.
- installation.md: add tip about `hermes postinstall` for upfront dep install
- quickstart.md: show `hermes postinstall` in pip install flow
- updating.md: fix --check description to mention PyPI path for pip installs
- dep_ensure.py: use get_hermes_home() instead of hand-rolled env var
- dep_ensure.py: add "chrome" to browser name list (was inconsistent with browser_tool.py)
- main.py _cmd_update_check: use detect_install_method() directly instead of redundant .git check
- main.py _cmd_update_pip: build command list directly instead of fragile split() on display string
- banner.py: rename _check_via_pypi → check_via_pypi (cross-module public API)
Document pip install hermes-agent as a first-class install option.
Clarify that PyPI releases track tagged versions (major/minor),
not every commit on main — git installer is for bleeding-edge.
One-shot bootstrap that installs non-Python deps (node, browser,
ripgrep, ffmpeg) via ensure_dependency(), then runs setup if no
provider is configured. Closes the gap between `pip install` and
the full user-facing experience.
Also fixes 3 pre-existing test regressions caused by earlier commits:
- test_recommended_update_command: mock detect_install_method for git env
- test_check_for_updates_no_git_dir: now falls back to PyPI, not None
- test_plist_path_includes_node_modules_bin: skip when dir absent
Before: missing node → hard exit; missing browser → FileNotFoundError.
After: both try ensure_dependency() first, which prompts interactively
and delegates installation to install.sh --ensure.
ripgrep and ffmpeg already degrade gracefully (grep fallback, skip
conversion) so they don't need wiring.
Also documents the design rationale in dep_ensure.py: detection and
prompting live in Python (portable, instant, UX-integrated); only
the actual installation delegates to install.sh (1900 lines of
battle-tested OS/package-manager logic).
_cmd_update_check() had its own `.git` gate separate from _cmd_update_impl.
For pip installs, fork to _check_via_pypi() and display the result with
the correct recommended_update_command().
- banner.py: remove redundant `import json as _json` (json already at module level)
- main.py: _cmd_update_pip now delegates to recommended_update_command_for_method
instead of duplicating the uv-vs-pip detection logic
- main.py: remove redundant `import subprocess as _sp` (subprocess already at module level)
Match the full set of subdirs created by install.sh: pairing, hooks,
image_cache, audio_cache, and skills are now pre-created alongside the
existing cron, sessions, logs, logs/curator, and memories dirs. This
makes hermes doctor checks cleaner without changing any runtime behaviour.
When .git is absent and detect_install_method returns "pip", fork
hermes update to run `uv pip install --upgrade hermes-agent` (or
`python -m pip install --upgrade hermes-agent` as fallback) instead of
hard-exiting with "Not a git repository".
Adds detect_install_method() to identify nixos/homebrew/git/pip installs,
and recommended_update_command_for_method() to return the right upgrade command
for each method. Updates recommended_update_command() to use these for pip-installed
instances (no .git dir, not managed).
Add _find_bundled_tui() that checks for hermes_cli/tui_dist/entry.js
(present in wheel installs) and wire it into _make_tui_argv() between
the HERMES_TUI_DIR prebuilt path and the npm install fallback.
Extract PATH building into _build_service_path_dirs() that skips directories
which don't exist on disk (e.g. node_modules/.bin for pip installs) and also
includes ~/.hermes/node/bin and ~/.hermes/node_modules/.bin for agent-browser.
When cli-config.yaml.example is not present (e.g. pip wheel install),
fall back to writing DEFAULT_CONFIG via save_config() instead of
warning and requiring a manual fix.
For pip-installed hermes-agent (no .git directory), fall back to
querying PyPI's JSON API to compare __version__ against the latest
published release, using stdlib only (urllib + json, no packaging dep).
The top-of-file scope docstring listed delegate_task, memory, and
session_search as exposed tools, but EXPOSED_TOOLS deliberately omits
them (they're _AGENT_LOOP_TOOLS and require the running AIAgent context
to dispatch — the inline comment block already explains this). Kanban
tools, which ARE exposed, were missing from the docstring entirely.
Rewrite the Scope / DO NOT expose sections to match the actual tuple:
drop delegate_task/memory/session_search from 'expose', add the
kanban_* family, move delegate_task/memory/session_search/todo into
'DO NOT expose' with the agent-loop rationale.
Fixes#26567 (doc-only fix; option 2 — shimming memory/session_search
through MemoryStore/SessionDB directly — left for a follow-up issue
once the plugin-memory locking story is audited).
Stop the gateway from exiting (or systemd-restart-looping) when a single
messaging adapter fails at startup or runtime. A misconfigured WhatsApp
(npm install timeout, unpaired bridge, missing creds.json) used to take
the entire gateway down, killing cron jobs and any other connected
platforms with it.
Changes:
• Startup (gateway/run.py): when connected_count==0 but the only
errors are retryable, log a degraded-state warning and keep the
gateway alive instead of returning False. Reconnect watcher then
recovers platforms as their underlying problem clears.
• Runtime (gateway/run.py _handle_adapter_fatal_error): when the last
adapter goes down with a retryable error and is queued for
reconnection, stay alive instead of exit-with-failure. Previously
this triggered systemd Restart=on-failure, which created infinite
restart loops on persistent retryable failures (proxy outage,
repeated bridge crashes).
• Reconnect watcher (gateway/run.py _platform_reconnect_watcher):
replace the 20-attempt hard drop with a circuit-breaker pause.
After _PAUSE_AFTER_FAILURES (10) consecutive retryable failures, the
platform stays in _failed_platforms with paused=True so the watcher
skips it but the operator can still see and resume it. Non-retryable
errors still drop out of the queue immediately. Resolves#17063
(gateway giving up on Telegram after 20 attempts).
• WhatsApp preflight (gateway/platforms/whatsapp.py): refuse to start
the Node bridge when creds.json is missing. Sets a non-retryable
whatsapp_not_paired fatal error so the watcher drops it cleanly
with a single 'run hermes whatsapp' log line instead of paying the
30s bridge bootstrap timeout on every gateway start.
• WhatsApp setup ordering (hermes_cli/main.py cmd_whatsapp): only set
WHATSAPP_ENABLED=true once pairing actually succeeds. Previously
the wizard wrote the env var at step 2 (before npm install and QR
pairing), so any Ctrl+C left .env claiming WhatsApp was ready when
the bridge had no creds.json. Also propagate the env var when the
user keeps an existing pairing on a re-run.
• /platform slash command (hermes_cli/commands.py + gateway/run.py):
new gateway-only command for manual circuit-breaker control.
/platform list — show connected + failed/paused platforms
/platform pause <name> — silence a known-broken platform
/platform resume <name> — re-queue a paused platform
Tests:
• New: pause/resume helpers, /platform list|pause|resume command,
WhatsApp creds.json preflight, WhatsApp setup ordering.
• Updated: stale assertions that codified the old 'exit and let
systemd restart' behavior in test_runner_fatal_adapter.py,
test_runner_startup_failures.py, and test_platform_reconnect.py
(the 20-attempt give-up test became a circuit-breaker pause test).
5488 tests pass in tests/gateway/.
Two loopback-redirect OAuth flows (xAI Grok, Spotify) silently fail when
Hermes runs on a remote host: the auth server redirects to
127.0.0.1:<port> on the user's laptop, not on the remote box. The
--no-browser flag only suppresses webbrowser.open() — it doesn't change
the bind address. Symptom xAI surfaces is 'Could not establish
connection. We couldn't reach your app.', followed by a 'xAI
authorization timed out waiting for the local callback' on the CLI side.
Changes
- hermes_cli/auth.py: new _print_loopback_ssh_hint() helper, called from
_xai_oauth_loopback_login() and _spotify_login() right after they
print the redirect URI. Silent off SSH; on SSH prints the exact
'ssh -N -L <port>:127.0.0.1:<port>' command using the actually-bound
port (not the hardcoded constant — the listener auto-bumps when the
preferred port is busy), a provider-specific docs URL, and a link to
the new shared guide.
- website/docs/guides/oauth-over-ssh.md (new): single source of truth
for the tunnel pattern — TL;DR command, jump-box / ProxyJump variant,
mosh+tmux+ControlMaster gotchas, troubleshooting.
- website/docs/guides/xai-grok-oauth.md: fix the two sections that
claimed --no-browser alone was enough; link to the shared guide.
- website/docs/user-guide/features/spotify.md: expand the existing
one-liner; link to the shared guide.
- website/sidebars.ts: register the new page.
- tests/hermes_cli/test_auth_loopback_ssh_hint.py: 7 unit tests
covering SSH-vs-not, loopback-vs-not, malformed URIs, port echo,
with and without provider docs URL.
Fresh Windows installs were failing on first run with:
⚠ uv python install error: Downloading cpython-3.11.15-windows-x86_64-none (24.5MiB)
✗ Installation failed: Python was not found; run without arguments
to install from the Microsoft Store...
Two bugs compounding:
1) EAP=Stop swallows uv's stderr progress as an exception. uv writes
download progress ("Downloading cpython-3.11.15-windows-x86_64-none
(24.5MiB)") to stderr. With $ErrorActionPreference = "Stop" set at
the top of the script plus 2>&1 capture, PowerShell wraps each stderr
line as an ErrorRecord and throws on the first one — even though uv
exits 0 and Python was installed successfully. This was previously
fixed in commit ec1714e71 (May 8) but lost in the May 12 release
squash (413990c94). Reapply the EAP=Continue + verify-via
'uv python find' pattern.
2) System-python fallback invokes the Microsoft Store stub. When the uv
paths fall through, the legacy 'python --version' check invokes
%LOCALAPPDATA%\\Microsoft\\WindowsApps\\python.exe, a 0-byte
reparse-point stub that prints 'Python was not found...' to stdout
and exits non-zero. Get-Command matches it. The resulting error
message is what the user sees as the final installer crash. Detect
and skip the stub by checking for the \\WindowsApps\\ path
component or a 0-byte file size before invoking python.
Also save/restore EAP defensively in the catch blocks so a throw before
the assignment can't leave EAP in 'Continue'.
Wraps every sync->async coroutine-scheduling site in the codebase with a
new agent.async_utils.safe_schedule_threadsafe() helper that closes the
coroutine on scheduling failure (closed loop, shutdown race, etc.)
instead of leaking it as 'coroutine was never awaited' RuntimeWarnings
plus reference leaks.
22 production call sites migrated across the codebase:
- acp_adapter/events.py, acp_adapter/permissions.py
- agent/lsp/manager.py
- cron/scheduler.py (media + text delivery paths)
- gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper
which now delegates to safe_schedule_threadsafe)
- gateway/run.py (10 sites: telegram rename, agent:step hook, status
callback, interim+bg-review, clarify send, exec-approval button+text,
temp-bubble cleanup, channel-directory refresh)
- plugins/memory/hindsight, plugins/platforms/google_chat
- tools/browser_supervisor.py (3), browser_cdp_tool.py,
computer_use/cua_backend.py, slash_confirm.py
- tools/environments/modal.py (_AsyncWorker)
- tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to
factory-style so the coroutine is never constructed on a dead loop)
- tui_gateway/ws.py
Tests: new tests/agent/test_async_utils.py covers helper behavior under
live loop, dead loop, None loop, and scheduling exceptions. Regression
tests added at three PR-original sites (acp events, acp permissions,
mcp loop runner) mirroring contributor's intent.
Live-tested end-to-end:
- Helper stress test: 1500 schedules across live/dead/race scenarios,
zero leaked coroutines
- Race exercised: 5000 schedules with loop killed mid-flight, 100 ok /
4900 None returns, zero leaks
- hermes chat -q with terminal tool call (exercises step_callback bridge)
- MCP probe against failing subprocess servers + factory path
- Real gateway daemon boot + SIGINT shutdown across multiple platform
adapter inits
- WSTransport 100 live + 50 dead-loop writes
- Cron delivery path live + dead loop
Salvages PR #2657 — adopts contributor's intent over a much wider site
list and a single centralized helper instead of inline try/except at
each site. 3 of the original PR's 6 sites no longer exist on main
(environments/patches.py deleted, DingTalk refactored to native async);
the equivalent fix lives in tools/environments/modal.py instead.
Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>
Build on @aydnOktay's cronjob fix by routing the cronjob check through
the shared 'env_var_enabled' helper in utils.py (same truthy set:
1/true/yes/on) and applying the same semantics to the 8 sibling call
sites that read HERMES_INTERACTIVE / HERMES_GATEWAY_SESSION /
HERMES_EXEC_ASK / HERMES_CRON_SESSION with bare os.getenv() truthy
checks:
- tools/approval.py: _is_gateway_approval_context (2), check_command_safety (2),
check_all_command_guards (3) -- 7 sites total
- tools/terminal_tool.py: _handle_sudo_failure, sudo password prompt -- 2 sites
- tools/skills_tool.py: _is_gateway_surface -- 1 site
Without this, a user who exports HERMES_INTERACTIVE=0 in their shell
still gets interactive sudo prompts, approval prompts, and gateway
skill-install paths -- only the cronjob tool was hardened. Now all
consumers agree on the same false-like values.
Also drops the duplicate _is_truthy_env helper from cronjob_tools.py
in favour of the existing canonical utils.env_var_enabled.
Tests: extend the parametrized regression coverage to all three
session env vars (HERMES_INTERACTIVE / HERMES_GATEWAY_SESSION /
HERMES_EXEC_ASK) symmetrically. tests/tools/test_cronjob_tools.py:
60/60 pass; tests/tools/{approval,terminal_tool,skills_tool,
cron_approval_mode,hardline_blocklist}.py: 378/378 pass.
Follow-up to #26534 (xai-oauth provider). The new guide and integrations
page were shipped with the salvage, but four reference/enumeration pages
still listed every other OAuth provider without xai-oauth:
- reference/cli-commands.md — `--provider` choices list
- reference/environment-variables.md — HERMES_INFERENCE_PROVIDER values
- user-guide/configuration.md — auxiliary-task provider list, OAuth
tip block (mirrored from MiniMax OAuth),
and provider table row
- user-guide/features/fallback-providers.md — provider table
Drop accounts.mouseion.dev and localhost:20000 / 127.0.0.1:20000 from
the loopback callback CORS allowlist — leftover dev origins. The
redirect_uri is bound to 127.0.0.1 and gated by PKCE + state, so only
xAI's own auth origins are needed.
Co-Authored-By: Jaaneek <Jaaneek@users.noreply.github.com>
Per @mark-xai's review on PR #26457 and the xAI model retirement on
2026-05-15: grok-code-fast-1 is being retired today and aliases redirect
to grok-4.3 (already pinned to the top of the xAI model list by this
PR). Update the two xAI Responses-API test fixtures Mark flagged plus
the picker fallback default in hermes_cli/main.py that uses the same
literal.
The previous "Logging Out" section showed `hermes auth remove xai-oauth`
with no positional target — argparse rejects that and the command does
not clear the singleton OAuth state anyway. The correct command for the
"clear everything" intent is `hermes auth logout xai-oauth`. Also point
users at `hermes auth remove xai-oauth <target>` for single-pool-row
deletion.
The xAI prompt_cache_key block carried two long comment paragraphs
that either restated setdefault semantics, narrated the SDK
type-validation mechanism, or recapped the historical motivation for
the extra_body indirection — all already covered by the test
docstring at test_xai_responses_sends_cache_key_via_extra_body
(which links to the xAI docs). Also restored the truncated link in
the body-injection comment.
No behavior change.
The new resolve_xai_http_credentials() resolver was using os.getenv()
for the XAI_API_KEY/XAI_BASE_URL fallback path, which dropped the
~/.hermes/.env contract guarded by PR #17140 / #17163. Users with
XAI_API_KEY in dotenv only would see "No xAI credentials found" even
though the key was configured.
Separately, _transcribe_xai started consulting creds["base_url"] (which
always returns at least the default https://api.x.ai/v1) ahead of the
public XAI_STT_BASE_URL env override, so the per-tool override stopped
working.
- tools/xai_http.py: add module-level get_env_value() wrapper that
reads ~/.hermes/.env first (via hermes_cli.config.get_env_value),
then os.environ. Resolver uses it for the API-key/base-url fallback.
- tools/transcription_tools.py: restore precedence so XAI_STT_BASE_URL
wins over creds["base_url"].
- tests/tools/test_transcription_dotenv_fallback.py +
tests/tools/test_tts_dotenv_fallback.py: repoint the per-call-site
patches at the new resolution point (tools.xai_http.get_env_value).
The end-to-end regression-guard test (which patches load_env) is
unchanged and still passes.
The contributor's commit author email is the legacy GitHub noreply
form (no leading numeric "id+"), so it doesn't match the
check-attribution workflow's auto-resolve regex
(\+.*@users\.noreply\.github\.com). Register it explicitly in
AUTHOR_MAP so the PR #26457 attribution check passes.
Two bugs in the `hermes tools` reconfigure flow caused picking xAI Grok
Imagine for video_gen (or image_gen) to feel like a no-op:
1. `_is_provider_active()` had a branch for `image_gen_plugin_name` but
none for `video_gen_plugin_name`, so a row marked as the active xAI
video provider was never recognized as active. The picker fell through
to the env-var fallback in `_detect_active_provider_index()`, which
matched the FAL row (because `FAL_KEY` is set), so the picker visually
defaulted to FAL even though the user had selected xAI.
2. `_plugin_video_gen_providers()` and `_plugin_image_gen_providers()`
built picker rows from the plugin's `get_setup_schema()` but only
copied `name`, `badge`, `tag`, `env_vars`. The xAI plugins declare
`post_setup: "xai_grok"` so the picker should run the OAuth /
API-key prompt hook after selection — that key was silently dropped,
so the hook never fired from the picker rows.
Adds the missing `video_gen_plugin_name` branch (placed before the
`managed_nous_feature` block, mirroring the existing image_gen branch)
and propagates `post_setup` from the plugin schema into both picker-row
builders. Adds focused tests in `test_video_gen_picker.py` and
`test_image_gen_picker.py`.
Adds a new authentication provider that lets SuperGrok subscribers sign
in to Hermes with their xAI account via the standard OAuth 2.0 PKCE
loopback flow, instead of pasting a raw API key from console.x.ai.
Highlights
----------
* OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery,
state/nonce, and a strict CORS-origin allowlist on the callback.
* Authorize URL carries `plan=generic` (required for non-allowlisted
loopback clients) and `referrer=hermes-agent` for best-effort
attribution in xAI's OAuth server logs.
* Token storage in `auth.json` with file-locked atomic writes; JWT
`exp`-based expiry detection with skew; refresh-token rotation
synced both ways between the singleton store and the credential
pool so multi-process / multi-profile setups don't tear each other's
refresh tokens.
* Reactive 401 retry: on a 401 from the xAI Responses API, the agent
refreshes the token, swaps it back into `self.api_key`, and retries
the call once. Guarded against silent account swaps when the active
key was sourced from a different (manual) pool entry.
* Auxiliary tasks (curator, vision, embeddings, etc.) route through a
dedicated xAI Responses-mode auxiliary client instead of falling back
to OpenRouter billing.
* Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen
plugin) resolve credentials through a unified runtime → singleton →
env-var fallback chain so xai-oauth users get them for free.
* `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are
wired through the standard auth-commands surface; remove cleans up
the singleton loopback_pkce entry so it doesn't silently reinstate.
* `hermes model` provider picker shows
"xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls
back to pool credentials when the singleton is missing.
Hardening
---------
* Discovery and refresh responses validate the returned
`token_endpoint` host against the same `*.x.ai` allowlist as the
authorization endpoint, blocking MITM persistence of a hostile
endpoint.
* Discovery / refresh / token-exchange `response.json()` calls are
wrapped to raise typed `AuthError` on malformed bodies (captive
portals, proxy error pages) instead of leaking JSONDecodeError
tracebacks.
* `prompt_cache_key` is routed through `extra_body` on the codex
transport (sending it as a top-level kwarg trips xAI's SDK with a
TypeError).
* Credential-pool sync-back preserves `active_provider` so refreshing
an OAuth entry doesn't silently flip the active provider out from
under the running agent.
Testing
-------
* New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests)
covers JWT expiry, OAuth URL params (plan + referrer), CORS origins,
redirect URI validation, singleton↔pool sync, concurrency races,
refresh error paths, runtime resolution, and malformed-JSON guards.
* Extended `test_credential_pool.py`, `test_codex_transport.py`, and
`test_run_agent_codex_responses.py` cover the pool sync-back,
`extra_body` routing, and 401 reactive refresh paths.
* 165 tests passing on this branch via `scripts/run_tests.sh`.
* fix(tui): restrict fast-echo bypass to ASCII so Vietnamese/CJK/IME input renders correctly
The composer's fast-echo path (canFastAppend / canFastBackspace) writes
characters straight to stdout to skip an Ink re-render on the hot
typing path. The previous guard only checked
'stringWidth(text) === text.length', which lets a lot of non-ASCII
through:
- Vietnamese precomposed letters (ề, ắ, ờ, ự, ...) report width 1 and
length 1, but a Vietnamese Telex / IME stack produces them across
multiple keystrokes; the intermediate composition state must be
drawn by Ink so the rendered cell, the stored value, and the
cursor column stay in lockstep when the final commit replaces the
preview.
- NFD combining marks (U+0300..U+036F) are zero-width but length 1,
so even a passing equality lets them slip and silently desync the
cell column.
- CJK/East-Asian wide and emoji rejected only because their length
differs, but the boundary was shape-shaped, not intent-shaped.
User-visible bug from the original report:
Example: eê noiói nge neène
-> the bypass committed the IME preview char before the diacritic
replaced it, leaving doubled letters on screen.
Fix: gate fast-echo on pure printable ASCII (0x20-0x7e). The
performance-critical English typing path is unchanged; everything else
goes through the normal Ink render path so layout stays accurate.
Also extracts the shape preconditions as pure exported helpers
(canFastAppendShape / canFastBackspaceShape) so the regression matrix
is testable without spinning up a TextInput.
Tests: ui-tui/src/__tests__/textInputFastEcho.test.ts adds 20 cases
covering ASCII still works, Vietnamese precomposed + NFD, CJK, emoji,
NBSP / Latin-1, ANSI / control bytes, multi-line, and end-of-line
preconditions. Verified RED on the previous guard (11 of 20 fail) and
GREEN on the new guard.
Refs: #5221, #7443, #17602, #17603 (similar wide-char rendering bugs).
* docs(tui): clarify Vietnamese char terminology in regression comment
Address Copilot review: 'single byte width' implied UTF-8 byte semantics,
but the relevant property is JS code units (`text.length === 1`) and
display width (`stringWidth === 1`). Reworded to match.
* fix(langfuse): reject placeholder credentials with one-shot warning
When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY
at a template value like 'placeholder', 'test-key', or 'your-langfuse-key',
the Langfuse SDK silently accepts the credentials at construction time and
drops every trace at flush time. No warning, no error — just an empty
Langfuse dashboard the operator only notices hours later.
Add prefix-based validation in _get_langfuse() against the documented
'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side.
Anything else fires a single warning naming the offending env var(s)
with a log-safe value preview (full string for short placeholders so the
operator knows which template they left in place; truncated for long
values so a real secret pasted into the wrong field never hits the log),
then short-circuits via the existing _INIT_FAILED cache so the warning
fires once per process, not once per hook invocation.
The check sits after the 'Langfuse is None' SDK-installed guard so hosts
without the optional langfuse SDK don't see misleading 'set real keys'
hints when the actionable fix is 'pip install langfuse'. Missing
credentials remains the documented opt-out path and stays silent — no
log noise for unconfigured installs.
Fixes#22763Fixes#23823
* fix(langfuse): use actual API request messages for generation input
on_pre_llm_request previously used the messages kwarg alone, which
could be None when Hermes passes the payload via request_messages,
conversation_history, or user_message instead. Add _coerce_request_messages
to pick the first available list across all variants, falling back to a
synthetic user message. Generations now show the real outbound payload
rather than an empty input.
* fix(langfuse): record tool call outputs in traces
Tool observations showed input (arguments) but output was always
undefined. Root cause: when tool_call_id is empty, pre_tool_call stored
observations under a unique time-based key that post_tool_call could
never reconstruct, so every tool span was closed without output by the
_finish_trace sweep.
Fix pre/post matching by routing empty-tool_call_id tools through a
per-name FIFO queue (pending_tools_by_name) instead of the time-based
key. Tools with a tool_call_id continue to use the id-keyed dict.
Also:
- Preserve OpenAI-style nested function shape in serialized tool calls
so Langfuse renders name/arguments correctly
- Keep name + tool_call_id on role:tool messages for proper pairing
- Backfill tool results onto the matching turn_tool_calls entry so the
generation's tool-call record carries the result alongside arguments
- Coerce request messages from whichever field the runtime provides
(request_messages, messages, conversation_history, user_message)
* fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test
Self-review of the combined #22345 + #23831 salvage surfaced three issues
worth fixing in the same PR rather than as follow-ups:
1. Drop is_first_turn from the pre_api_request hook. The boolean expression
`not bool(conversation_history)` was wrong: conversation_history is
reassigned to None mid-run after compression (5 sites in run_agent.py),
so the value flips False -> True mid-conversation on every post-compression
API call. The langfuse plugin never consumed it, so the kwarg was both
misleading AND dead.
2. Replace copy.deepcopy(request_messages) with shallow list() copy. The
pre_api_request hook contract discards return values (invoke_hook never
writes back to api_kwargs), and the langfuse plugin's _serialize_messages
already builds its own snapshot dicts via _safe_value. A deepcopy on every
API call would walk every tool result and base64 image — significant
overhead for no real isolation benefit. Shallow copy of the outer list
protects against later mutations of api_messages without paying for the
inner-dict walk.
3. Rename test_empty_tool_call_id_concurrent_fifo_order ->
test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a
real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8
threads behind a barrier to actually exercise _STATE_LOCK on the
pending_tools_by_name queue. The original test was sequential and only
validated Python list semantics; this one validates the lock discipline.
4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED —
that function does not exist. Tests reload the module via sys.modules.pop
+ importlib.import_module instead.
Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass.
---------
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
PR #22345 by @btorresgil authors commits as 'Brian Conklin
<brian@dralth.com>' (git config carries a different name/email than the
GitHub account). GitHub's commit-author mapping correctly attributes these
commits to @btorresgil based on the public-key registration, but Hermes'
release attribution audit reads the raw commit email, not the GitHub
mapping. Without this AUTHOR_MAP entry, salvaging #22345 would fail
`scripts/contributor_audit.py` strict mode at release time.
Prerequisite for the langfuse trace fix salvage that cherry-picks
@btorresgil's commits onto current main.