Compare commits


232 commits

Author SHA1 Message Date
Siddharth Balyan
3e61703b08
fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths (#15444)
* fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths

fix-lockfiles checked npm lockfile hashes by running
`nix build .#<attr>.npmDeps`, but fetchNpmDeps is a fixed-output
derivation — if the old store path exists locally, Nix returns it from
cache without re-fetching. This caused the script to report "ok" even
when hashes were stale, while CI (with no cache) failed with a hash
mismatch.

Adding --rebuild forces Nix to re-derive and verify the output hash
against the declared one, catching staleness regardless of local cache
state. Also updates the tui and web npm deps hashes that were stale.

* fix(nix): regenerate ui-tui lockfile to add missing @emnapi entries

npm ci was failing because @emnapi/core and @emnapi/runtime were
missing from ui-tui/package-lock.json despite being required as peer
deps by @napi-rs/wasm-runtime (via @rolldown/binding-wasm32-wasi).

Running npm install --package-lock-only adds the missing entries.
The npmDepsHash reverts to its previous value since fetchNpmDeps was
already fetching these packages as transitive dependencies.
2026-04-25 06:14:32 +05:30
Teknium
05d8f11085
fix(/model): show provider-enforced context length, not raw models.dev (#15438)
/model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because
the display block used ModelInfo.context_window directly from models.dev.
Codex OAuth actually enforces 272K for the same slug, and the agent's
compressor already runs at 272K via get_model_context_length() — so the
banner + real context budget said 272K while /model lied with 1M.

Route the display context through a new resolve_display_context_length()
helper that always prefers agent.model_metadata.get_model_context_length
(which knows about Codex OAuth, Copilot, Nous caps) and only falls back
to models.dev when that returns nothing.

Fix applied to all 3 /model display sites:
  cli.py _handle_model_switch
  gateway/run.py picker on_model_selected callback
  gateway/run.py text-fallback confirmation
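The fallback order can be sketched as a standalone function. This is a minimal illustration only: the real helper consults `agent.model_metadata.get_model_context_length()`, and the parameter names here are assumed, not copied from the tree.

```python
def resolve_display_context_length(enforced_length, models_dev_context_window):
    """Prefer the provider-enforced context length; fall back to models.dev.

    Hypothetical standalone version of the helper named in the commit.
    `enforced_length` stands in for agent.model_metadata.get_model_context_length(),
    which knows about Codex OAuth / Copilot / Nous caps.
    """
    if enforced_length:                   # e.g. 272_000 for gpt-5.5 on openai-codex
        return enforced_length
    return models_dev_context_window      # raw models.dev value, last resort only
```

With this shape, the 1M models.dev figure is only ever shown when no provider cap is known.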

Reported by @emilstridell (Telegram, April 2026).
2026-04-24 17:21:38 -07:00
Teknium
13038dc747 fix(skills): ship google-workspace deps as [google] extra; make setup.py 3.9-parseable
Closes #13626.

Two follow-ups on top of the _hermes_home helper from @jerome-benoit's #12729:

1. Declare a [google] optional extra in pyproject.toml
   (google-api-python-client, google-auth-oauthlib, google-auth-httplib2) and
   include it in [all]. Packagers (Nix flake, Homebrew) now ship the deps by
   default, so `setup.py --check` does not need to shell out to pip at
   runtime — the imports succeed and install_deps() is never reached.
   This fixes the Nix breakage where pip/ensurepip are stripped.

2. Add `from __future__ import annotations` to setup.py so the PEP 604
   `str | None` annotation parses on Python 3.9 (macOS system python).
   Previously system python3 SyntaxError'd before any code ran.

install_deps() error message now also points users at the extra instead of
just the raw pip command.
2026-04-24 16:45:27 -07:00
Teknium
629e108ee2 chore(release): map jerome.benoit@sap.com to jerome-benoit 2026-04-24 16:45:27 -07:00
Jérôme Benoit
c34d3f4807 fix(skills): factor HERMES_HOME resolution into shared _hermes_home helper
The three google-workspace scripts (setup.py, google_api.py, gws_bridge.py)
each had their own way of resolving HERMES_HOME:

- setup.py imported hermes_constants (crashes outside Hermes process)
- google_api.py used os.getenv inline (no strip, no empty handling)
- gws_bridge.py defined its own local get_hermes_home() (duplicate)

Extract the common logic into _hermes_home.py which:
- Delegates to hermes_constants when available (profile support, etc.)
- Falls back to os.getenv with .strip() + empty-as-unset handling
- Provides display_hermes_home() with ~/ shortening for profiles
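The fallback branch described above can be sketched as follows (a simplified stand-in: the real `_hermes_home.py` first delegates to `hermes_constants` when importable, which this sketch omits):

```python
import os
from pathlib import Path


def hermes_home() -> Path:
    """Fallback HERMES_HOME resolution: env var wins, but a missing,
    empty, or whitespace-only value counts as unset."""
    raw = (os.getenv("HERMES_HOME") or "").strip()
    if raw:
        return Path(raw)
    return Path.home() / ".hermes"    # default when env var is unset/empty
```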

All three scripts now import from _hermes_home instead of duplicating.

7 regression tests cover the fallback path: env var override, default
~/.hermes, empty env var, display shortening, profile paths, and
custom non-home paths.

Closes #12722
2026-04-24 16:45:27 -07:00
Teknium
f14264c438 chore(release): map simbamax99@gmail.com to @simbam99 2026-04-24 16:42:31 -07:00
simbam99
19a3e2ce8e fix(gateway): follow compression continuations during /resume 2026-04-24 16:42:31 -07:00
Teknium
d58b305adf refactor(deepseek-reasoning): consolidate detection into helpers + regression tests
Extracts _needs_kimi_tool_reasoning() for symmetry with the existing
_needs_deepseek_tool_reasoning() helper, so _copy_reasoning_content_for_api
uses the same detection logic as _build_assistant_message. Future changes
to either provider's signals now only touch one function.

Adds tests/run_agent/test_deepseek_reasoning_content_echo.py covering:
- All 3 DeepSeek detection signals (provider, model, host)
- Poisoned history replay (empty string fallback)
- Plain assistant turns NOT padded
- Explicit reasoning_content preserved
- Reasoning field promoted to reasoning_content
- Existing Kimi/Moonshot detection intact
- Non-thinking providers left alone

21 tests, all pass.
2026-04-24 16:38:29 -07:00
Teknium
e93cc934c7 chore(release): map chenzeshi@live.com -> chen1749144759 in AUTHOR_MAP 2026-04-24 16:38:29 -07:00
chen1749144759
93a2d6b307 fix: add DeepSeek reasoning_content echo for tool-call messages
DeepSeek V4 thinking mode requires reasoning_content on every
assistant message that includes tool_calls. When this field is
missing from persisted history, replaying the session causes
HTTP 400: 'The reasoning_content in the thinking mode must be
passed back to the API.'

Two-part fix (refs #15250):

1. _copy_reasoning_content_for_api: Merge the Kimi-only and
   DeepSeek detection into a single needs_tool_reasoning_echo
   check. This handles already-poisoned persisted sessions by
   injecting an empty reasoning_content on replay.

2. _build_assistant_message: Store reasoning_content='' on new
   DeepSeek tool-call messages at creation time, preventing
   future session poisoning at the source.

Additional fix:
3. _handle_max_iterations: Add missing call to
   _copy_reasoning_content_for_api in the max-iterations flush
   path (previously only main loop and flush_memories had it).

Detection covers:
- provider == 'deepseek'
- model name containing 'deepseek' (case-insensitive)
- base URL matching api.deepseek.com (for custom provider)
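The detection signals and the empty-string echo can be sketched together. Function names and signatures here are illustrative; the in-tree versions are `_needs_deepseek_tool_reasoning` and `_copy_reasoning_content_for_api`.

```python
def needs_deepseek_tool_reasoning(provider, model, base_url) -> bool:
    """The three detection signals from the commit, as one predicate."""
    if (provider or "").lower() == "deepseek":
        return True
    if "deepseek" in (model or "").lower():          # case-insensitive model match
        return True
    return "api.deepseek.com" in (base_url or "")    # custom-provider host match


def echo_reasoning_content(message: dict) -> dict:
    """Inject an empty reasoning_content on tool-call messages that lack it,
    so replaying a poisoned session no longer 400s (simplified)."""
    if message.get("tool_calls") and "reasoning_content" not in message:
        return {**message, "reasoning_content": ""}
    return message
```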
2026-04-24 16:38:29 -07:00
Teknium
4fade39c90 chore(release): map benjaminsehl noreply email in AUTHOR_MAP 2026-04-24 16:04:37 -07:00
Benjamin Sehl
f731c2c2bd fix(gateway/bluebubbles): align iMessage delivery with non-editable UX 2026-04-24 16:04:37 -07:00
Brian D. Evans
00c3d848d8 fix(memory): skip external-provider sync on interrupted turns (#15218)
``run_conversation`` was calling ``memory_manager.sync_all(
original_user_message, final_response)`` at the end of every turn
where both args were present.  That gate didn't consider the
``interrupted`` local flag, so an external memory backend received
partial assistant output, aborted tool chains, or mid-stream resets as
durable conversational truth.  Downstream recall then treated the
not-yet-real state as if the user had seen it complete, poisoning the
trust boundary between "what the user took away from the turn" and
"what Hermes was in the middle of producing when the interrupt hit".

Extracted the inline sync block into a new private method
``AIAgent._sync_external_memory_for_turn(original_user_message,
final_response, interrupted)`` so the interrupt guard is a single
visible check at the top of the method instead of hidden in a
boolean-and at the call site.  That also gives tests a clean seam to
assert on — the pre-fix layout buried the logic inside the 3,000-line
``run_conversation`` function where no focused test could reach it.

The new method encodes three independent skip conditions:

  1. ``interrupted`` → skip entirely (the #15218 fix).  Applies even
     when ``final_response`` and ``original_user_message`` happen to
     be populated — an interrupt may have landed between a streamed
     reply and the next tool call, so the strings on disk are not
     actually the turn the user took away.
  2. No memory manager / no final_response / no user message →
     preserve existing skip behaviour (nothing new for providerless
     sessions, system-initiated refreshes, tool-only turns that never
     resolved, etc.).
  3. Sync_all / queue_prefetch_all exceptions → swallow.  External
     memory providers are strictly best-effort; a misconfigured or
     offline backend must never block the user from seeing their
     response.
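The first two skip conditions reduce to a single predicate; a hedged sketch (the real method is `AIAgent._sync_external_memory_for_turn`, and condition 3 — swallowing sync exceptions — happens around the actual `sync_all` call, which a boolean can't express):

```python
def should_sync_external_memory(interrupted, memory_manager,
                                user_message, final_response) -> bool:
    """Gate for external-memory sync at end of turn (illustrative)."""
    if interrupted:
        return False      # condition 1: never persist a half-turn as truth
    if not (memory_manager and user_message and final_response):
        return False      # condition 2: pre-existing skip paths
    return True
```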

The prefetch side-effect is gated on the same interrupt flag: the
user's next message is almost certainly a retry of the same intent,
and a prefetch keyed on the interrupted turn would fire against stale
context.

### Tests (16 new, all passing on py3.11 venv)

``tests/run_agent/test_memory_sync_interrupted.py`` exercises the
helper directly on a bare ``AIAgent`` (``__new__`` pattern that the
interrupt-propagation tests already use).  Coverage:

- Interrupted turn with full-looking response → no sync (the fix)
- Interrupted turn with long assistant output → no sync (the interrupt
  could have landed mid-stream; strings-on-disk lie)
- Normal completed turn → sync_all + queue_prefetch_all both called
  with the right args (regression guard for the positive path)
- No final_response / no user_message / no memory manager → existing
  pre-fix skip paths still apply
- sync_all raises → exception swallowed, prefetch still attempted
- queue_prefetch_all raises → exception swallowed after sync succeeded
- 8-case parametrised matrix across (interrupted × final_response ×
  original_user_message) asserts sync fires iff interrupted=False AND
  both strings are non-empty

Closes #15218

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:30:18 -07:00
Yukipukii1
fd10463069 fix(env): safely quote ~/ subpaths in wrapped cd commands 2026-04-24 15:25:12 -07:00
sprmn24
c599a41b84 fix(auth): preserve corrupt auth.json and warn instead of silently resetting
_load_auth_store() caught all parse/read exceptions and silently
returned an empty store, making corruption look like a logout with
no diagnostic information and no way to recover the original file.

Now copies the corrupt file to auth.json.corrupt before resetting,
and logs a warning with the exception and backup path.
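The preserve-then-reset shape can be sketched as below. This is an assumed standalone version, not the in-tree `_load_auth_store()`; the backup naming (`auth.json.corrupt`) follows the commit message.

```python
import json
import logging
import shutil
from pathlib import Path

log = logging.getLogger(__name__)


def load_auth_store(path: Path) -> dict:
    """On unreadable/corrupt auth store: back up the original, warn with
    the exception and backup path, then return an empty store."""
    try:
        return json.loads(path.read_text())
    except (OSError, ValueError) as exc:
        backup = path.with_name(path.name + ".corrupt")
        try:
            shutil.copyfile(path, backup)   # preserve evidence for recovery
        except OSError:
            backup = None                   # file unreadable; nothing to save
        log.warning("auth store unreadable (%s); backup: %s", exc, backup)
        return {}
```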
2026-04-24 15:22:44 -07:00
Teknium
c7d62b3fe3 chore(release): map ebukau84@gmail.com -> UgwujaGeorge in AUTHOR_MAP 2026-04-24 15:22:19 -07:00
Teknium
36d68bcb82 fix(api-server): persist incomplete snapshot on asyncio.CancelledError too
Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.

Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.
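The error-handling shape looks like this (a simplified stand-in for `_write_sse_responses`; `stream` and `persist_incomplete` are illustrative parameters). The key detail is that the `CancelledError` branch persists and then re-raises:

```python
import asyncio


async def write_sse(stream, persist_incomplete):
    """Both disconnect and cancellation persist the incomplete snapshot,
    but only cancellation re-raises, preserving cooperative semantics."""
    try:
        async for _chunk in stream:
            pass                      # write the SSE frame to the response here
    except ConnectionResetError:
        persist_incomplete()          # client disconnect: persist and finish
    except asyncio.CancelledError:
        persist_incomplete()          # server-side cancel: persist...
        raise                         # ...then let cancellation propagate
```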

Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
2026-04-24 15:22:19 -07:00
UgwujaGeorge
a29bad2a3c fix(api-server): persist response snapshot on client disconnect when store=True 2026-04-24 15:22:19 -07:00
sprmn24
7957da7a1d fix(web_server): hold _oauth_sessions_lock during PKCE session state writes
_submit_anthropic_pkce() retrieved sess under _oauth_sessions_lock but
wrote back to sess["status"] and sess["error_message"] outside the lock.
A concurrent session GC or cancel could race with these writes, producing
inconsistent session state.

Wrap all 4 sess write sites in _oauth_sessions_lock:
- network exception path (Token exchange failed)
- missing access_token path
- credential save failure path
- success path (approved)
2026-04-24 15:22:04 -07:00
Cyprian Kowalczyk
fd3864d8bd feat(cli): wrap /compress in _busy_command to block input during compression
Before this, typing during /compress was accepted by the classic CLI
prompt and landed in the next prompt after compression finished,
effectively consuming a keystroke for a prompt that was about to be
replaced. Wrapping the body in self._busy_command('Compressing
context...') blocks input rendering for the duration, matching the
pattern /skills install and other slow commands already use.

Salvages the useful part of #10303 (@iRonin). The `_compressing` flag
added to run_agent.py in the original PR was dead code (set in 3 spots,
read nowhere — not by cli.py, not by run_agent.py, not by the Ink TUI
which doesn't use _busy_command at all) and was dropped.
2026-04-24 15:21:22 -07:00
Yukipukii1
8ea389a7f8 fix(gateway/config): coerce quoted boolean values in config parsing 2026-04-24 15:20:05 -07:00
knockyai
3e6c108565 fix(gateway): honor queue mode in runner PRIORITY interrupt path
When display.busy_input_mode is 'queue', the runner-level PRIORITY block
in _handle_message was still calling running_agent.interrupt() for every
text follow-up to an active session. The adapter-level busy handler
already honors queue mode (commit 9d147f7fd), but this runner-level path
was an unconditional interrupt regardless of config.

Adds a queue-mode branch that queues the follow-up via
_queue_or_replace_pending_event() and returns without interrupting.

Salvages the useful part of #12070 (@knockyai). The config fan-out to
per-platform extra was redundant — runner already loads busy_input_mode
directly via _load_busy_input_mode().
2026-04-24 15:18:34 -07:00
Teknium
e3a1a9c24d
chore(release): map julia@alexland.us -> alexg0bot in AUTHOR_MAP (#15384) 2026-04-24 15:18:09 -07:00
Teknium
e3697e20a6 chore(release): map iRonin personal email to GitHub login 2026-04-24 15:17:09 -07:00
Teknium
ed91b79b7e fix(cli): keep Ctrl+D no-op when only attachments pending
Follow-up to @iRonin's Ctrl+D EOF fix. If the input text is empty but
the user has pending attached images, do nothing rather than exiting —
otherwise a stray Ctrl+D silently discards the attachments.
2026-04-24 15:17:09 -07:00
CK iRonin.IT
08d5c9c539 fix: Ctrl+D deletes char under cursor, only exits on empty input (bash/zsh behaviour) 2026-04-24 15:17:09 -07:00
Julia Bennet
1dcf79a864 feat: add slash command for busy input mode 2026-04-24 15:15:26 -07:00
teknium1
2de8a7a229 fix(skills): drop raw_content to avoid doubling skill payload
skill_view response went to the model verbatim; duplicating the SKILL.md
body as raw_content on every tool call added token cost with no agent-facing
benefit. Remove the field and update tests to assert on content only.

The slash/preload caller (agent/skill_commands.py) already falls back to
content when raw_content is absent, and it calls skill_view(preprocess=False)
anyway, so content is already unrendered on that path.
2026-04-24 15:15:07 -07:00
helix4u
ead66f0c92 fix(skills): apply inline shell in skill_view 2026-04-24 15:15:07 -07:00
Allard
0bcbc9e316 docs(faq): Update docs on backups
- update FAQ answer with the new `backup` command in release 0.9.0
- move the profile export section next to the backup section so related information can be read together
- add a comparison table between `profile export` and `backup` to help users understand the nuances between the two
2026-04-24 15:14:08 -07:00
Teknium
2d444fc84d
fix(run_agent): handle unescaped control chars in tool_call arguments (#15356)
Extends _repair_tool_call_arguments() to cover the most common local-model
JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and
newlines inside JSON string values (memory save summaries, file contents,
etc.). Previously fell through to '{}' replacement, losing the call.

Adds two repair passes:
  - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form
  - Pass 4: escape 0x00-0x1F control chars inside string values, then retry
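Pass 0 alone can be sketched in a couple of lines: `json.loads(strict=False)` accepts literal control characters inside string values, and re-serialising yields strict-valid canonical wire form. (Pass 4, escaping 0x00-0x1F before a retry for inputs pass 0 still rejects, is omitted from this sketch.)

```python
import json


def repair_control_chars(raw: str) -> str:
    """Pass 0 from the commit, simplified: lenient parse of literal
    tabs/newlines inside JSON strings, then canonical re-serialisation."""
    return json.dumps(json.loads(raw, strict=False))
```

A payload like `{"summary": "line1<LF>line2"}` (with a real newline byte) fails strict `json.loads` but round-trips cleanly through this helper.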

Ports the core utility from #12068 / PR #12093 without the larger plumbing
change (that PR also replaced json.loads at 8 call sites; current main's
_repair_tool_call_arguments is already the single chokepoint, so the
upgrade happens transparently for every existing caller).

Credit: @truenorth-lj for the original utility design.

4 new regression tests covering literal newlines, tabs, re-serialisation
to strict=True-valid output, and the trailing-comma + control-char
combination case.
2026-04-24 15:06:41 -07:00
Teknium
bb53d79d26 chore(release): map q19dcp@gmail.com -> aj-nt in AUTHOR_MAP 2026-04-24 15:03:07 -07:00
AJ
17fc84c256 fix: repair malformed tool call args in streaming assembly before flagging as truncated
When the streaming path (chat completions) assembled tool call deltas and
detected malformed JSON arguments, it set has_truncated_tool_args=True but
passed the broken args through unchanged. This triggered the truncation
handler which returned a partial result and killed the session (/new required).

_many_ malformations are repairable: trailing commas, unclosed brackets,
Python None, empty strings. _repair_tool_call_arguments() already existed
for the pre-API-request path but wasn't called during streaming assembly.

Now when JSON parsing fails during streaming assembly, we attempt repair
via _repair_tool_call_arguments() before flagging as truncated. If repair
succeeds (returns valid JSON), the tool call proceeds normally. Only truly
unrepairable args fall through to the truncation handler.

This prevents the most common session-killing failure mode for models like
GLM-5.1 that produce trailing commas or unclosed brackets.

Tests: 12 new streaming assembly repair tests, all 29 existing repair
tests still passing.
2026-04-24 15:03:07 -07:00
Teknium
b7c1d77e55 fix(dashboard): remove unimplemented 'block' busy_input_mode option
The web UI schema advertised 'block' as a busy_input_mode choice, but
no implementation ever existed — the gateway and CLI both silently
collapsed 'block' (and anything other than 'queue') to 'interrupt'.
Users who picked 'block' in the dashboard got interrupts anyway.

Drop 'block' from the select options. The two supported modes are
'interrupt' (default) and 'queue'.
2026-04-24 15:01:38 -07:00
luyao618
7a192b124e fix(run_agent): repair corrupted tool_call arguments before sending to provider
When a session is split by context compression mid-tool-call, an assistant
message may end up with truncated/invalid JSON in tool_calls[*].function.arguments.
On the next turn this is replayed verbatim and providers reject the entire request
with HTTP 400 invalid_tool_call_format, bricking the conversation in a loop that
cannot recover without manual session quarantine.

This patch adds a defensive sanitizer that runs immediately before
client.chat.completions.create() in AIAgent.run_conversation():

- Validates each assistant tool_calls[*].function.arguments via json.loads
- Replaces invalid/empty arguments with '{}'
- Injects a synthetic tool response (or prepends a marker to the existing one)
  so downstream messages keep valid tool_call_id pairing
- Logs each repair with session_id / message_index / preview for observability

Defense in depth: corruption can originate from compression splits, manual edits,
or plugin bugs. Sanitizing at the send chokepoint catches all sources.

Adds 7 unit tests covering: truncated JSON, empty string, None, non-string args,
existing matching tool response (no duplicate injection), non-assistant messages
ignored, multiple repairs.

Fixes #15236
2026-04-24 14:55:47 -07:00
Teknium
4093ee9c62
fix(codex): detect leaked tool-call text in assistant content (#15347)
gpt-5.x on the Codex Responses API sometimes degenerates and emits
Harmony-style `to=functions.<name> {json}` serialization as plain
assistant-message text instead of a structured `function_call` item.
The intent never makes it into `response.output` as a function_call,
so `tool_calls` is empty and `_normalize_codex_response()` returns
the leaked text as the final content. Downstream (e.g. delegate_task),
this surfaces as a confident-looking summary with `tool_trace: []`
because no tools actually ran — the Taiwan-embassy-email bug report.

Detect the pattern, scrub the content, and return finish_reason=
'incomplete' so the existing Codex-incomplete continuation path
(run_agent.py:11331, 3 retries) gets a chance to re-elicit a proper
function_call item. Encrypted reasoning items are preserved so the
model keeps its chain-of-thought on the retry.
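The detect-and-scrub step can be sketched as below. The regex is illustrative only — the in-tree detector in `_normalize_codex_response()` may be stricter about what counts as leaked Harmony serialization:

```python
import re

# Illustrative pattern: Harmony-style 'to=functions.<name> {json}' leaked
# as plain assistant text instead of a structured function_call item.
_LEAKED_CALL = re.compile(r"to=functions\.\w+\s*\{")


def normalize_leaked_tool_text(content, tool_calls):
    """If no structured calls arrived but the text looks like a serialized
    call, scrub it and report 'incomplete' to trigger the retry path."""
    if not tool_calls and _LEAKED_CALL.search(content or ""):
        return "", "incomplete"
    return content, "stop"
```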

Regression tests: leaked text triggers incomplete, real tool calls
alongside leak-looking text are preserved, clean responses pass
through unchanged.

Reported on Discord (gpt-5.4 / openai-codex).
2026-04-24 14:39:59 -07:00
helix4u
6a957a74bc fix(memory): add write origin metadata 2026-04-24 14:37:55 -07:00
Teknium
14b27bb68c chore(release): map @tochukwuada in AUTHOR_MAP
Contributor email for PR #15161 salvage (debthemelon
<thomasgeorgevii09@gmail.com>).
2026-04-24 14:32:21 -07:00
Teknium
ef9355455b test: regression coverage for checkpoint dedup and inf/nan coercion
Covers the two bugs salvaged from PR #15161:

- test_batch_runner_checkpoint: TestFinalCheckpointNoDuplicates asserts
  the final aggregated completed_prompts list has no duplicate indices,
  and keeps a sanity anchor test documenting the pre-fix pattern so a
  future refactor that re-introduces it is caught immediately.

- test_model_tools: TestCoerceNumberInfNan asserts _coerce_number
  returns the original string for inf/-inf/nan/Infinity inputs and that
  the result round-trips through strict (allow_nan=False) json.dumps.
2026-04-24 14:32:21 -07:00
debthemelon
dbdefa43c8 fix: eliminate duplicate checkpoint entries and JSON-unsafe coercion
batch_runner: completed_prompts_set is already fully populated by the
time the aggregation loop runs (incremental updates happen at result
collection time), so the subsequent extend() call re-added every
completed prompt index a second time. Removed the redundant variable
and extend, and write sorted(completed_prompts_set) directly to the
final checkpoint instead.

model_tools: _coerce_number returned Python float('inf')/float('nan')
for inf/nan strings rather than the original string. json.dumps raises
ValueError for these values, so any tool call where the model emitted
"inf" or "nan" for a numeric parameter would crash at serialization.
Changed the guard to return the original string, matching the
function's documented "returns original string on failure" contract.
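The fixed guard can be sketched as follows (an assumed standalone version of `_coerce_number`; the int/float split at the end is illustrative, not necessarily the in-tree behaviour):

```python
import json
import math


def coerce_number(value: str):
    """Parse numeric strings, but return the original string for
    inf/-inf/nan/Infinity so strict json.dumps(allow_nan=False) never raises."""
    try:
        num = float(value)
    except ValueError:
        return value                          # documented contract: original string
    if math.isinf(num) or math.isnan(num):
        return value                          # "inf"/"nan"/"Infinity" stay strings
    return int(num) if num.is_integer() else num
```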
2026-04-24 14:32:21 -07:00
Teknium
db9d6375fb
feat(models): add openai/gpt-5.5 and gpt-5.5-pro to OpenRouter + Nous Portal (#15343)
Replaces gpt-5.4 / gpt-5.4-pro entries in the OpenRouter fallback snapshot
and the Nous Portal curated list. Other aggregators (Vercel AI Gateway)
and provider-native lists are unchanged.
2026-04-24 14:31:47 -07:00
helix4u
8a2506af43 fix(aux): surface auxiliary failures in UI 2026-04-24 14:31:21 -07:00
helix4u
e7590f92a2 fix(telegram): honor no_proxy for explicit proxy setup 2026-04-24 14:31:04 -07:00
brooklyn!
a5129c72ef
Merge pull request #15337 from NousResearch/bb/tui-kawaii-default-off
fix(tui): keep default personality neutral
2026-04-24 16:23:00 -05:00
Brooklyn Nicholson
53fc10fc9a fix(tui): keep default personality neutral 2026-04-24 16:19:23 -05:00
brooklyn!
93ddff53e3
Merge pull request #15321 from NousResearch/bb/tui-inline-diff-tooltrail-order
fix(tui): render tool trail before anchored inline diffs
2026-04-24 15:20:42 -05:00
Brooklyn Nicholson
de596aca1c fix(tui): render tool trail before anchored inline diffs
Inline diff segments were anchored relative to assistant narration, but the
turn details pane still rendered after streamSegments. On completion that put
the diff before the tool telemetry that produced it. When a turn has anchored
diff segments, commit the accumulated thinking/tool trail as a pre-diff trail
message, then render the diff and final summary.
2026-04-24 15:07:02 -05:00
brooklyn!
6f1eed3968
Merge pull request #15274 from NousResearch/bb/tui-null-config-guard
fix(tui): tolerate + warn on null sections in config.yaml
2026-04-24 13:02:12 -05:00
Brooklyn Nicholson
e3940f9807 fix(tui): guard personality overlay when personalities is null
TUI auto-resolves `display.personality` at session init, unlike the base CLI.
If config contains `agent.personalities: null`, `_resolve_personality_prompt`
called `.get()` on None and failed before model/provider selection.
Normalize null personalities to `{}` and surface a targeted config warning.
2026-04-24 12:57:51 -05:00
Brooklyn Nicholson
bfa60234c8 feat(tui): warn on bare null sections in config.yaml
Tolerating null top-level keys silently drops user settings (e.g.
`agent.system_prompt` next to a bare `agent:` line is gone). Probe at
session create, log via `logger.warning`, and surface in the boot info
under `config_warning` — rendered in the TUI feed alongside the existing
`credential_warning` banner.
2026-04-24 12:49:02 -05:00
Brooklyn Nicholson
fd9b692d33 fix(tui): tolerate null top-level sections in config.yaml
YAML parses bare keys like `agent:` or `display:` as None. `dict.get(key, {})`
returns that None instead of the default (defaults only fire on missing keys),
so every `cfg.get("agent", {}).get(...)` chain in tui_gateway/server.py
crashed agent init with `'NoneType' object has no attribute 'get'`.

Guard all 21 sites with `(cfg.get(X) or {})`. Regression test covers the
null-section init path reported on Twitter against the new TUI.
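The failure and the guard fit in a few lines (the dict literal stands in for what `yaml.safe_load` returns for a file with a bare `agent:` line):

```python
# What YAML parsing yields for a config with a bare "agent:" line next to
# a populated "display:" section (illustrative literal, not parsed YAML):
cfg = {"agent": None, "display": {"theme": "dark"}}

# Broken: dict.get's default only fires for MISSING keys, not None values,
# so cfg.get("agent", {}).get("model") raises
# AttributeError: 'NoneType' object has no attribute 'get'

# The guarded form applied at all 21 sites:
model = (cfg.get("agent") or {}).get("model")
theme = (cfg.get("display") or {}).get("theme")
```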
2026-04-24 12:43:09 -05:00
Austin Pickett
c61547c067
Merge pull request #14890 from NousResearch/bb/tui-web-chat-unified
feat(web): dashboard Chat tab — xterm.js + JSON-RPC sidecar (supersedes #12710 + #13379)
2026-04-24 10:35:43 -07:00
brooklyn!
7f0f67d5f7
Merge pull request #15266 from NousResearch/bb/fix-tui-section-toggle
fix(tui): chevrons re-toggle even when section default is expanded
2026-04-24 12:24:27 -05:00
Brooklyn Nicholson
f5e2a77a80 fix(tui): chevrons re-toggle even when section default is expanded
Recovers the manual click on the details accordion: with #14968's new
SECTION_DEFAULTS (thinking/tools start `expanded`), every panel render
was OR-ing the local open toggle against `visible.X === 'expanded'`.
That pinned `open=true` for the default-expanded sections, so clicking
the chevron flipped the local state but the panel never collapsed.

Local toggle is now the sole source of truth at render time; the
useState init still seeds from the resolved visibility (so first paint
is correct) and the existing useEffect still re-syncs when the user
mutates visibility at runtime via `/details`.

Same OR-lock cleared inside SubagentAccordion (`showChildren ||
openX`) — pre-existing but the same shape, so expand-all on the
spawn tree no longer makes inner sections un-collapsible either.
2026-04-24 12:22:20 -05:00
Austin Pickett
850fac14e3 chore: address copilot comments 2026-04-24 12:51:04 -04:00
Austin Pickett
5500b51800 chore: fix lint 2026-04-24 12:32:10 -04:00
Austin Pickett
63975aa75b fix: mobile chat in new layout 2026-04-24 12:07:46 -04:00
Teknium
62c14d5513 refactor(gateway): extract WhatsApp identity helpers into shared module
Follow-up to the canonical-identity session-key fix: pull the
JID/LID normalize/expand/canonical helpers into gateway/whatsapp_identity.py
instead of living in two places. gateway/session.py (session-key build) and
gateway/run.py (authorisation allowlist) now both import from the shared
module, so the two resolution paths can't drift apart.

Also switches the auth path from module-level _hermes_home (cached at
import time) to dynamic get_hermes_home() lookup, which matches the
session-key path and correctly reflects HERMES_HOME env overrides. The
lone test that monkeypatched gateway.run._hermes_home for the WhatsApp
auth path is updated to set HERMES_HOME env var instead; all other
tests that monkeypatch _hermes_home for unrelated paths (update,
restart drain, shutdown marker, etc.) still work — the module-level
_hermes_home is untouched.
2026-04-24 07:55:55 -07:00
Keira Voss
10deb1b87d fix(gateway): canonicalize WhatsApp identity in session keys
Hermes' WhatsApp bridge routinely surfaces the same person under either
a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid),
and may flip between the two for a single human within the same
conversation. Before this change, build_session_key used the raw
identifier verbatim, so the bridge reshuffling an alias form produced
two distinct session keys for the same person — in two places:

  1. DM chat_id — a user's DM sessions split in half, transcripts and
     per-sender state diverge.
  2. Group participant_id (with group_sessions_per_user enabled) — a
     member's per-user session inside a group splits in half for the
     same reason.

Add a canonicalizer that walks the bridge's lid-mapping-*.json files
and picks the shortest/numeric-preferred alias as the stable identity.
build_session_key now routes both the DM chat_id and the group
participant_id through this helper when the platform is WhatsApp.
All other platforms and chat types are untouched.
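A hypothetical selection rule matching the "shortest/numeric-preferred" description (the real canonicalizer also walks the bridge's lid-mapping-*.json files to collect the alias set, which this sketch takes as an input):

```python
def canonical_identifier(aliases):
    """Pick a stable identity from a set of WhatsApp alias forms:
    prefer aliases with an all-digit local part (phone-format JIDs),
    then shorter, then lexicographic as a deterministic tiebreak.
    Illustrative only; the in-tree rule may differ."""
    def rank(alias):
        local = alias.split("@", 1)[0]
        return (not local.isdigit(), len(alias), alias)
    return min(aliases, key=rank)
```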

Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier
as public helpers. Plugins that need per-sender behaviour (role-based
routing, per-contact authorization, policy gating) need the same
identity resolution Hermes uses internally; without a public helper,
each plugin would have to re-implement the walker against the bridge's
internal on-disk format. Keeping this alongside build_session_key
makes it authoritative and one refactor away if the bridge ever
changes shape.

_expand_whatsapp_aliases stays private — it's an implementation detail
of how the mapping files are walked, not a contract callers should
depend on.
2026-04-24 07:55:55 -07:00
emozilla
f49afd3122 feat(web): add /api/pty WebSocket bridge to embed TUI in dashboard
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.

Architecture:

    browser <Terminal> (xterm.js)
           │  onData ───► ws.send(keystrokes)
           │  onResize ► ws.send('\x1b[RESIZE:cols;rows]')
           │  write   ◄── ws.onmessage (PTY bytes)
           ▼
    FastAPI /api/pty (token-gated, loopback-only)
           ▼
    PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent

Components
----------

hermes_cli/pty_bridge.py
  Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
  master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
  inherently byte-oriented and UTF-8 boundaries may land mid-read),
  non-blocking select-based reads, TIOCSWINSZ resize, idempotent
  SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
  is WSL-supported only).
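The non-blocking read shape reduces to a select-then-read pair. A sketch of the pattern only — the in-tree `PtyBridge` wraps `ptyprocess` and adds teardown/resize on top:

```python
import os
import select


def read_pty_bytes(fd, timeout=0.1, max_bytes=65536):
    """Byte-safe non-blocking read from a master fd: select for readiness,
    then os.read raw bytes. Bytes, not text, because a UTF-8 sequence may
    split across reads; decoding belongs to the terminal emulator."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return b""            # nothing available within the timeout
    return os.read(fd, max_bytes)
```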

hermes_cli/web_server.py
  @app.websocket("/api/pty") endpoint gated by the existing
  _SESSION_TOKEN (via ?token= query param since browsers can't set
  Authorization on WS upgrades). Loopback-only enforcement. Reader task
  uses run_in_executor to pump PTY bytes without blocking the event
  loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
  before forwarding to the PTY. The endpoint resolves the TUI argv
  through a _resolve_chat_argv hook so tests can inject fake commands
  without building the real TUI.

Tests
-----

tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.

tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.

96 tests pass under scripts/run_tests.sh.

(cherry picked from commit 29b337bca7)

feat(web): add Chat tab with xterm.js terminal + Sessions resume button

(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
 against current main: BUILTIN_ROUTES table + plugin slot layout)

fix(tui): replace OSC 52 jargon in /copy confirmation

When the user ran /copy successfully, Ink confirmed with:

  sent OSC52 copy sequence (terminal support required)

That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.

Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:

  copied 1482 chars

(cherry picked from commit a0701b1d5a)

docs: document the dashboard Chat tab

AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'

website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).

(cherry picked from commit 2c2e32cc45)

feat(tui-gateway): transport-aware dispatch + WebSocket sidecar

Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.

- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
  + module-level `StdioTransport` fallback. The stdio stream resolves
  through a lambda so existing tests that monkey-patch `_real_stdout`
  keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
  endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
  - `write_json` routes via session transport (for async events) →
    contextvar transport (for in-request writes) → stdio fallback.
  - `dispatch(req, transport=None)` binds the transport for the request
    lifetime and propagates it to pool workers via `contextvars.copy_context`
    so async handlers don't lose their sink.
  - `_init_session` and the manual-session create path stash the
    request's transport so out-of-band events (subagent.complete, etc.)
    fan out to the right peer.

`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.

feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal

Composes the two transports into a single Chat tab:

  ┌─────────────────────────────────────────┬──────────────┐
  │  xterm.js / PTY  (emozilla #13379)      │ ChatSidebar  │
  │  the literal hermes --tui process       │  /api/ws     │
  └─────────────────────────────────────────┴──────────────┘
        terminal bytes                          structured events

The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:

  • model + provider badge with connection state (click → switch)
  • running tool-call list (driven by tool.start / tool.progress /
    tool.complete events)
  • model picker dialog (gateway-driven, reuses ModelPickerDialog)

The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.

- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
  Owns its GatewayClient, drives the model picker through
  `slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
  (`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
  guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.

Co-authored-by: emozilla <emozilla@nousresearch.com>

refactor(web): /clean pass on ChatSidebar + ChatPage lint debt

- ChatSidebar: lift gw out of useRef into a useMemo derived from a
  reconnect counter. React 19's react-hooks/refs and
  react-hooks/set-state-in-effect rules both fire when you touch a ref during
  render or call setState from inside a useEffect body. The
  counter-derived gw is the canonical pattern for "external resource
  that needs to be replaceable on user action" — re-creating the
  client comes from bumping `version`, the effect just wires + tears
  down. Drops the imperative `gwRef.current = …` reassign in
  reconnect, drops the truthy ref guard in JSX. modelLabel +
  banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
  so the effect body doesn't have to setState on first run. Drops
  the unused react-hooks/exhaustive-deps eslint-disable. Adds a
  scoped no-control-regex disable on the SGR mouse parser regex
  (the \\x1b is intentional for xterm escape sequences).

All files touched in this PR now lint clean. Remaining warnings in web/
belong to pre-existing files this PR doesn't touch.

Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.

chore: uptick

fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary

The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.

Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.

feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast

The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.

Cross-process forwarding:

- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
  (event_publisher.py, sync websockets client). entry.py installs the
  tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
  every dispatcher emit to a back-WS without disturbing Ink's stdio
  handshake.
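
The tee shape, as a minimal sketch (hypothetical class body; see tui_gateway/transport.py for the real one):

```python
class TeeTransport:
    """Mirror every emit to a secondary sink while the primary (stdio)
    behaviour stays untouched."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def send(self, obj):
        self.primary.send(obj)
        try:
            # Sidecar mirroring is best-effort: a dead back-WS must
            # never disturb Ink's stdio handshake.
            self.secondary.send(obj)
        except Exception:
            pass
```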

- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
  (subscriber) endpoints with a per-channel registry. /api/pty now
  accepts ?channel= and propagates the sidecar URL via env. start_server
  also stashes app.state.bound_port so the URL is constructable.

- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
  passes it to /api/pty and as a prop to ChatSidebar.

- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
  tool.start/progress/complete back into the ToolCall list. Restores
  the ToolCall primitive.

Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).

fix(web): address Copilot review on #14890

Five threads, all real:

- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
  the open handshake.  Server emits `gateway.ready` immediately after
  accept, so a listener attached after the open promise could race past
  the initial skin payload and lose it.

- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
  WS into the existing error banner.  4401/4403 (auth/loopback reject)
  surface as a "reload the page" message; mid-stream drops surface as
  "events feed disconnected" with the existing reconnect button.  Clean
  unmount closes (1000/1001) stay silent.

- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
  ptyprocess lives in the `pty` extra, not `web`.  Switch to
  `hermes-agent[web,pty]` in both prerequisite blocks.

- AGENTS.md: previous "never add a parallel React chat surface" guidance
  was overbroad and contradicted this PR's sidebar.  Tightened to forbid
  re-implementing the transcript/composer/PTY terminal while explicitly
  allowing structured supporting widgets (sidebar / model picker /
  inspectors), matching the actual architecture.

- web/package-lock.json: regenerated cleanly so the wterm sibling
  workspace paths (extraneous machine-local entries) stop polluting CI.

Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.

refactor(web): /clean pass on ChatSidebar events handler

Spotted in the round-2 review:

- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
  fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
  hit the "unexpected drop" branch.  Track `unmounting` in the effect
  scope and gate the banner through a `surface()` helper so cleanup
  closes stay silent.

- DRY the duplicated "events feed disconnected" string into a local
  const used by both the error and close handlers.

- Drop the `opened` flag (no longer needed once the unmount guard is
  the source of truth for "is this an expected close?").
2026-04-24 10:51:49 -04:00
Austin Pickett
1143f234e3
Merge pull request #14899 from NousResearch/feat/dashboard-layout
Feat/dashboard layout
2026-04-24 07:48:31 -07:00
Teknium
c4627f4933 chore(release): map Group G contributors in AUTHOR_MAP 2026-04-24 07:26:07 -07:00
bsgdigital
7c3e5706d8 fix(bedrock): Bedrock-aware _rebuild_anthropic_client helper on interrupt
Three interrupt-recovery sites in run_agent.py rebuilt self._anthropic_client
with build_anthropic_client(self._anthropic_api_key, ...) unconditionally.
When provider=bedrock + api_mode=anthropic_messages (AnthropicBedrock SDK
path), self._anthropic_api_key is the sentinel 'aws-sdk' — build_anthropic_client
doesn't accept that and the rebuild either crashed or produced a non-functional
client.

Extract a _rebuild_anthropic_client() helper that dispatches to
build_anthropic_bedrock_client(region) when provider='bedrock', falling back
to build_anthropic_client() for native Anthropic and other anthropic_messages
providers (MiniMax, Kimi, Alibaba, etc.). Three inline rebuild sites now call
the helper.

Partial salvage of #14680 by @bsgdigital — only the _rebuild_anthropic_client
helper. The normalize_model_name Bedrock-prefix piece was subsumed by #14664,
and the aux client aws_sdk branch was subsumed by #14770 (both in the same
salvage PR as this commit).
2026-04-24 07:26:07 -07:00
Andre Kurait
a9ccb03ccc fix(bedrock): evict cached boto3 client on stale-connection errors
## Problem

When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:

  * botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
    EndpointConnectionError / ConnectTimeoutError
  * urllib3.exceptions.ProtocolError
  * A bare AssertionError raised from inside urllib3 or botocore
    (internal connection-pool invariant check)

The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.

The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:

    ⚠️  API call failed: AssertionError
       📝 Error:

with no hint of what went wrong.

## Fix

Add two helpers to agent/bedrock_adapter.py:

  * is_stale_connection_error(exc) — classifies exceptions that
    indicate dead-client/dead-socket state. Matches botocore
    ConnectionError + HTTPClientError subtrees, urllib3
    ProtocolError / NewConnectionError, and AssertionError
    raised from a frame whose module name starts with urllib3.,
    botocore., or boto3.. Application-level AssertionErrors are
    intentionally excluded.

  * invalidate_runtime_client(region) — per-region counterpart to
    the existing reset_client_cache(). Evicts a single cached
    client so the next call rebuilds it (and its connection pool).

Wire both into the Converse call sites:

  * call_converse() / call_converse_stream() in
    bedrock_adapter.py (defense-in-depth for any future caller)
  * The two direct client.converse(**kwargs) /
    client.converse_stream(**kwargs) call sites in run_agent.py
    (the paths the agent loop actually uses)

On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.
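
The evict-and-re-raise wiring, as an illustrative harness (not the real run_agent.py call sites):

```python
def converse_with_eviction(cache, region, do_call, is_stale):
    """Call through the cached client; on a stale-connection error,
    evict that region's entry and re-raise unchanged so the existing
    retry loop rebuilds a fresh client (and connection pool)."""
    client = cache[region]
    try:
        return do_call(client)
    except Exception as exc:
        if is_stale(exc):
            # Per-region eviction, mirroring invalidate_runtime_client.
            cache.pop(region, None)
        raise
```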

## Tests

tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):

  * TestInvalidateRuntimeClient — per-region eviction correctness;
    non-cached region returns False.
  * TestIsStaleConnectionError — classifies botocore
    ConnectionClosedError / EndpointConnectionError /
    ReadTimeoutError, urllib3 ProtocolError, library-internal
    AssertionError (both urllib3.* and botocore.* frames), and
    correctly ignores application-level AssertionError and
    unrelated exceptions (ValueError, KeyError).
  * TestCallConverseInvalidatesOnStaleError — end-to-end: stale
    error evicts the cached client, non-stale error (validation)
    leaves it alone, successful call leaves it cached.

All 116 tests in test_bedrock_adapter.py pass.

Signed-off-by: Andre Kurait <andrekurait@gmail.com>
2026-04-24 07:26:07 -07:00
Tranquil-Flow
7dc6eb9fbf fix(agent): handle aws_sdk auth type in resolve_provider_client
Bedrock's aws_sdk auth_type had no matching branch in
resolve_provider_client(), causing it to fall through to the
"unhandled auth_type" warning and return (None, None).  This broke
all auxiliary tasks (compression, memory, summarization) for Bedrock
users — the main conversation loop worked fine, but background
context management silently failed.

Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via
build_anthropic_bedrock_client(), using boto3's default credential
chain (IAM roles, SSO, env vars, instance metadata).  Default
auxiliary model is Haiku for cost efficiency.
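
The added branch could be sketched like this; the builder signatures and the Haiku default are assumptions drawn from the message above:

```python
def resolve_aux_client(auth_type, build_bedrock, build_native, region):
    """Dispatch sketch for the auxiliary-client resolution described
    above (illustrative, not the real resolve_provider_client)."""
    if auth_type == "aws_sdk":
        # boto3's default credential chain (IAM role, SSO, env vars,
        # instance metadata) -- there is no API key to pass through.
        return build_bedrock(region), "claude-haiku"  # cheap aux model
    if auth_type == "api_key":
        return build_native(), "claude-haiku"
    # Previously every non-matching auth_type fell through here,
    # silently disabling compression/memory/summarization for Bedrock.
    return None, None
```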

Closes #13919
2026-04-24 07:26:07 -07:00
Andre Kurait
b290297d66 fix(bedrock): resolve context length via static table before custom-endpoint probe
## Problem

`get_model_context_length()` in `agent/model_metadata.py` had a resolution
order bug that caused every Bedrock model to fall back to the 128K default
context length instead of reaching the static Bedrock table (200K for
Claude, etc.).

The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in
`_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False.
The resolution order then ran the custom-endpoint probe (step 2) *before*
the Bedrock branch (step 4b), which:

  1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`).
  2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the
     bedrock-runtime URL (Bedrock doesn't serve this shape).
  3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the
     "probe-down" branch — never reaching the Bedrock static table.

Result: users on Bedrock saw 128K context for Claude models that
actually support 200K on Bedrock, causing premature auto-compression.

## Fix

Promote the Bedrock branch from step 4b to step 1b, so it runs *before*
the custom-endpoint probe at step 2. The static table in
`bedrock_adapter.py::get_bedrock_context_length()` is the authoritative
source for Bedrock (the ListFoundationModels API doesn't expose context
window sizes), so there's no reason to probe `/models` first.

The original step 4b is replaced with a one-line breadcrumb comment
pointing to the new location, to make the resolution-order docstring
accurate.
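
The promoted step-1b predicate might look like the following; the regex shape is an assumption, since only the host pattern is stated above:

```python
import re

# bedrock-runtime.<region>.amazonaws.com is not in _URL_TO_PROVIDER,
# so the Bedrock check must run before the custom-endpoint probe.
_BEDROCK_HOST = re.compile(r"^bedrock-runtime\.[a-z0-9-]+\.amazonaws\.com$")

def looks_like_bedrock(provider, base_url):
    if provider == "bedrock":
        return True
    if base_url:
        host = base_url.split("://", 1)[-1].split("/", 1)[0]
        return bool(_BEDROCK_HOST.match(host))
    return False
```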

## Changes

- `agent/model_metadata.py`
  - Add step 1b: Bedrock static-table branch (unchanged predicate, moved).
  - Remove dead step 4b block, replace with breadcrumb comment.
  - Update resolution-order docstring to include step 1b.

- `tests/agent/test_model_metadata.py`
  - New `TestBedrockContextResolution` class (3 tests):
    - `test_bedrock_provider_returns_static_table_before_probe`:
      confirms `provider="bedrock"` hits the static table and does NOT
      call `fetch_endpoint_model_metadata` (regression guard).
    - `test_bedrock_url_without_provider_hint`: confirms the
      `bedrock-runtime.*.amazonaws.com` host match works without an
      explicit `provider=` hint.
    - `test_non_bedrock_url_still_probes`: confirms the probe still
      fires for genuinely-custom endpoints (no over-reach).

## Testing

  pytest tests/agent/test_model_metadata.py -q
  # 83 passed in 1.95s (3 new + 80 existing)

## Risk

Very low.

- Predicate is identical to the original step 4b — no behaviour change
  for non-Bedrock paths.
- Original step 4b was dead code for the user-facing case (always hit
  the 128K fallback first), so removing it cannot regress behaviour.
- Bedrock path now short-circuits before any network I/O — faster too.
- `ImportError` fall-through preserved so users without `boto3`
  installed are unaffected.

## Related

- This is a prerequisite for accurate context-window accounting on
  Bedrock — the fix for #14710 (stale-connection client eviction)
  depends on correct context sizing to know when to compress.

Signed-off-by: Andre Kurait <andrekurait@gmail.com>
2026-04-24 07:26:07 -07:00
Qi Ke
f2fba4f9a1 fix(anthropic): auto-detect Bedrock model IDs in normalize_model_name (#12295)
Bedrock model IDs use dots as namespace separators (anthropic.claude-opus-4-7,
us.anthropic.claude-sonnet-4-5-v1:0), not version separators.
normalize_model_name() was unconditionally converting all dots to hyphens,
producing invalid IDs that Bedrock rejects with HTTP 400/404.

This affected both the main agent loop (partially mitigated by
_anthropic_preserve_dots in run_agent.py) and all auxiliary client calls
(compression, session_search, vision, etc.) which go through
_AnthropicCompletionsAdapter and never pass preserve_dots=True.

Fix: add _is_bedrock_model_id() to detect Bedrock namespace prefixes
(anthropic., us., eu., ap., jp., global.) and skip dot-to-hyphen
conversion for these IDs regardless of the preserve_dots flag.
2026-04-24 07:26:07 -07:00
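
The dot-preserving normalization above can be sketched as follows; the prefix list is taken verbatim from the commit message, the function name is illustrative:

```python
# Bedrock namespace prefixes: dots here are separators, not versions.
_BEDROCK_PREFIXES = ("anthropic.", "us.", "eu.", "ap.", "jp.", "global.")

def normalize_model_name_sketch(model_id: str) -> str:
    if model_id.startswith(_BEDROCK_PREFIXES):
        # Skip dot-to-hyphen conversion regardless of preserve_dots.
        return model_id
    return model_id.replace(".", "-")
```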
Teknium
fcc05284fc
fix(delegate): tool-activity-aware heartbeat stale detection (#13041) (#15183)
A child running a legitimately long-running tool (terminal command,
browser fetch, big file read) holds current_tool set and keeps
api_call_count frozen while the tool runs. The previous stale check
treated that as idle after 5 heartbeat cycles (~150s), stopped
touching the parent, and let the gateway kill the session.

Split the threshold in two:
- _HEARTBEAT_STALE_CYCLES_IDLE=5 (~150s)  — applied only when
  current_tool is None (child wedged between turns)
- _HEARTBEAT_STALE_CYCLES_IN_TOOL=20 (~600s) — applied when the child
  is inside a tool call

Stale counter also resets when current_tool changes (new tool =
progress). The hard child_timeout_seconds (default 600s) is still
the final cap, so genuinely stuck tools don't get to block forever.
2026-04-24 07:25:19 -07:00
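
The split threshold plus counter reset above, as a sketch (cycle constants from the message; helper names illustrative):

```python
_STALE_CYCLES_IDLE = 5       # ~150s: child wedged between turns
_STALE_CYCLES_IN_TOOL = 20   # ~600s: long-running tool still counts as alive

def next_stale_count(count, prev_tool, current_tool, api_calls_advanced):
    # Any progress signal resets the counter: an API call advanced,
    # or current_tool changed (new tool = progress).
    if api_calls_advanced or prev_tool != current_tool:
        return 0
    return count + 1

def is_stale(count, current_tool):
    limit = _STALE_CYCLES_IN_TOOL if current_tool else _STALE_CYCLES_IDLE
    return count >= limit
```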
Teknium
1840c6a57d
feat(spotify): wire setup wizard into 'hermes tools' + document cron usage (#15180)
A — 'hermes tools' activation now runs the full Spotify wizard.

Previously a user had to (1) toggle the Spotify toolset on in 'hermes
tools' AND (2) separately run 'hermes auth spotify' to actually use
it. The second step was a discovery gap — the docs mentioned it but
nothing in the TUI pointed users there.

Now toggling Spotify on calls login_spotify_command as a post_setup
hook. If the user has no client_id yet, the interactive wizard walks
them through Spotify app creation; if they do, it skips straight to
PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on
AND authenticated. SystemExit from the wizard (user abort) leaves the
toolset enabled and prints a 'run: hermes auth spotify' hint — it
does NOT fail the toolset toggle.

Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard
handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users
to type env var names before the wizard fires was UX-backwards — the
point of the wizard is that they don't HAVE a client_id yet.

B — Docs page now covers cron + Spotify.

New 'Scheduling: Spotify + cron' section with two working examples
(morning playlist, wind-down pause) using the real 'hermes cron add'
CLI surface (verified via 'cron add --help'). Covers the active-device
gotcha, Premium gating, memory isolation, and links to the cron docs.

Also fixed a stale '9 Spotify tools' reference in the setup copy —
we consolidated to 7 tools in #15154.

Validation:
- scripts/run_tests.sh tests/hermes_cli/test_tools_config.py
    tests/hermes_cli/test_spotify_auth.py
    tests/tools/test_spotify_client.py
  → 54 passed
- website: node scripts/prebuild.mjs && npx docusaurus build
  → SUCCESS, no new warnings
2026-04-24 07:24:28 -07:00
Blind Dev
591aa159aa
feat: allow Telegram chat allowlists for groups and forums (#15027)
* feat: allow Telegram chat allowlists for groups and forums

* chore: map web3blind noreply email for release attribution

---------

Co-authored-by: web3blind <web3blind@users.noreply.github.com>
2026-04-24 07:23:14 -07:00
Austin Pickett
d3e56b9f39 chore: refac 2026-04-24 10:17:57 -04:00
Teknium
c6b734e24d chore(release): map Group B contributors in AUTHOR_MAP 2026-04-24 07:14:00 -07:00
Wooseong Kim
54146ae07c fix(aux): refresh cached auth after 401 2026-04-24 07:14:00 -07:00
Wooseong Kim
be6b83562d fix(aux): force anthropic oauth refresh after 401
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 07:14:00 -07:00
5park1e
e1106772d9 fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain
Bug 3 — Stale OAuth token not detected in 'hermes model':
- _model_flow_anthropic used 'has_creds = bool(existing_key)' which treats
  any non-empty token (including expired OAuth tokens) as valid.
- Added existing_is_stale_oauth check: if the only credential is an OAuth
  token (sk-ant- prefix) with no valid cc_creds fallback, mark it stale
  and force the re-auth menu instead of silently accepting a broken token.

Bug 4 — macOS Keychain credentials never read:
- Claude Code >=2.1.114 migrated from ~/.claude/.credentials.json to the
  macOS Keychain under service 'Claude Code-credentials'.
- Added _read_claude_code_credentials_from_keychain() using the 'security'
  CLI tool; read_claude_code_credentials() now tries Keychain first then
  falls back to JSON file.
- Non-Darwin platforms return None from Keychain read immediately.
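
A sketch of the Keychain read: the `security` invocation mirrors the commit message, but the service name and JSON parsing are unverified assumptions about Claude Code's storage format:

```python
import json
import subprocess
import sys

def read_keychain_credentials():
    """Try the macOS Keychain entry described above; return None on
    any failure so the caller can fall back to the JSON file."""
    if sys.platform != "darwin":
        return None  # non-Darwin platforms bail immediately
    try:
        out = subprocess.run(
            ["security", "find-generic-password",
             "-s", "Claude Code-credentials", "-w"],
            capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if out.returncode != 0 or not out.stdout.strip():
        return None
    try:
        return json.loads(out.stdout)
    except json.JSONDecodeError:
        return None
```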

Tests:
- tests/agent/test_anthropic_keychain.py: 11 cases covering Darwin-only
  guard, security command failures, JSON parsing, fallback priority.
- tests/hermes_cli/test_anthropic_model_flow_stale_oauth.py: 8 cases
  covering stale OAuth detection, API key passthrough, cc_creds fallback.

Refs: #12905
2026-04-24 07:14:00 -07:00
nightq
5383615db5 fix: recognize Claude Code OAuth tokens (cc- prefix) in _is_oauth_token
Fixes NousResearch/hermes-agent#9813

Root cause: _is_oauth_token() only recognized sk-ant-* and eyJ* patterns,
but Claude Code OAuth tokens from CLAUDE_CODE_OAUTH_TOKEN use a cc- prefix.
Fix: add cc- prefix detection so these tokens route through Bearer auth.
2026-04-24 07:14:00 -07:00
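
The extended prefix check, as a one-liner sketch (prefixes from the two commit messages above):

```python
def is_oauth_token(tok: str) -> bool:
    # sk-ant-* and eyJ* were already recognized; cc- is the Claude Code
    # OAuth prefix added here so Bearer auth is used.
    return tok.startswith(("sk-ant-", "eyJ", "cc-"))
```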
Maymun
56086e3fd7 fix(auth): write Anthropic OAuth token files atomically to prevent corruption 2026-04-24 07:14:00 -07:00
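
The write-then-rename pattern that prevents this class of corruption, as a generic sketch (not the actual hermes auth code):

```python
import json
import os
import tempfile

def write_token_file_atomic(path: str, payload: dict) -> None:
    """Write to a temp file in the same directory, fsync, then rename
    over the target -- a crash mid-write can never leave a truncated
    token file behind."""
    d = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=d, prefix=".tok-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX
    except BaseException:
        os.unlink(tmp)
        raise
```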
Teknium
8d12fb1e6b
refactor(spotify): convert to built-in bundled plugin under plugins/spotify (#15174)
Moves the Spotify integration from tools/ into plugins/spotify/,
matching the existing pattern established by plugins/image_gen/ for
third-party service integrations.

Why:
- tools/ should be reserved for foundational capabilities (terminal,
  read_file, web_search, etc.). tools/providers/ was a one-off
  directory created solely for spotify_client.py.
- plugins/ is already the home for image_gen backends, memory
  providers, context engines, and standalone hook-based plugins.
  Spotify is a third-party service integration and belongs alongside
  those, not in tools/.
- Future service integrations (eventually: Deezer, Apple Music, etc.)
  now have a pattern to copy.

Changes:
- tools/spotify_tool.py → plugins/spotify/tools.py (handlers + schemas)
- tools/providers/spotify_client.py → plugins/spotify/client.py
- tools/providers/ removed (was only used for Spotify)
- New plugins/spotify/__init__.py with register(ctx) calling
  ctx.register_tool() × 7. The handler/check_fn wiring is unchanged.
- New plugins/spotify/plugin.yaml (kind: backend, bundled, auto-load).
- tests/tools/test_spotify_client.py: import paths updated.

tools_config fix — _DEFAULT_OFF_TOOLSETS now wins over plugin auto-enable:
- _get_platform_tools() previously auto-enabled unknown plugin
  toolsets for new platforms. That was fine for image_gen (which has
  no toolset of its own) but bad for Spotify, which explicitly
  requires opt-in (don't ship 7 tool schemas to users who don't use
  it). Added a check: if a plugin toolset is in _DEFAULT_OFF_TOOLSETS,
  it stays off until the user picks it in 'hermes tools'.

Pre-existing test bug fix:
- tests/hermes_cli/test_plugins.py::test_list_returns_sorted
  asserted names were sorted, but list_plugins() sorts by key
  (path-derived, e.g. image_gen/openai). With only image_gen plugins
  bundled, name and key order happened to agree. Adding plugins/spotify
  broke that coincidence (spotify sorts between openai-codex and xai
  by name but after xai by key). Updated test to assert key order,
  which is what the code actually documents.

Validation:
- scripts/run_tests.sh tests/hermes_cli/test_plugins.py \
    tests/hermes_cli/test_tools_config.py \
    tests/hermes_cli/test_spotify_auth.py \
    tests/tools/test_spotify_client.py \
    tests/tools/test_registry.py
  → 143 passed
- E2E plugin load: 'spotify' appears in loaded plugins, all 7 tools
  register into the spotify toolset, check_fn gating intact.
2026-04-24 07:06:11 -07:00
Teknium
e5d41f05d4
feat(spotify): consolidate tools (9→7), add spotify skill, surface in hermes setup (#15154)
Three quality improvements on top of #15121 / #15130 / #15135:

1. Tool consolidation (9 → 7)
   - spotify_saved_tracks + spotify_saved_albums → spotify_library with
     kind='tracks'|'albums'. Handler code was ~90 percent identical
     across the two old tools; the merge is a behavioral no-op.
   - spotify_activity dropped. Its 'now_playing' action was a duplicate
     of spotify_playback.get_currently_playing (both return identical
     204/empty payloads). Its 'recently_played' action moves onto
     spotify_playback as a new action — history belongs adjacent to
     live state.
   - Net: each API call ships 2 fewer tool schemas when the Spotify
     toolset is enabled, and the action surface is more discoverable
     (everything playback-related is on one tool).

2. Spotify skill (skills/media/spotify/SKILL.md)
   Teaches the agent canonical usage patterns so common requests don't
   balloon into 4+ tool calls:
   - 'play X' = one search, then play by URI (not search + scan +
     describe + play)
   - 'what's playing' = single get_currently_playing (no preflight
     get_state chain)
   - Don't retry on '403 Premium required' or '403 No active device' —
     both require user action
   - URI/URL/bare-ID format normalization
   - Full failure-mode reference for 204/401/403/429

3. Surfaced in 'hermes setup' tool status
   Adds 'Spotify (PKCE OAuth)' to the tool status list when
   auth.json has a Spotify access/refresh token. Matches the
   homeassistant pattern but reads from auth.json (OAuth-based) rather
   than env vars.

Docs updated to reflect the new 7-tool surface, and mention the
companion skill in the 'Using it' section.

Tests: 54 passing (client 22 + auth 15 + tools_config 35, minus the 18
spotify_activity tests renamed/replaced with library + recently_played
coverage). Docusaurus build clean.
2026-04-24 06:14:51 -07:00
Austin Pickett
0fdbfad2b0 feat: embed docs 2026-04-24 09:04:11 -04:00
Teknium
9d1b277e1d chore(release): map Group H contributors in AUTHOR_MAP 2026-04-24 05:48:15 -07:00
XieNBi
4a51ab61eb fix(cli): non-zero /model counts for native OpenAI and direct API rows 2026-04-24 05:48:15 -07:00
Brian D. Evans
7f26cea390 fix(models): strip models/ prefix in Gemini validator (#12532)
Salvage of the Gemini-specific piece from PR #12585 by @briandevans.
Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs prefixed
with 'models/' (native Gemini-API convention), so set-membership against
curated bare IDs drops every model. Strip the prefix before comparison.

The Anthropic static-catalog piece of #12585 was subsumed by #12618's
_fetch_anthropic_models() branch landing earlier in the same salvage PR.
Full branch cherry-pick was skipped because it also carried unrelated
catalog-version regressions.
2026-04-24 05:48:15 -07:00
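
The prefix-stripping comparison above, as a sketch (function name illustrative):

```python
def filter_known(returned_ids, curated):
    """Strip the native Gemini-API 'models/' prefix before the
    set-membership check against curated bare IDs."""
    bare = [mid.removeprefix("models/") for mid in returned_ids]
    return [m for m in bare if m in curated]
```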
H-Ali13381
2303dd8686 fix(models): use Anthropic-native headers for model validation
The generic /v1/models probe in validate_requested_model() sent a plain
'Authorization: Bearer <key>' header, which works for OpenAI-compatible
endpoints but results in a 401 Unauthorized from Anthropic's API.
Anthropic requires x-api-key + anthropic-version headers (or Bearer for
OAuth tokens from Claude Code).

Add a provider-specific branch for normalized == 'anthropic' that calls
the existing _fetch_anthropic_models() helper, which already handles
both regular API keys and Claude Code OAuth tokens correctly.  This
mirrors the pattern already used for openai-codex, copilot, and bedrock.

The branch also includes:
- fuzzy auto-correct (cutoff 0.9) for near-exact model ID typos
- fuzzy suggestions (cutoff 0.5) when the model is not listed
- graceful fall-through when the token cannot be resolved or the
  network is unreachable (accepts with a warning rather than hard-fail)
- a note that newer/preview/snapshot model IDs can be gate-listed
  and may still work even if not returned by /v1/models

Fixes Anthropic provider users seeing 'service unreachable' errors
when running /model <claude-model> because every probe 401'd.
2026-04-24 05:48:15 -07:00
wangshengyang2004
647900e813 fix(cli): support model validation for anthropic_messages and cloudflare-protected endpoints
- probe_api_models: add api_mode param; use x-api-key + anthropic-version
  headers for anthropic_messages mode (Anthropic's native Models API auth)
- probe_api_models: add User-Agent header to avoid Cloudflare 403 blocks
  on third-party OpenAI-compatible endpoints
- validate_requested_model: pass api_mode through from switch_model
- validate_requested_model: for anthropic_messages mode, attempt probe with
  correct auth; if probe fails (many proxies don't implement /v1/models),
  accept the model with an informational warning instead of rejecting
- fetch_api_models: propagate api_mode to probe_api_models
2026-04-24 05:48:15 -07:00
Teknium
25465fd8d7 test(gateway): on_session_finalize fires on idle-expiry + AUTHOR_MAP
Regression test for #14981. Verifies that _session_expiry_watcher fires
on_session_finalize for each session swept out of the store, matching
the contract documented for /new, /reset, CLI shutdown, and gateway stop.

Verified the test fails cleanly on pre-fix code (hook call list missing
sess-expired) and passes with the fix applied.
2026-04-24 05:40:52 -07:00
Stefan Dimitrov
260ae62134 Invoke session finalize hooks on expiry flush 2026-04-24 05:40:52 -07:00
Teknium
9be17bb84f
docs(spotify): expand feature page with tool reference, Free/Premium matrix, troubleshooting (#15135)
The initial Spotify docs page shipped in #15130 was a setup guide. This
expands it into a full feature reference:

- Per-tool parameter table for all 9 tools, extracted from the real
  schemas in tools/spotify_tool.py (actions, required/optional args,
  premium gating).
- Free vs Premium feature matrix — which actions work on which tier,
  so Free users don't assume Spotify tools are useless to them.
- Active-device prerequisite called out at the top; this is the #1
  cause of '403 no active device' reports for every Spotify
  integration.
- SSH / headless section explaining that browser auto-open is skipped
  when SSH_CLIENT/SSH_TTY is set, and how to tunnel the callback port.
- Token lifecycle: refresh on 401, persistence across restarts, how
  to revoke server-side via spotify.com/account/apps.
- Example prompt list so users know what to ask the agent.
- Troubleshooting expanded: no-active-device, Premium-required, 204
  now_playing, INVALID_CLIENT, 429, 401 refresh-revoked, wizard not
  opening browser.
- 'Where things live' table mapping auth.json / .env / Spotify app.

Verified with 'node scripts/prebuild.mjs && npx docusaurus build'
— page compiles, no new warnings.
2026-04-24 05:38:02 -07:00
Teknium
fe9d9a26d8 chore(release): map Group F contributors in AUTHOR_MAP 2026-04-24 05:35:43 -07:00
Tranquil-Flow
ee83a710f0 fix(gateway,cron): activate fallback_model when primary provider auth fails
When the primary provider raises AuthError (expired OAuth token,
revoked API key), the error was re-raised before AIAgent was created,
so fallback_model was never consulted. Now both gateway/run.py and
cron/scheduler.py catch AuthError specifically and attempt to resolve
credentials from the fallback_providers/fallback_model config chain
before propagating the error.

Closes #7230
2026-04-24 05:35:43 -07:00
vlwkaos
f7f7588893 fix(agent): only set rate-limit cooldown when leaving primary; add tests 2026-04-24 05:35:43 -07:00
LeonSGP43
a9fd8d7c88 fix(agent): default missing fallback chain on switch 2026-04-24 05:35:43 -07:00
CruxExperts
46451528a5 fix(agent): pass config_context_length in fallback activation path
The 'try to activate fallback model after errors' path was calling get_model_context_length()
without the config_context_length parameter, causing it to fall through to
DEFAULT_FALLBACK_CONTEXT (128K) even when config.yaml has an explicit
model.context_length value (e.g. 204800 for MiniMax-M2.7).

This mirrors the fix already present in switch_model() at line 1988, which
correctly passes config_context_length. The fallback path was missed.

Fixes: context_length forced to 128K on fallback activation
2026-04-24 05:35:43 -07:00
Bartok9
4e27e498f1 fix(agent): exclude ssl.SSLError from is_local_validation_error to prevent non-retryable abort
ssl.SSLError inherits from OSError, and its subclass
ssl.SSLCertVerificationError additionally inherits from ValueError via
Python's multiple inheritance. The is_local_validation_error check used
isinstance(api_error, (ValueError, TypeError)) to detect programming
bugs that should abort immediately — but this inadvertently caught
ssl.SSLCertVerificationError, treating a TLS transport failure as a
non-retryable client error.

The error classifier already maps SSLCertVerificationError to
FailoverReason.timeout with retryable=True (its type name is in
_TRANSPORT_ERROR_TYPES), but the inline isinstance guard was overriding
that classification and triggering an unnecessary abort.

Fix: add ssl.SSLError to the exclusion list alongside the existing
UnicodeEncodeError carve-out so TLS errors fall through to the
classifier's retryable path.

Closes #14367
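The inheritance quirk and the carve-out fit in a few lines. This is a sketch; is_local_validation_error here is a simplified stand-in for the real guard:

```python
import ssl

# SSLCertVerificationError multiply inherits: it is an ssl.SSLError
# (hence an OSError) AND a ValueError, so a naive ValueError check
# misclassifies a TLS failure as a programming bug.
cert_err = ssl.SSLCertVerificationError("certificate verify failed")

def is_local_validation_error(exc: BaseException) -> bool:
    # Simplified stand-in for the real guard: carve out transport and
    # encoding errors before the generic ValueError/TypeError bug check.
    if isinstance(exc, (ssl.SSLError, UnicodeEncodeError)):
        return False
    return isinstance(exc, (ValueError, TypeError))
```

Excluding the base ssl.SSLError also covers the SSLCertVerificationError subclass.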
2026-04-24 05:35:43 -07:00
Teknium
ba44a3d256
fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133)
Two small fixes triggered by a support report where the user saw a
cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML
error page, not a real API error) on every gemini-2.5-pro request.

The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but
nothing in our output made that diagnosable:

1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but
   omitted Google entirely, so users had no way to verify from 'hermes
   dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY
   rows.

2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted
   an empty/whitespace api_key and stamped it into the x-goog-api-key
   header, which made Google's frontend return a generic HTML 400 long
   before the request reached the Generative Language backend. Now we
   raise RuntimeError at construction with an actionable message
   pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com.

Added a regression test that covers '', '   ', and None.
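The construction-time guard amounts to something like this (a sketch with assumed wording, not the adapter's actual code):

```python
def validate_gemini_api_key(api_key) -> str:
    # Reject None, '', and whitespace-only keys up front instead of
    # letting an empty x-goog-api-key header produce Google's generic
    # HTML 400 page downstream.
    if not api_key or not api_key.strip():
        raise RuntimeError(
            "Gemini API key is missing or blank. Set GOOGLE_API_KEY or "
            "GEMINI_API_KEY (create a key at aistudio.google.com)."
        )
    return api_key.strip()
```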
2026-04-24 05:35:17 -07:00
Teknium
a1caec1088
fix(agent): repair CamelCase + _tool suffix tool-call emissions (#15124)
Claude-style and some Anthropic-tuned models occasionally emit tool
names as class-like identifiers: TodoTool_tool, Patch_tool,
BrowserClick_tool, PatchTool. These failed strict-dict lookup in
valid_tool_names and triggered the 'Unknown tool' self-correction
loop, wasting a full turn of iteration and tokens.

_repair_tool_call already handled lowercase / separator / fuzzy
matches but couldn't bridge the CamelCase-to-snake_case gap or the
trailing '_tool' suffix that Claude sometimes tacks on. Extend it
with two bounded normalization passes:

  1. CamelCase -> snake_case (via regex lookbehind).
  2. Strip trailing _tool / -tool / tool suffix (case-insensitive,
     applied twice so TodoTool_tool reduces all the way: strip
     _tool -> TodoTool, snake -> todo_tool, strip 'tool' -> todo).

Cheap fast-paths (lowercase / separator-normalized) still run first
so the common case stays zero-cost. Fuzzy match remains the last
resort unchanged.

Tests: tests/run_agent/test_repair_tool_call_name.py covers the
three original reports (TodoTool_tool, Patch_tool, BrowserClick_tool),
plus PatchTool, WriteFileTool, ReadFile_tool, write-file_Tool,
patch-tool, and edge cases (empty, None, '_tool' alone, genuinely
unknown names).

18 new tests + 17 existing arg-repair tests = 35/35 pass.

Closes #14784
2026-04-24 05:32:08 -07:00
Teknium
05394f2f28
feat(spotify): interactive setup wizard + docs page (#15130)
Previously 'hermes auth spotify' crashed with 'HERMES_SPOTIFY_CLIENT_ID
is required' if the user hadn't manually created a Spotify developer
app and set env vars. Now the command detects a missing client_id and
walks the user through the one-time app registration inline:

- Opens https://developer.spotify.com/dashboard in the browser
- Tells the user exactly what to paste into the Spotify form
  (including the correct default redirect URI, 127.0.0.1:43827)
- Prompts for the Client ID
- Persists HERMES_SPOTIFY_CLIENT_ID to ~/.hermes/.env so subsequent
  runs skip the wizard
- Continues straight into the PKCE OAuth flow

Also prints the docs URL at both the start of the wizard and the end
of a successful login so users can find the full guide.

Adds website/docs/user-guide/features/spotify.md with the complete
setup walkthrough, tool reference, and troubleshooting, and wires it
into the sidebar under User Guide > Features > Advanced.

Fixes a stale redirect URI default in the hermes_cli/tools_config.py
TOOL_CATEGORIES entry (was 8888/callback from the PR description
instead of the actual DEFAULT_SPOTIFY_REDIRECT_URI value
43827/spotify/callback defined in auth.py).
2026-04-24 05:30:05 -07:00
Teknium
0d32411310 chore(release): map Group D contributors in AUTHOR_MAP 2026-04-24 05:28:45 -07:00
Brian D. Evans
e87a2100f6 fix(mcp): auto-reconnect + retry once when the transport session expires (#13383)
Streamable HTTP MCP servers may garbage-collect their server-side
session state while the OAuth token remains valid — idle TTL, server
restart, pod rotation, etc.  Before this fix, the tool-call handler
treated the resulting "Invalid or expired session" error as a plain
tool failure with no recovery path, so **every subsequent call on
the affected server failed until the gateway was manually
restarted**.  Reporter: #13383.

The OAuth-based recovery path (``_handle_auth_error_and_retry``)
already exists for 401s, but it only fires on auth errors.  Session
expiry slipped through because the access token is still valid —
nothing 401'd, so the existing recovery branch was skipped.

Fix
---
Add a sibling function ``_handle_session_expired_and_retry`` that
detects MCP session-expiry via ``_is_session_expired_error`` (a
narrow allow-list of known-stable substrings: ``"invalid or expired
session"``, ``"session expired"``, ``"session not found"``,
``"unknown session"``, etc.) and then uses the existing transport
reconnect mechanism:

* Sets ``MCPServerTask._reconnect_event`` — the server task's
  lifecycle loop already interprets this as "tear down the current
  ``streamablehttp_client`` + ``ClientSession`` and rebuild them,
  reusing the existing OAuth provider instance".
* Waits up to 15 s for the new session to come back ready.
* Retries the original call once.  If the retry succeeds, returns
  its result and resets the circuit-breaker error count.  If the
  retry raises, or if the reconnect doesn't ready in time, falls
  through to the caller's generic error path.

Unlike the 401 path, this does **not** call ``handle_401`` — the
access token is already valid and running an OAuth refresh would be
a pointless round-trip.

All 5 MCP handlers (``call_tool``, ``list_resources``, ``read_resource``,
``list_prompts``, ``get_prompt``) now consult both recovery paths
before falling through:

    recovered = _handle_auth_error_and_retry(...)          # 401 path
    if recovered is not None: return recovered
    recovered = _handle_session_expired_and_retry(...)     # new
    if recovered is not None: return recovered
    # generic error response

Narrow scope — explicitly not changed
-------------------------------------
* **Detection is string-based on a 5-entry allow-list.**  The MCP
  SDK wraps JSON-RPC errors in ``McpError`` whose exception type +
  attributes vary across SDK versions, so matching on message
  substrings is the durable path.  Kept narrow to avoid false
  positives — a regular ``RuntimeError("Tool failed")`` will NOT
  trigger spurious reconnects (pinned by
  ``test_is_session_expired_rejects_unrelated_errors``).
* **No change to the existing 401 recovery flow.**  The new path is
  consulted only after the auth path declines (returns ``None``).
* **Retry count stays at 1.**  If the reconnect-then-retry also
  fails, we don't loop — the error surfaces normally so the model
  sees a failed tool call rather than a hang.
* **``InterruptedError`` is explicitly excluded** from session-expired
  detection so user-cancel signals always short-circuit the same
  way they did before (pinned by
  ``test_is_session_expired_rejects_interrupted_error``).

Regression coverage
-------------------
``tests/tools/test_mcp_tool_session_expired.py`` (new, 16 cases):

Unit tests for ``_is_session_expired_error``:
* ``test_is_session_expired_detects_invalid_or_expired_session`` —
  reporter's exact wpcom-mcp text.
* ``test_is_session_expired_detects_expired_session_variant`` —
  "Session expired" / "expired session" variants.
* ``test_is_session_expired_detects_session_not_found`` — server GC
  variant ("session not found", "unknown session").
* ``test_is_session_expired_is_case_insensitive``.
* ``test_is_session_expired_rejects_unrelated_errors`` — narrow-scope
  canary: random RuntimeError / ValueError / 401 don't trigger.
* ``test_is_session_expired_rejects_interrupted_error`` — user cancel
  must never route through reconnect.
* ``test_is_session_expired_rejects_empty_message``.

Handler integration tests:
* ``test_call_tool_handler_reconnects_on_session_expired`` — reporter's
  full repro: first call raises "Invalid or expired session", handler
  signals ``_reconnect_event``, retries once, returns the retry's
  success result with no ``error`` key.
* ``test_call_tool_handler_non_session_expired_error_falls_through``
  — preserved-behaviour canary: random tool failures do NOT trigger
  reconnect.
* ``test_session_expired_handler_returns_none_without_loop`` —
  defensive: cold-start / shutdown race.
* ``test_session_expired_handler_returns_none_without_server_record``
  — torn-down server falls through cleanly.
* ``test_session_expired_handler_returns_none_when_retry_also_fails``
  — no retry loop on repeated failure.

Parametrised across all 4 non-``tools/call`` handlers:
* ``test_non_tool_handlers_also_reconnect_on_session_expired``
  [list_resources / read_resource / list_prompts / get_prompt].

**15 of 16 fail on clean ``origin/main`` (``6fb69229``)** with
``ImportError: cannot import name '_is_session_expired_error'``
— the fix's surface symbols don't exist there yet.  The 1 passing
test is an ordering artefact of pytest-xdist worker collection.

Validation
----------
``source venv/bin/activate && python -m pytest
tests/tools/test_mcp_tool_session_expired.py -q`` → **16 passed**.

Broader MCP suite (5 files:
``test_mcp_tool.py``, ``test_mcp_tool_401_handling.py``,
``test_mcp_tool_session_expired.py``, ``test_mcp_reconnect_signal.py``,
``test_mcp_oauth.py``) → **230 passed, 0 regressions**.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 05:28:45 -07:00
AntAISecurityLab
8c2732a9f9 fix(security): strip MCP auth on cross-origin redirect
Add event hook to httpx.AsyncClient in MCP HTTP transport that strips
Authorization headers when a redirect targets a different origin,
preventing credential leakage to third-party servers.
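The origin comparison behind such a hook can be sketched as a scheme/host/port triple check (the httpx hook wiring itself is omitted):

```python
from urllib.parse import urlsplit

def same_origin(url_a: str, url_b: str) -> bool:
    # Two URLs share an origin iff scheme, host, and port all match;
    # that is the condition under which forwarding Authorization on a
    # redirect is safe.
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme, a.hostname, a.port) == (b.scheme, b.hostname, b.port)
```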
2026-04-24 05:28:45 -07:00
Alexazhu
15050fd965 fix(mcp_oauth): raise RuntimeError instead of asserting OAuth port is set
``tools/mcp_oauth.py`` relied on ``assert _oauth_port is not None`` to
guard the module-level port set by ``build_oauth_auth``. Python's
``-O`` / ``-OO`` optimization flags strip ``assert`` statements
entirely, so a deployment that runs ``python -O -m hermes ...``
silently loses the check: ``_oauth_port`` stays ``None`` and the
failure surfaces much later as an obscure ``int()`` or
``http.server.HTTPServer((host, None))`` TypeError rather than the
intended "OAuth callback port not set" signal.

Replace with an explicit ``if … raise RuntimeError(...)`` so the
invariant is preserved regardless of the interpreter's optimization
level. Docstring updated to document the new exception.

Found during a proactive audit of ``assert`` statements in
non-test code paths.
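The difference is easy to demonstrate: under `python -O` an `assert` disappears, while the explicit raise survives any optimization level (a sketch; names follow the message):

```python
_oauth_port = None  # set later by build_oauth_auth

def require_oauth_port() -> int:
    # `assert _oauth_port is not None` would vanish under -O/-OO;
    # an explicit check-and-raise is immune to the optimization level
    # and fails with the intended signal instead of a later TypeError.
    if _oauth_port is None:
        raise RuntimeError("OAuth callback port not set")
    return _oauth_port
```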
2026-04-24 05:28:45 -07:00
Amanuel Tilahun Bogale
5fa2f4258a fix: serialize Pydantic AnyUrl fields when persisting MCP OAuth state
OAuth client information and token responses from the MCP SDK contain
Pydantic AnyUrl fields (client_uri, redirect_uris, etc.). The previous
model_dump() call returned a dict with these AnyUrl objects still as
their native Python type, which then crashed json.dumps with:

  TypeError: Object of type AnyUrl is not JSON serializable

This caused any OAuth-based MCP server (e.g. alphaxiv) to fail
registration with an "OAuth flow error" traceback during startup.

Adding mode="json" tells Pydantic to serialize all fields to
JSON-compatible primitives (AnyUrl -> str, datetime -> ISO string, etc.)
before returning the dict, so the standard json.dumps can handle it.

Three call sites fixed:
- HermesTokenStorage.set_tokens
- HermesTokenStorage.set_client_info
- build_oauth_auth pre-registration write
2026-04-24 05:28:45 -07:00
0xbyt4
4ac731c841 fix(model-normalize): pass DeepSeek V-series IDs through instead of folding to deepseek-chat
`_normalize_for_deepseek` was mapping every non-reasoner input into
`deepseek-chat` on the assumption that DeepSeek's API accepts only two
model IDs. That assumption no longer holds — `deepseek-v4-pro` and
`deepseek-v4-flash` are first-class IDs accepted by the direct API,
and on aggregators `deepseek-chat` routes explicitly to V3 (DeepInfra
backend returns `deepseek-chat-v3`). So a user picking V4 Pro through
the model picker was being silently downgraded to V3.

Verified 2026-04-24 against Nous portal's OpenAI-compat surface:
  - `deepseek/deepseek-v4-flash` → provider: DeepSeek,
    model: deepseek-v4-flash-20260423
  - `deepseek/deepseek-chat`     → provider: DeepInfra,
    model: deepseek/deepseek-chat-v3

Fix:
- Add `deepseek-v4-pro` and `deepseek-v4-flash` to
  `_DEEPSEEK_CANONICAL_MODELS` so exact matches pass through.
- Add `_DEEPSEEK_V_SERIES_RE` (`^deepseek-v\d+(...)?$`) so future
  V-series IDs (`deepseek-v5-*`, dated variants) keep passing through
  without another code change.
- Update docstring + module header to reflect the new rule.

Tests:
- New `TestDeepseekVSeriesPassThrough` — 8 parametrized cases covering
  bare, vendor-prefixed, case-variant, dated, and future V-series IDs
  plus end-to-end `normalize_model_for_provider(..., "deepseek")`.
- New `TestDeepseekCanonicalAndReasonerMapping` — regression coverage
  for canonical pass-through, reasoner-keyword folding, and
  fall-back-to-chat behaviour.
- 77/77 pass.

Reported on Discord (Ufonik, Don Piedro): `/model > Deepseek >
deepseek-v4-pro` surfaced
`Normalized 'deepseek-v4-pro' to 'deepseek-chat'`. Picker listing
showed the v4 names, so validation also rejected the post-normalize
`deepseek-chat` as "not in provider listing" — the contradiction
users saw. Normalizer now respects the picker's choice.
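A condensed sketch of the new rule (the regex and canonical set are approximated from this message, not copied from the module):

```python
import re

# Future V-series IDs (deepseek-v5-*, dated variants) pass through
# without another code change; pattern approximated for illustration.
_DEEPSEEK_V_SERIES_RE = re.compile(r"^deepseek-v\d+(-[a-z0-9.-]+)?$")
_DEEPSEEK_CANONICAL_MODELS = {"deepseek-chat", "deepseek-reasoner",
                              "deepseek-v4-pro", "deepseek-v4-flash"}

def normalize_for_deepseek(model: str) -> str:
    # Exact canonical matches and any V-series ID pass through;
    # everything else keeps the old reasoner/chat folding.
    m = model.lower().removeprefix("deepseek/")
    if m in _DEEPSEEK_CANONICAL_MODELS or _DEEPSEEK_V_SERIES_RE.match(m):
        return m
    if "reasoner" in m:
        return "deepseek-reasoner"
    return "deepseek-chat"
```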
2026-04-24 05:24:54 -07:00
Austin Pickett
4f5669a569 feat: add docs link 2026-04-24 08:22:44 -04:00
Teknium
acd78a457e
fix(docker): reap orphaned subprocesses via tini as PID 1 (#15116)
Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes #15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
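The resulting Dockerfile change looks roughly like this (the package-manager line is an assumption for a Debian-based image; the entrypoint path comes from this message):

```dockerfile
# Install tini and make it PID 1 so orphans get reaped and signals
# are forwarded to the whole process group (-g).
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh"]
```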
2026-04-24 05:22:34 -07:00
Teknium
4ff7950f7f chore(spotify): gate toolset off by default, add to hermes tools UI
Follow-up on top of #15096 cherry-pick:
- Remove spotify_* from _HERMES_CORE_TOOLS (keep only in the 'spotify'
  toolset, so the 9 Spotify tool schemas are not shipped to every user).
- Add 'spotify' to CONFIGURABLE_TOOLSETS + _DEFAULT_OFF_TOOLSETS so new
  installs get it opt-in via 'hermes tools', matching homeassistant/rl.
- Wire TOOL_CATEGORIES entry pointing at 'hermes auth spotify' for the
  actual PKCE login (optional HERMES_SPOTIFY_CLIENT_ID /
  HERMES_SPOTIFY_REDIRECT_URI env vars).
- scripts/release.py: map contributor email to GitHub login.
2026-04-24 05:20:38 -07:00
Dilee
7e9dd9ca45 Add native Spotify tools with PKCE auth 2026-04-24 05:20:38 -07:00
Teknium
3392d1e422 chore(release): map Group E contributors in AUTHOR_MAP 2026-04-24 05:20:05 -07:00
konsisumer
785d168d50 fix(credential_pool): add Nous OAuth cross-process auth-store sync
Concurrent Hermes processes (e.g. cron jobs) refreshing a Nous OAuth token
via resolve_nous_runtime_credentials() write the rotated tokens to auth.json.
The calling process's pool entry becomes stale, and the next refresh against
the already-rotated token triggers a 'refresh token reuse' revocation on
the Nous Portal.

_sync_nous_entry_from_auth_store() reads auth.json under the same lock used
by resolve_nous_runtime_credentials, and adopts the newer token pair before
refreshing the pool entry. This complements #15111 (which preserved the
obtained_at timestamps through seeding).

Partial salvage of #10160 by @konsisumer — only the agent/credential_pool.py
changes + the 3 Nous-specific regression tests. The PR also touched 10
unrelated files (Dockerfile, tips.py, various tool tests) which were
dropped as scope creep.

Regression tests:
- test_sync_nous_entry_from_auth_store_adopts_newer_tokens
- test_sync_nous_entry_noop_when_tokens_match
- test_nous_exhausted_entry_recovers_via_auth_store_sync
2026-04-24 05:20:05 -07:00
Michael Steuer
cd221080ec fix: validate nous auth status against runtime credentials 2026-04-24 05:20:05 -07:00
Prasad Subrahmanya
1fc77f995b fix(agent): fall back on rate limit when pool has no rotation room
Extracts pool-rotation-room logic into `_pool_may_recover_from_rate_limit`
so single-credential pools no longer block the eager-fallback path on 429.

The existing check `pool is not None and pool.has_available()` lets
fallback fire only after the pool marks every entry as exhausted.  With
exactly one credential in the pool (the common shape for Gemini OAuth,
Vertex service accounts, and any personal-key setup), `has_available()`
flips back to True as soon as the cooldown expires — Hermes retries
against the same entry, hits the same daily-quota 429, and burns the
retry budget in a tight loop before ever reaching the configured
`fallback_model`.  Observed in the wild as 4+ hours of 429 noise on a
single Gemini key instead of falling through to Vertex as configured.

Rotation is only meaningful with more than one credential — gate on
`len(pool.entries()) > 1`.  Multi-credential pools keep the current
wait-for-rotation behaviour unchanged.

Fixes #11314.  Related to #8947, #10210, #7230.  Narrower scope than
open PRs #8023 (classifier change) and #11492 (503/529 credential-pool
bypass) — this addresses the single-credential 429 case specifically
and does not conflict with either.

Tests: 6 new unit tests in tests/run_agent/test_provider_fallback.py
covering (a) None pool, (b) single-cred available, (c) single-cred in
cooldown, (d) 2-cred available rotates, (e) multi-cred all cooling-down
falls back, (f) many-cred available rotates.  All 18 tests in the file
pass.
2026-04-24 05:20:05 -07:00
jakubkrcmar
1af44a13c0 fix(model_picker): detect mapped-provider auth-store credentials 2026-04-24 05:20:05 -07:00
Andy
fff7ee31ae fix: clarify auth retry guidance 2026-04-24 05:20:05 -07:00
YueLich
6fcaf5ebc2 fix: rotate credential pool on 403 (Forbidden) responses
Previously _handle_credential_pool_error handled 401, 402, and 429
but silently ignored 403. When a provider returns 403 for a revoked or
unauthorised credential (e.g. Nous agent_key invalidated by a newer
login), the pool was never rotated and every subsequent request
continued to use the same failing credential.

Treat 403 the same as 402: immediately mark the current credential
exhausted and rotate to the next pool entry, since a Forbidden response
will not resolve itself with a retry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 05:20:05 -07:00
vominh1919
461899894e fix: increment request_count in least_used pool strategy
The least_used strategy selected entries via min(request_count) but
never incremented the counter. All entries stayed at count=0, so the
strategy degenerated to fill_first behavior with no actual load balancing.

Now increments request_count after each selection and persists the update.
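The bug and fix in miniature (a toy sketch, not the real pool code):

```python
def pick_least_used(entries: list) -> dict:
    # Select the entry with the lowest request_count, then increment
    # it; the missing increment was what made the strategy degenerate
    # into fill_first (every entry stuck at count=0, min() always
    # returning the first).
    entry = min(entries, key=lambda e: e["request_count"])
    entry["request_count"] += 1
    return entry
```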
2026-04-24 05:20:05 -07:00
Teknium
b3aed6cfd8 chore(release): map l0hde and difujia in AUTHOR_MAP 2026-04-24 05:09:08 -07:00
NiuNiu Xia
76329196c1 fix(copilot): wire live /models max_prompt_tokens into context-window resolver
The Copilot provider resolved context windows via models.dev static data,
which does not include account-specific models (e.g. claude-opus-4.6-1m
with 1M context). This adds the live Copilot /models API as a higher-
priority source for copilot/copilot-acp/github-copilot providers.

New helper get_copilot_model_context() in hermes_cli/models.py extracts
capabilities.limits.max_prompt_tokens from the cached catalog. Results
are cached in-process for 1 hour.

In agent/model_metadata.py, step 5a queries the live API before falling
through to models.dev (step 5b). This ensures account-specific models
get correct context windows while standard models still have a fallback.

Part 1 of #7731.
Refs: #7272
2026-04-24 05:09:08 -07:00
NiuNiu Xia
d7ad07d6fe fix(copilot): exchange raw GitHub token for Copilot API JWT
Raw GitHub tokens (gho_/github_pat_/ghu_) are now exchanged for
short-lived Copilot API tokens via /copilot_internal/v2/token before
being used as Bearer credentials. This is required to access
internal-only models (e.g. claude-opus-4.6-1m with 1M context).

Implementation:
- exchange_copilot_token(): calls the token exchange endpoint with
  in-process caching (dict keyed by SHA-256 fingerprint), refreshed
  2 minutes before expiry. No disk persistence — gateway is long-running
  so in-memory cache is sufficient.
- get_copilot_api_token(): convenience wrapper with graceful fallback —
  returns exchanged token on success, raw token on failure.
- Both callers (hermes_cli/auth.py and agent/credential_pool.py) now
  pipe the raw token through get_copilot_api_token() before use.

12 new tests covering exchange, caching, expiry, error handling,
fingerprinting, and caller integration. All 185 existing copilot/auth
tests pass.

Part 2 of #7731.
2026-04-24 05:09:08 -07:00
l0hde
2cab8129d1 feat(copilot): add 401 auth recovery with automatic token refresh and client rebuild
When using GitHub Copilot as provider, HTTP 401 errors could cause
Hermes to silently fall back to the next model in the chain instead
of recovering. This adds a one-shot retry mechanism that:

1. Re-resolves the Copilot token via the standard priority chain
   (COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token)
2. Rebuilds the OpenAI client with fresh credentials and Copilot headers
3. Retries the failed request before falling back

The fix handles the common case where the gho_* OAuth token remains
valid but the httpx client state becomes stale (e.g. after startup
race conditions or long-lived sessions).

Key design decisions:
- Always rebuild client even if token string unchanged (recovers stale state)
- Uses _apply_client_headers_for_base_url() for canonical header management
- One-shot flag guard prevents infinite 401 loops (matches existing pattern
  used by Codex/Nous/Anthropic providers)
- No token exchange via /copilot_internal/v2/token (returns 404 for some
  account types; direct gho_* auth works reliably)

Tests: 3 new test cases covering end-to-end 401->refresh->retry,
client rebuild verification, and same-token rebuild scenarios.
Docs: Updated providers.md with Copilot auth behavior section.
2026-04-24 05:09:08 -07:00
MestreY0d4-Uninter
7d2f93a97f fix: set HOME for Copilot ACP subprocesses
Pass an explicit HOME into Copilot ACP child processes so delegated ACP runs do not fail when the ambient environment is missing HOME.

Prefer the per-profile subprocess home when available, then fall back to HOME, expanduser('~'), pwd.getpwuid(...), and /home/openclaw. Add regression tests for both profile-home preference and clean HOME fallback.

Refs #11068.
2026-04-24 05:09:08 -07:00
Teknium
78450c4bd6
fix(nous-oauth): preserve obtained_at in pool + actionable message on RT reuse (#15111)
Two narrow fixes motivated by #15099.

1. _seed_from_singletons() was dropping obtained_at, agent_key_obtained_at,
   expires_in, and friends when seeding device_code pool entries from the
   providers.nous singleton. Fresh credentials showed up with
   obtained_at=None, which broke downstream freshness-sensitive consumers
   (self-heal hooks, pool pruning by age) — they treated just-minted
   credentials as older than they actually were and evicted them.

2. When the Nous Portal OAuth 2.1 server returns invalid_grant with
   'Refresh token reuse detected' in the error_description, rewrite the
   message to explain the likely cause (an external process consumed the
   rotated RT without persisting it back) and the mitigation. The generic
   reuse message led users to report this as a Hermes persistence bug when
   the actual trigger was typically a third-party monitoring script calling
   /api/oauth/token directly. Non-reuse errors keep their original server
   description untouched.

Closes #15099.

Regression tests:
- tests/agent/test_credential_pool.py::test_nous_seed_from_singletons_preserves_obtained_at_timestamps
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_token_reuse_detection_surfaces_actionable_message
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_non_reuse_error_keeps_original_description
2026-04-24 05:08:46 -07:00
Teknium
852c7f3be3
feat(cron): per-job workdir for project-aware cron runs (#15110)
Cron jobs can now specify a per-job working directory. When set, the job
runs as if launched from that directory: AGENTS.md / CLAUDE.md /
.cursorrules from that dir are injected into the system prompt, and the
terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD).
When unset, old behaviour is preserved (no project context files, tools
use the scheduler's cwd).

Requested by @bluthcy.

## Mechanism

- cron/jobs.py: create_job / update_job accept 'workdir'; validated to
  be an absolute existing directory at create/update time.
- cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD
  at it and flip skip_context_files to False before building the agent.
  Restored in finally on every exit path.
- cron/scheduler.py tick: workdir jobs run sequentially (outside the
  thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs
  still run in the parallel pool unchanged.
- tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py:
  expose 'workdir' via the cronjob tool and 'hermes cron create/edit
  --workdir ...'. Empty string on edit clears the field.

## Validation

- tests/cron/test_cron_workdir.py (21 tests): normalize, create, update,
  JSON round-trip via cronjob tool, tick partition (workdir jobs run on
  the main thread, not the pool), run_job env toggle + restore in finally.
- Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py,
  test_config_cwd_bridge.py, test_worktree.py): 314/314 passed.
- Live smoke: hermes cron create --workdir $(pwd) works; relative path
  rejected; list shows 'Workdir:'; edit --workdir '' clears.
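The env toggle with restore-in-finally can be sketched as follows (TERMINAL_CWD handling per the message; the wrapper shape is illustrative):

```python
import os

def run_with_workdir(workdir, run):
    # Point TERMINAL_CWD at the job's workdir for the duration of the
    # run, restoring the previous value on every exit path (success,
    # exception, or cancellation).
    saved = os.environ.get("TERMINAL_CWD")
    try:
        if workdir:
            os.environ["TERMINAL_CWD"] = workdir
        return run()
    finally:
        if saved is None:
            os.environ.pop("TERMINAL_CWD", None)
        else:
            os.environ["TERMINAL_CWD"] = saved
```

Because TERMINAL_CWD is process-global state, this also motivates running workdir jobs sequentially rather than in the thread pool.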
2026-04-24 05:07:01 -07:00
Teknium
0e235947b9
fix(redact): honor security.redact_secrets from config.yaml (#15109)
agent/redact.py snapshots _REDACT_ENABLED from HERMES_REDACT_SECRETS at
module-import time. hermes_cli/main.py calls setup_logging() early, which
transitively imports agent.redact — BEFORE any config bridge has run. So
users who set 'security.redact_secrets: false' in config.yaml (instead of
HERMES_REDACT_SECRETS=false in .env) had the toggle silently ignored in
both 'hermes chat' and 'hermes gateway run'.

Bridge config.yaml -> env var in hermes_cli/main.py BEFORE setup_logging.
.env still wins (only set env when unset) — config.yaml is the fallback.

Regression tests in tests/hermes_cli/test_redact_config_bridge.py spawn
fresh subprocesses to verify:
- redact_secrets: false in config.yaml disables redaction
- default (key absent) leaves redaction enabled
- .env HERMES_REDACT_SECRETS=true overrides config.yaml
2026-04-24 05:03:26 -07:00
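The config-to-env bridge above ("only set env when unset, so .env still wins") reduces to a few lines. A minimal sketch with a hypothetical function name — the real bridge runs in hermes_cli/main.py before setup_logging:

```python
import os

def bridge_redact_config(config: dict) -> None:
    """Mirror security.redact_secrets from config.yaml into the env var,
    but only when the env var is unset: .env always wins, config.yaml is
    the fallback. Hypothetical helper name."""
    if "HERMES_REDACT_SECRETS" in os.environ:
        return  # .env (or the shell) already decided
    value = (config.get("security") or {}).get("redact_secrets")
    if value is None:
        return  # key absent: leave the module default (redaction enabled)
    os.environ["HERMES_REDACT_SECRETS"] = "true" if value else "false"
```

Order matters: this must run before any import of agent.redact, since that module snapshots the env var at import time.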
Teknium
c2b3db48f5
fix(agent): retry on json.JSONDecodeError instead of treating it as a local validation error (#15107)
json.JSONDecodeError inherits from ValueError. The agent loop's
non-retryable classifier at run_agent.py ~L10782 treated any
ValueError/TypeError as a local programming bug and short-circuited
retry. Without a carve-out, a transient JSONDecodeError from a
provider that returned a malformed response body, a truncated stream,
or a router-layer corruption would fail the turn immediately.

Add JSONDecodeError to the existing UnicodeEncodeError exclusion
tuple so the classified-retry logic (which already handles 429/529/
context-overflow/etc.) gets to run on bad-JSON errors.

Tests (tests/run_agent/test_jsondecodeerror_retryable.py):
  - JSONDecodeError: NOT local validation
  - UnicodeEncodeError: NOT local validation (existing carve-out)
  - bare ValueError: IS local validation (programming bug)
  - bare TypeError: IS local validation (programming bug)
  - source-level assertion that run_agent.py still carries the carve-out
    (guards against accidental revert)

Closes #14782
2026-04-24 05:02:58 -07:00
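The carve-out logic above can be sketched as a classifier. The tuple contents and function name are illustrative, not the real run_agent.py code; the point is that `json.JSONDecodeError` subclasses `ValueError`, so it must be excluded before the generic check:

```python
import json

# Exceptions that subclass ValueError/TypeError but signal provider-side
# trouble, not a local programming bug (illustrative tuple).
_RETRYABLE_CARVE_OUTS = (json.JSONDecodeError, UnicodeEncodeError)

def is_local_validation_error(exc: Exception) -> bool:
    """True only for genuine programming bugs that should short-circuit
    retry; carved-out subclasses fall through to the classified-retry logic."""
    if isinstance(exc, _RETRYABLE_CARVE_OUTS):
        return False  # malformed provider JSON / encoding issues get retried
    return isinstance(exc, (ValueError, TypeError))
```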
Teknium
1eb29e6452
fix(opencode): derive api_mode from target model, not stale config default (#15106)
/model kimi-k2.6 on opencode-zen (or glm-5.1 on opencode-go) returned OpenCode's
website 404 HTML page when the user's persisted model.default was a Claude or
MiniMax model. The switched-to chat_completions request hit
https://opencode.ai/zen (or /zen/go) with no /v1 suffix.

Root cause: resolve_runtime_provider() computed api_mode from
model_cfg.get('default') instead of the model being requested. With a Claude
default, it resolved api_mode=anthropic_messages, stripped /v1 from base_url
(required for the Anthropic SDK), then switch_model()'s opencode_model_api_mode
override flipped api_mode back to chat_completions without restoring /v1.

Fix: thread an optional target_model kwarg through resolve_runtime_provider
and _resolve_runtime_from_pool_entry. When the caller is performing an explicit
mid-session model switch (i.e. switch_model()), the target model drives both
api_mode selection and the conditional /v1 strip. Other callers (CLI init,
gateway init, cron, ACP, aux client, delegate, account_usage, tui_gateway) pass
nothing and preserve the existing config-default behavior.

Regression tests added in test_model_switch_opencode_anthropic.py use the REAL
resolver (not a mock) to guard the exact Quentin-repro scenario. Existing tests
that mocked resolve_runtime_provider with 'lambda requested:' had their mock
signatures widened to '**kwargs' to accept the new kwarg.
2026-04-24 04:58:46 -07:00
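The target_model threading above amounts to one precedence change: the requested model, when present, drives api_mode instead of the persisted default. A heavily simplified sketch — the family table and function shape are assumptions, not the real resolver:

```python
# Illustrative family table; the real resolver inspects provider metadata.
_ANTHROPIC_WIRE_MODELS = ("claude", "minimax")

def resolve_api_mode(model_cfg: dict, target_model: str = None) -> str:
    """Derive api_mode from the model actually being requested; fall back to
    the persisted model.default only when no explicit switch target is given.
    Hedged sketch of the target_model kwarg threading."""
    model = (target_model or model_cfg.get("default", "")).lower()
    if any(family in model for family in _ANTHROPIC_WIRE_MODELS):
        return "anthropic_messages"
    return "chat_completions"
```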
Teknium
7634c1386f
feat(delegate): diagnostic dump when a subagent times out with 0 API calls (#15105)
When a subagent in delegate_task times out before making its first LLM
request, write a structured diagnostic file under
~/.hermes/logs/subagent-timeout-<sid>-<ts>.log capturing enough state
for the user (and us) to debug the hang. The old error message —
'Subagent timed out after Ns with no response. The child may be stuck
on a slow API call or unresponsive network request.' — gave no
observability for the 0-API-call case, which is the hardest to reason
about remotely.

The diagnostic captures:
  - timeout config vs actual duration
  - goal (truncated to 1000 chars)
  - child config: model, provider, api_mode, base_url, max_iterations,
    quiet_mode, platform, _delegate_role, _delegate_depth
  - enabled_toolsets + loaded tool names
  - system prompt byte/char count (catches oversized prompts that
    providers silently choke on)
  - tool schema count + byte size
  - child's get_activity_summary() snapshot
  - Python stack of the worker thread at the moment of timeout
    (reveals whether the hang is in credential resolution, transport,
    prompt construction, etc.)

Wiring:
  - _run_single_child captures the worker thread via a small wrapper
    around child.run_conversation so we can look up its stack at
    timeout.
  - After a FuturesTimeoutError, we pull child.get_activity_summary()
    to read api_call_count. If 0 AND it was a timeout (not a raise),
    _dump_subagent_timeout_diagnostic() is invoked.
  - The returned path is surfaced in the error string so the parent
    agent (and therefore the user / gateway) sees exactly where to look.
  - api_calls > 0 timeouts keep the old 'stuck on slow API call'
    phrasing since that's the correct diagnosis for those.

This does NOT change any behavior for successful subagent runs,
non-timeout errors, or subagents that made at least one API call
before hanging.

Tests: 7 cases (tests/tools/test_delegate_subagent_timeout_diagnostic.py)
  - output format + required sections + field values
  - long-goal truncation with [truncated] marker
  - missing / already-exited worker thread branches
  - unwritable HERMES_HOME/logs/ returns None without raising
  - _run_single_child wiring: 0 API calls → dump + diagnostic_path in error
  - _run_single_child wiring: N>0 API calls → no dump, old message

Refs: #14726
2026-04-24 04:58:32 -07:00
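The most interesting piece of the diagnostic is the worker-thread stack capture. A self-contained sketch of that one part (function name hypothetical; the real dump writes many more fields to the log file):

```python
import sys
import threading
import traceback
from typing import Optional

def format_worker_stack(thread: Optional[threading.Thread]) -> str:
    """Render the Python stack of a (possibly already-exited) worker thread
    at timeout time: the part of the diagnostic that reveals whether the
    hang is in credential resolution, transport, prompt construction, etc."""
    if thread is None:
        return "<no worker thread captured>"
    # sys._current_frames() maps live thread idents to their current frame.
    frame = sys._current_frames().get(thread.ident)
    if frame is None:
        return "<worker thread already exited>"
    return "".join(traceback.format_stack(frame))
```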
Teknium
3cb43df2cd chore(release): add georgex8001 to AUTHOR_MAP 2026-04-24 04:54:16 -07:00
georgex8001
1dca2e0a28 fix(runtime): resolve bare custom provider to loopback or CUSTOM_BASE_URL
When /model selects Custom but model.provider in YAML still reflects a prior provider, trust model.base_url only for loopback hosts or when provider is custom. Consult CUSTOM_BASE_URL before OpenRouter defaults (#14676).
2026-04-24 04:54:16 -07:00
Teknium
2f39dbe471 chore(release): map j3ffffff and A-FdL-Prog in AUTHOR_MAP 2026-04-24 04:53:32 -07:00
Matt Maximo
271f0e6eb0 fix(model): let Codex setup reuse or reauthenticate 2026-04-24 04:53:32 -07:00
Devzo
813dbd9b40 fix(codex): route auth failures to fallback provider chain
Two related paths where Codex auth failures silently swallowed the
fallback chain instead of switching to the next provider:

1. cli.py — _ensure_runtime_credentials() calls resolve_runtime_provider()
   before each turn. When provider is explicitly configured (not "auto"),
   an AuthError from token refresh is re-raised and printed as a bold-red
   error, returning False before the agent ever starts. The fallback chain
   was never tried. Fix: on AuthError, iterate fallback_providers and
   switch to the first one that resolves successfully.

2. run_agent.py — inside the codex_responses validity gate (inner retry
   loop), response.status in {"failed","cancelled"} with non-empty output
   items was treated as a valid response and broke out of the retry loop,
   reaching _normalize_codex_response() outside the fallback machinery.
   That function raises RuntimeError on status="failed", which propagates
   to the outer except with no fallback logic. Fix: detect terminal status
   codes before the output_items check and set response_invalid=True so
   the existing fallback chain fires normally.
2026-04-24 04:53:32 -07:00
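Fix (1) above — try the fallback chain on auth failure instead of printing and giving up — can be sketched like this. `AuthError` and the function shape are stand-ins for the real cli.py types:

```python
class AuthError(Exception):
    """Stand-in for the real auth failure type."""

def resolve_with_fallback(provider, fallback_providers, resolve):
    """On AuthError from the explicitly configured provider, iterate the
    fallback chain and switch to the first provider that resolves, rather
    than returning False before the agent starts. Sketch of fix (1)."""
    try:
        return resolve(provider)
    except AuthError:
        for candidate in fallback_providers:
            try:
                return resolve(candidate)
            except AuthError:
                continue
        raise  # whole chain failed: surface the original auth error
```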
j3ffffff
f76df30e08 fix(auth): parse OpenAI nested error shape in Codex token refresh
OpenAI's OAuth token endpoint returns errors in a nested shape —
{"error": {"code": "refresh_token_reused", "message": "..."}} —
not the OAuth spec's flat {"error": "...", "error_description": "..."}.
The existing parser only handled the flat shape, so:

- `err.get("error")` returned a dict, the `isinstance(str)` guard
  rejected it, and `code` stayed `"codex_refresh_failed"`.
- The dedicated `refresh_token_reused` branch (with its actionable
  "re-run codex + hermes auth" message and `relogin_required=True`)
  never fired.
- Users saw the generic "Codex token refresh failed with status 401"
  when another Codex client (CLI, VS Code extension) had consumed
  their single-use refresh token — giving no hint that re-auth was
  required.

Parse both shapes, mapping OpenAI's nested `code`/`type` onto the
existing `code` variable so downstream branches (`refresh_token_reused`,
`invalid_grant`, etc.) fire correctly.

Add regression tests covering:
- nested `refresh_token_reused` → actionable message + relogin_required
- nested generic code → code + message surfaced
- flat OAuth-spec `invalid_grant` still handled (back-compat)
- unparseable body → generic fallback message, relogin_required=False

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 04:53:32 -07:00
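Parsing both error shapes is a small branch on the type of the `error` value. A sketch under the shapes quoted above (function name illustrative; the real parser also drives the relogin_required handling):

```python
def parse_oauth_error(err: dict):
    """Extract (code, message) from either OAuth error shape.

    Flat (OAuth spec): {"error": "invalid_grant", "error_description": "..."}
    Nested (OpenAI):   {"error": {"code": "refresh_token_reused", "message": "..."}}
    """
    raw = err.get("error")
    if isinstance(raw, dict):  # OpenAI's nested shape
        code = raw.get("code") or raw.get("type") or "codex_refresh_failed"
        message = raw.get("message", "")
    elif isinstance(raw, str):  # flat OAuth-spec shape
        code = raw
        message = err.get("error_description", "")
    else:  # unparseable body: generic fallback
        code, message = "codex_refresh_failed", ""
    return code, message
```

With the nested `code` mapped onto the same variable, downstream branches like `refresh_token_reused` fire regardless of which shape the endpoint returned.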
Teknium
227afcd80f chore(release): map jiechengwu@pony.ai to Jason2031
AUTHOR_MAP entry for the cherry-picked commit in salvaged PR #13483
so release notes attribute correctly.
2026-04-24 04:52:11 -07:00
Teknium
06b60b76cd fix(docker): safer docker-compose defaults for UID and dashboard bind
Follow-up to salvaged PR #13483:
- Default HERMES_UID/HERMES_GID to 10000 (matches Dockerfile's useradd
  and the entrypoint's default) instead of 1001. Users should set these
  to their own id -u / id -g; document that in the header.
- Dashboard service: bind to 127.0.0.1 without --insecure by default.
  The dashboard stores API keys; the original compose file exposed it on
  0.0.0.0 with auth explicitly disabled, which the dashboard's own
  --insecure help text flags as DANGEROUS.
- Add header comments explaining HERMES_UID usage, the dashboard
  security posture, and how to expose the API server safely.
2026-04-24 04:52:11 -07:00
Jiecheng Wu
14c9f7272c fix(docker): fix HERMES_UID permission handling and add docker-compose.yml
- Remove 'USER hermes' from Dockerfile so entrypoint runs as root and can
  usermod/groupmod before gosu drop. Add chmod -R a+rX /opt/hermes so any
  remapped UID can read the install directory.
- Fix entrypoint chown logic: always chown -R when HERMES_UID is remapped
  from default 10000, not just when top-level dir ownership mismatches.
- Add docker-compose.yml with gateway + dashboard services.
- Add .hermes to .gitignore.
2026-04-24 04:52:11 -07:00
LeonSGP43
ccc8fccf77 fix(cli): validate user-defined providers consistently 2026-04-24 04:48:56 -07:00
Teknium
3aa1a41e88
feat(gemini): block free-tier keys at setup + surface guidance on 429 (#15100)
Google AI Studio's free tier (<= 250 req/day for gemini-2.5-flash) is
exhausted in a handful of agent turns, so the setup wizard now refuses
to wire up Gemini when the supplied key is on the free tier, and the
runtime 429 handler appends actionable billing guidance.

Setup-time probe (hermes_cli/main.py):
- `_model_flow_api_key_provider` fires one minimal generateContent call
  when provider_id == 'gemini' and classifies the response as
  free/paid/unknown via x-ratelimit-limit-requests-per-day header or
  429 body containing 'free_tier'.
- Free  -> print block message, refuse to save the provider, return.
- Paid  -> 'Tier check: paid' and proceed.
- Unknown (network/auth error) -> 'could not verify', proceed anyway.

Runtime 429 handler (agent/gemini_native_adapter.py):
- `gemini_http_error` appends billing guidance when the 429 error body
  mentions 'free_tier', catching users who bypass setup by putting
  GOOGLE_API_KEY directly in .env.

Tests: 21 unit tests for the probe + error path, 4 tests for the
setup-flow block. All 67 existing gemini tests still pass.
2026-04-24 04:46:17 -07:00
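The probe's free/paid/unknown classification can be sketched as follows. The header name and the 250 req/day cutoff follow the commit text; treat both as assumptions about Google's current responses rather than a stable contract:

```python
def classify_gemini_tier(status: int, headers: dict, body: str) -> str:
    """Classify the one-shot generateContent probe as 'free', 'paid', or
    'unknown'; 'unknown' means proceed anyway (never block on a flaky check)."""
    if status == 429 and "free_tier" in body:
        return "free"
    limit = headers.get("x-ratelimit-limit-requests-per-day")
    if limit is None:
        return "unknown"  # network/auth error or missing header
    try:
        return "free" if int(limit) <= 250 else "paid"
    except ValueError:
        return "unknown"
```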
Teknium
346601ca8d
fix(context): invalidate stale Codex OAuth cache entries >= 400k (#15078)
PR #14935 added a Codex-aware context resolver but only new lookups
hit the live /models probe. Users who had run Hermes on gpt-5.5 / 5.4
BEFORE that PR already had the wrong value (e.g. 1,050,000 from
models.dev) persisted in ~/.hermes/context_length_cache.yaml, and the
cache-first lookup in get_model_context_length() returns it forever.

Symptom (reported in the wild by Ludwig, min heo, Gaoge on current
main at 6051fba9d, which is AFTER #14935):
  * Startup banner shows context usage against 1M
  * Compression fires late and then OpenAI hard-rejects with
    'context length will be reduced from 1,050,000 to 128,000'
    around the real 272k boundary.

Fix: when the step-1 cache returns a value for an openai-codex lookup,
check whether it's >= 400k. Codex OAuth caps every slug at 272k (live
probe values) so anything at or above 400k is definitionally a
pre-#14935 leftover. Drop that entry from the on-disk cache and fall
through to step 5, which runs the live /models probe and repersists
the correct value (or 272k from the hardcoded fallback if the probe
fails). Non-Codex providers and legitimately-cached Codex entries at
272k are untouched.

Changes:
- agent/model_metadata.py:
  * _invalidate_cached_context_length() — drop a single entry from
    context_length_cache.yaml and rewrite the file.
  * Step-1 cache check in get_model_context_length() now gates
    provider=='openai-codex' entries >= 400k through invalidation
    instead of returning them.

Tests (3 new in TestCodexOAuthContextLength):
- stale 1.05M Codex entry is dropped from disk AND re-resolved
  through the live probe to 272k; unrelated cache entries survive.
- fresh 272k Codex entry is respected (no probe call, no invalidation).
- non-Codex 1M entries (e.g. anthropic/claude-opus-4.6 on OpenRouter)
  are unaffected — the guard is strictly scoped to openai-codex.

Full tests/agent/test_model_metadata.py: 88 passed.
2026-04-24 04:46:07 -07:00
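The step-1 cache gate reduces to a threshold check scoped to one provider. A sketch over an in-memory dict (the real code rewrites context_length_cache.yaml and falls through to the live /models probe):

```python
CODEX_STALE_THRESHOLD = 400_000  # anything >= this on openai-codex predates #14935

def cached_or_invalidate(cache: dict, provider: str, model: str):
    """Return the cached context length, or drop the entry and return None
    when it is a stale pre-#14935 Codex value. Returning None means 'fall
    through to the live probe and repersist'."""
    key = f"{provider}/{model}"
    value = cache.get(key)
    if value is None:
        return None
    if provider == "openai-codex" and value >= CODEX_STALE_THRESHOLD:
        # Codex OAuth caps every slug at 272k, so this is definitionally stale.
        del cache[key]
        return None
    return value
```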
Teknium
18f3fc8a6f
fix(tests): resolve 17 persistent CI test failures (#15084)
Make the main-branch test suite pass again. Most failures were tests
still asserting old shapes after recent refactors; two were real source
bugs.

Source fixes:
- tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every
  shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel
  measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty.
- hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150.

Test fixes (mostly stale mock targets / missing fixture fields):
- test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm
  (the local name bound at import), not tools.terminal_tool.cleanup_vm.
- test_browser_camofox: patch tools.browser_camofox.load_config, not
  hermes_cli.config.load_config (the source module, not the resolved one).
- test_flush_memories_codex._chat_response_with_memory_call: add
  finish_reason, tool_call.id, tool_call.type so the chat_completions
  transport normalizer doesn't AttributeError.
- test_concurrent_interrupt: polling_tool signature now accepts
  messages= kwarg that _invoke_tool() passes through.
- test_minimax_provider: add _fallback_chain=[] to the __new__'d agent
  so switch_model() doesn't AttributeError.
- test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working
  after the scanner switched to agent.skill_utils.iter_skill_index_files
  (os.walk-based). Point SKILLS_DIR at a real tmp_path and patch
  agent.skill_utils.get_external_skills_dirs.
- test_browser_cdp_tool: browser_cdp toolset was intentionally split into
  'browser-cdp' (commit 96b0f3700) so its stricter check_fn doesn't gate
  the whole browser toolset; test now expects 'browser-cdp'.
- test_registry: add tools.browser_dialog_tool to the expected
  builtin-discovery set (PR #14540 added it).
- test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint'
  key on the JSON payload, not inline '[Hint: ...' text.
- test_write_deny test_hermes_env: resolve .env via get_hermes_home() so
  the path matches the profile-aware denylist under hermetic HERMES_HOME.
- test_checkpoint_manager test_falls_back_to_parent: guard the walk-up
  so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the
  project root.
- test_quick_commands: set cli.session_id in the __new__'d CLI so the
  alias-args path doesn't trip AttributeError when fuzzy-matching leaks
  a skill command across xdist test distribution.
2026-04-24 03:46:46 -07:00
Teknium
1f9c368622
fix(gemini): drop integer/number/boolean enums from tool schemas (#15082)
Gemini's Schema validator requires every `enum` entry to be a string,
even when the parent `type` is integer/number/boolean. Discord's
`auto_archive_duration` parameter (`type: integer, enum: [60, 1440,
4320, 10080]`) tripped this on every request that shipped the full
tool catalog to generativelanguage.googleapis.com, surfacing as
`Gateway: Non-retryable client error: Gemini HTTP 400 (INVALID_ARGUMENT)
Invalid value ... (TYPE_STRING), 60` and aborting the turn.

Sanitize by dropping the `enum` key when the declared type is numeric
or boolean and any entry is non-string. The `type` and `description`
survive, so the model still knows the allowed values; the tool handler
keeps its own runtime validation. Other providers (OpenAI,
OpenRouter, Anthropic) are unaffected — the sanitizer only runs for
native Gemini / cloudcode adapters.

Reported by @selfhostedsoul on Discord with hermes debug share.
2026-04-24 03:40:00 -07:00
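The sanitizer's core rule fits in one predicate. A minimal sketch operating on a single property schema — the real sanitizer walks the whole tool catalog and runs only for native Gemini / cloudcode adapters:

```python
def sanitize_enum(schema: dict) -> dict:
    """Drop `enum` when the declared type is numeric/boolean and any entry
    is a non-string, since Gemini's Schema validator requires every enum
    entry to be a string. `type` and `description` survive, so the model
    still knows the allowed values."""
    if (
        schema.get("type") in ("integer", "number", "boolean")
        and any(not isinstance(v, str) for v in schema.get("enum", []))
    ):
        schema = dict(schema)  # copy: leave the caller's schema untouched
        schema.pop("enum")
    return schema
```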
Nicolò Boschi
edff2fbe7e feat(hindsight): optional bank_id_template for per-agent / per-user banks
Adds an optional bank_id_template config that derives the bank name at
initialize() time from runtime context. Existing users with a static
bank_id keep the current behavior (template is empty by default).

Supported placeholders:
  {profile}   — active Hermes profile (agent_identity kwarg)
  {workspace} — Hermes workspace (agent_workspace kwarg)
  {platform}  — cli, telegram, discord, etc.
  {user}      — platform user id (gateway sessions)
  {session}   — session id

Unsafe characters in placeholder values are sanitized, and empty
placeholders collapse cleanly (e.g. "hermes-{user}" with no user
becomes "hermes"). If the template renders empty, the static bank_id
is used as a fallback.

Common uses:
  bank_id_template: hermes-{profile}            # isolate per Hermes profile
  bank_id_template: {workspace}-{profile}       # workspace + profile scoping
  bank_id_template: hermes-{user}               # per-user banks for gateway
2026-04-24 03:38:17 -07:00
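The rendering rules above (sanitize unsafe characters, collapse empty placeholders, fall back to the static bank_id when the template renders empty) can be sketched as follows. Function name and sanitization regex are assumptions; the real version draws {profile}/{workspace}/{platform}/{user}/{session} from runtime kwargs:

```python
import re

def render_bank_id(template: str, static_bank_id: str, **ctx) -> str:
    """Render bank_id_template with sanitized placeholder values; empty
    placeholders collapse cleanly and an empty result falls back to the
    static bank_id."""
    def sub(match):
        value = str(ctx.get(match.group(1)) or "")
        return re.sub(r"[^A-Za-z0-9_-]", "-", value)  # sanitize unsafe chars
    rendered = re.sub(r"\{(\w+)\}", sub, template)
    # Collapse dangling separators, e.g. "hermes-" -> "hermes".
    rendered = re.sub(r"-{2,}", "-", rendered).strip("-")
    return rendered or static_bank_id
```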
Nicolò Boschi
f9c6c5ab84 fix(hindsight): scope document_id per process to avoid resume overwrite (#6602)
Reusing session_id as document_id caused data loss on /resume: when
the session is loaded again, _session_turns starts empty and the next
retain replaces the entire previously stored content.

Now each process lifecycle gets its own document_id formed as
{session_id}-{startup_timestamp}, so:
- Same session, same process: turns accumulate into one document (existing behavior)
- Resume (new process, same session): writes a new document, old one preserved
- Forks: child process gets its own document; parent's doc is untouched

Also adds session lineage tags so all processes for the same session
(or its parent) can still be filtered together via recall:
- session:<session_id> on every retain
- parent:<parent_session_id> when initialized with parent_session_id

Closes #6602
2026-04-24 03:38:17 -07:00
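The per-process document id and lineage tags described above are a small pure function. Names are illustrative, not the provider's real API:

```python
import time

def retain_metadata(session_id: str, parent_session_id=None, startup_ts=None):
    """Per-process document id plus session lineage tags: same process
    accumulates turns into one document, while a /resume (new process, same
    session) writes a fresh document instead of overwriting the old one."""
    ts = startup_ts if startup_ts is not None else int(time.time())
    doc_id = f"{session_id}-{ts}"      # new process => new document id
    tags = [f"session:{session_id}"]   # lets recall filter all docs per session
    if parent_session_id:
        tags.append(f"parent:{parent_session_id}")
    return doc_id, tags
```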
Teknium
3a86f70969 test(hindsight): update materialize-profile-env test for HINDSIGHT_TIMEOUT
The existing test_local_embedded_setup_materializes_profile_env expected
exact equality on ~/.hermes/.env content; the new HINDSIGHT_TIMEOUT=120
line from the timeout feature now appears in that file. Append it to the
expected string so the test reflects the new post_setup output.
2026-04-24 03:36:02 -07:00
tekgnosis-net
f1ba2f0c0b fix(hindsight): use configured timeout in _run_sync for all async operations
The previous commit added HINDSIGHT_TIMEOUT as a configurable env var,
but _run_sync still used the hardcoded _DEFAULT_TIMEOUT (120s). All
async operations (recall, retain, reflect, aclose) now go through an
instance method that uses self._timeout, so the configured value is
actually applied.

Also: added backward-compatible alias comment for the module-level
function.
2026-04-24 03:36:02 -07:00
tekgnosis-net
403c82b6b6 feat(hindsight): add configurable HINDSIGHT_TIMEOUT env var
The Hindsight Cloud API can take 30-40 seconds per request. The
hardcoded 30s timeout was too aggressive and caused frequent
timeout errors. This patch:

1. Adds HINDSIGHT_TIMEOUT environment variable (default: 120s)
2. Adds timeout to the config schema for setup wizard visibility
3. Uses the configurable timeout in both _run_sync() and client creation
4. Reads from config.json or env var, falling back to 120s default

This makes the timeout upgrade-proof — users can set it via env var
or config without patching source code.

Signed-off-by: Kumar <kumar@tekgnosis.net>
2026-04-24 03:36:02 -07:00
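The timeout resolution above can be sketched in a few lines. The env-before-config precedence is an assumption here (the commit only says it "reads from config.json or env var"):

```python
import os

_DEFAULT_TIMEOUT = 120.0

def resolve_hindsight_timeout(config: dict) -> float:
    """Resolve the operation timeout: HINDSIGHT_TIMEOUT env var, then the
    config entry, then the 120s default. Invalid or non-positive values
    fall back to the default rather than erroring."""
    raw = os.environ.get("HINDSIGHT_TIMEOUT") or config.get("timeout")
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return _DEFAULT_TIMEOUT
    return value if value > 0 else _DEFAULT_TIMEOUT
```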
Jason Perlow
93a74f74bf fix(hindsight): preserve shared event loop across provider shutdowns
The module-global `_loop` / `_loop_thread` pair is shared across every
`HindsightMemoryProvider` instance in the process — the plugin loader
creates one provider per `AIAgent`, and the gateway creates one `AIAgent`
per concurrent chat session (Telegram/Discord/Slack/CLI).

`HindsightMemoryProvider.shutdown()` stopped the shared loop when any one
session ended. That stranded the aiohttp `ClientSession` and `TCPConnector`
owned by every sibling provider on a now-dead loop — they were never
reachable for close and surfaced as the `Unclosed client session` /
`Unclosed connector` warnings reported in #11923.

Fix: stop stopping the shared loop in `shutdown()`. Per-provider cleanup
still closes that provider's own client via `self._client.aclose()`. The
loop runs on a daemon thread and is reclaimed on process exit; keeping
it alive between provider shutdowns means sibling providers can drain
their own sessions cleanly.

Regression tests in `tests/plugins/memory/test_hindsight_provider.py`
(`TestSharedEventLoopLifecycle`):

- `test_shutdown_does_not_stop_shared_event_loop` — two providers share
  the loop; shutting down one leaves the loop live for the other. This
  test reproduces the #11923 leak on `main` and passes with the fix.
- `test_client_aclose_called_on_cloud_mode_shutdown` — each provider's
  own aiohttp session is still closed via `aclose()`.

Fixes #11923.
2026-04-24 03:34:12 -07:00
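The shared-loop lifecycle can be sketched as below: one daemon-thread loop per process, and shutdown() closes only the provider's own client. Class and function names are illustrative stand-ins for the real provider:

```python
import asyncio
import threading

_loop = None
_loop_thread = None

def get_shared_loop():
    """Lazily start one event loop on a daemon thread, shared by every
    provider instance in the process (sketch of the module-global pair)."""
    global _loop, _loop_thread
    if _loop is None:
        _loop = asyncio.new_event_loop()
        _loop_thread = threading.Thread(target=_loop.run_forever, daemon=True)
        _loop_thread.start()
    return _loop

class HindsightProviderSketch:
    async def _aclose(self):
        """Stand-in for closing this provider's own aiohttp client."""

    def shutdown(self):
        # Close only THIS provider's client on the shared loop; never stop
        # the loop itself, so sibling providers can still drain their sessions.
        fut = asyncio.run_coroutine_threadsafe(self._aclose(), get_shared_loop())
        fut.result(timeout=5)
```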
Teknium
b4c030025f chore(release): map Nicecsh in AUTHOR_MAP
Required by CI for the #15030 salvage — Nicecsh's commits
(cshong2017@outlook.com) carry their authorship into main.
2026-04-24 03:33:29 -07:00
Teknium
42d6ab5082 test(gateway): unify discord mock via shared conftest; drop duplicated mock in model_picker test
The cherry-picked model_picker test installed its own discord mock at
module-import time via a local _ensure_discord_mock(), overwriting
sys.modules['discord'] with a mock that lacked attributes other
gateway tests needed (Intents.default(), File, app_commands.Choice).
On pytest-xdist workers that collected test_discord_model_picker.py
first, the shared mock in tests/gateway/conftest.py got clobbered and
downstream tests failed with AttributeError / TypeError against
missing mock attrs. Classic sys.modules cross-test pollution (see
xdist-cross-test-pollution skill).

Fix:
- Extend the canonical _ensure_discord_mock() in tests/gateway/conftest.py
  to cover everything the model_picker test needs: real View/Select/
  Button/SelectOption classes (not MagicMock sentinels), an Embed
  class that preserves title/description/color kwargs for assertion,
  and Color.greyple.
- Strip the duplicated mock-setup block from test_discord_model_picker.py
  and rely on the shared mock that conftest installs at collection
  time.

Regression check:
  scripts/run_tests.sh tests/gateway/ tests/hermes_cli/ -k 'discord or model or copilot or provider' -o 'addopts='
  1291 passed (was 1288 passed + 3 xdist-ordered failures before this commit).
2026-04-24 03:33:29 -07:00
Nicecsh
fe34741f32 fix(model): repair Discord Copilot /model flow
Keep Discord Copilot model switching responsive and current by refreshing picker data from the live catalog when possible, correcting the curated fallback list, and clearing stale controls before the switch completes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 03:33:29 -07:00
Nicecsh
2e2de124af fix(aux): normalize GitHub Copilot provider slugs
Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 03:33:29 -07:00
LeonSGP43
df55660e3c fix(hindsight): disable broken local runtime on unsupported CPUs 2026-04-24 03:33:14 -07:00
kshitij
7897f65a94
fix(normalize): lowercase Xiaomi model IDs for case-insensitive config (#15066)
Xiaomi's API (api.xiaomimimo.com) requires lowercase model IDs like
"mimo-v2.5-pro" but rejects mixed-case names like "MiMo-V2.5-Pro"
that users copy from marketing docs or the ProviderEntry description.

Add _LOWERCASE_MODEL_PROVIDERS set and apply .lower() to model names
for providers in this set (currently just xiaomi) after stripping the
provider prefix. This ensures any case variant in config.yaml is
normalized before hitting the API.

Other providers (minimax, zai, etc.) are NOT affected — their APIs
accept mixed case (e.g. MiniMax-M2.7).
2026-04-24 03:33:05 -07:00
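The normalization above is a set membership plus a `.lower()` after prefix stripping. A sketch (function name illustrative):

```python
_LOWERCASE_MODEL_PROVIDERS = {"xiaomi"}

def normalize_model_name(provider: str, model: str) -> str:
    """Strip a 'provider/' prefix, then lowercase the model id only for
    providers whose APIs reject mixed case. Other providers keep their
    casing untouched."""
    if model.startswith(provider + "/"):
        model = model[len(provider) + 1:]
    if provider in _LOWERCASE_MODEL_PROVIDERS:
        model = model.lower()  # "MiMo-V2.5-Pro" -> "mimo-v2.5-pro"
    return model
```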
bwjoke
3e994e38f7 [verified] fix: materialize hindsight profile env during setup 2026-04-24 03:30:11 -07:00
JC的AI分身
127048e643 fix(hindsight): accept snake_case api_key config 2026-04-24 03:30:03 -07:00
harryplusplus
d6b65bbc47 fix(hindsight): preserve non-ASCII text in retained conversation turns 2026-04-24 03:29:58 -07:00
Chris Danis
a5c7422f23 fix(hindsight): always write HINDSIGHT_LLM_API_KEY to .env, even when empty
When the user runs the setup wizard, chooses
  ✓ Memory provider: built-in only
  Saved to config.yaml
and leaves the API key blank, the old code skipped writing it
entirely. This caused the uvx daemon
launcher to fail at startup because it couldn't distinguish between
"key not configured" and "explicitly blank key."

Now HINDSIGHT_LLM_API_KEY is always written to .env so the value
is either set or explicitly empty.
2026-04-24 03:29:53 -07:00
Teknium
3c0a728607
chore(release): map hindsight PR contributors in AUTHOR_MAP (#15070)
Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs #14109, #13153, #13090.
2026-04-24 03:29:46 -07:00
Teknium
339123481e chore(release): map ericnicolaides (wildcat.local commit email) in AUTHOR_MAP 2026-04-24 03:21:29 -07:00
WildCat Eng Manager
9e6f34a76e docs: document prompt_caching.cache_ttl in cli-config example
Made-with: Cursor
2026-04-24 03:21:29 -07:00
WildCat Eng Manager
7626f3702e feat: read prompt caching cache_ttl from config
- Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in)
- Document DEFAULT_CONFIG and developer guide example
- Add unit tests for default, 1h, and invalid TTL fallback

Made-with: Cursor
2026-04-24 03:21:29 -07:00
Teknium
9de555f3e3 chore(release): add 0xharryriddle to AUTHOR_MAP 2026-04-24 03:17:18 -07:00
Harry Riddle
ac25e6c99a feat(auth-codex): add config-provider fallback detection for logout in hermes-agent/hermes_cli/auth.py 2026-04-24 03:17:18 -07:00
Teknium
b2e124d082
refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047)
* refactor(commands): drop /provider and clean up slash registry

* refactor(commands): drop /plan special handler — use plain skill dispatch
2026-04-24 03:10:52 -07:00
Teknium
b29287258a
fix(aux-client): honor api_mode: anthropic_messages for named custom providers (#15059)
Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.

Two gaps caused this:

1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
   providers-dict branch (new-style) returned only name/base_url/api_key/
   model and dropped api_mode. The legacy custom_providers-list branch
   already propagated it correctly. The dict branch now parses and
   returns api_mode via _parse_api_mode() in both match paths.

2. agent/auxiliary_client.py::resolve_provider_client — the named
   custom provider block at ~L1740 ignored custom_entry['api_mode']
   and unconditionally built an OpenAI client (only wrapping for
   Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
   dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
   in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
   otherwise plain OpenAI. An explicit task-level api_mode override
   still wins over the provider entry's declared api_mode.

Fixes #15033

Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering

  - providers-dict preserves valid api_mode
  - invalid api_mode values are dropped
  - missing api_mode leaves the entry unchanged (no regression)
  - resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
    api_mode=anthropic_messages
  - full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
    with an auxiliary.<task> override
  - providers without api_mode still use the OpenAI-wire path
2026-04-24 03:10:30 -07:00
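The three-way dispatch described in gap (2) can be sketched as a mode-to-class table with override precedence. The class names are illustrative stand-ins for the real auxiliary clients:

```python
class OpenAIWireAuxClient:
    pass

class AnthropicAuxClient:
    pass

class CodexAuxClient:
    pass

_API_MODE_DISPATCH = {
    "anthropic_messages": AnthropicAuxClient,
    "codex_responses": CodexAuxClient,
}

def resolve_aux_client_class(custom_entry: dict, task_api_mode: str = None):
    """An explicit task-level api_mode override wins, then the provider
    entry's declared api_mode; anything else falls back to the OpenAI
    wire format."""
    mode = task_api_mode or custom_entry.get("api_mode")
    return _API_MODE_DISPATCH.get(mode, OpenAIWireAuxClient)
```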
luyao618
bc15f526fb fix(agent): exclude prior-history tool messages from background review summary
Cherry-pick-of: 27b6a217b (PR #14967 by @luyao618)

Co-authored-by: luyao618 <364939526@qq.com>
2026-04-24 03:10:19 -07:00
Teknium
ba3284f34a chore(release): map salvage-batch contributors in AUTHOR_MAP
Adds three contributors whose commits land via this batch of salvage PRs:

- @mrunmayee17 (mrunmayeerane17@gmail.com) — Discord wildcard fix #14920
- @camaragon   (69489633+camaragon@users.noreply.github.com) — ACP MCP fix #14986
- @shamork     (shamork@outlook.com) — NO_PROXY bypass fix #14966

Required by CI, which rejects PRs with unmapped personal emails.
2026-04-24 03:04:42 -07:00
Teknium
f24956ba12 fix(resume): redirect --resume to the descendant that actually holds the messages
When context compression fires mid-session, run_agent's _compress_context
ends the current session, creates a new child session linked by
parent_session_id, and resets the SQLite flush cursor. New messages land
in the child; the parent row ends up with message_count = 0. A user who
runs 'hermes --resume <original_id>' sees a blank chat even though the
transcript exists — just under a descendant id.

PR #12920 already fixed the exit banner to print the live descendant id
at session end, but that didn't help users who resume by a session id
captured BEFORE the banner update (scripts, sessions list, old terminal
scrollback) or who type the parent id manually.

Fix: add SessionDB.resolve_resume_session_id() which walks the
parent→child chain forward and returns the first descendant with at
least one message row. Wire it into all three resume entry points:

  - HermesCLI._preload_resumed_session() (early resume at run() time)
  - HermesCLI._init_agent() (the classical resume path)
  - /resume slash command

Semantics preserved when the chain has no descendants with messages,
when the requested session already has messages, or when the id is
unknown. A depth cap of 32 guards against malformed loops.

This does NOT concatenate the pre-compression parent transcript into
the child — the whole point of compression is to shrink that, so
replaying it would blow the cache budget we saved. We just jump to
the post-compression child. The summary already reflects what was
compressed away.

Tests: tests/hermes_state/test_resolve_resume_session_id.py covers
  - the exact 6-session shape from the issue
  - passthrough when session has messages / no descendants
  - passthrough for nonexistent / empty / None input
  - middle-of-chain redirects
  - fork resolution (prefers most-recent child)

Closes #15000
2026-04-24 03:04:42 -07:00
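The walk-forward resolution above can be sketched over an in-memory map of `{id: (message_count, child_id)}`. The real method queries SQLite and prefers the most-recent child on forks; this sketch shows the chain walk, the passthrough semantics, and the depth cap:

```python
def resolve_resume_session_id(sessions: dict, session_id: str, max_depth: int = 32) -> str:
    """Walk the parent->child chain forward and return the first session
    that actually holds messages; otherwise leave the requested id
    unchanged (unknown id, session already has messages, or no descendant
    has any)."""
    current = session_id
    for _ in range(max_depth):  # depth cap guards against malformed loops
        row = sessions.get(current)
        if row is None:
            break  # unknown id: pass through
        message_count, child = row
        if message_count > 0:
            return current
        if child is None:
            break  # end of chain, nothing holds messages
        current = child
    return session_id
```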
Teknium
166b960fe4 test(proxy): regression tests for NO_PROXY bypass on keepalive client
Pin the behaviour added in the preceding commit — `_get_proxy_for_base_url()`
must return None for hosts covered by NO_PROXY and the HTTPS_PROXY URL
otherwise, and the full `_create_openai_client()` path must NOT mount an
HTTPProxy for a NO_PROXY host.
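The pinned behaviour, in sketch form (illustrative matching rules — the real `_get_proxy_for_base_url()` may handle ports, CIDRs, and casing differently):

```python
from urllib.parse import urlsplit

def get_proxy_for_base_url(base_url, env):
    """Return the HTTPS proxy URL for base_url, or None if NO_PROXY covers it."""
    host = urlsplit(base_url).hostname or ""
    no_proxy = env.get("NO_PROXY", env.get("no_proxy", ""))
    for entry in filter(None, (e.strip() for e in no_proxy.split(","))):
        if entry == "*" or host == entry or host.endswith("." + entry.lstrip(".")):
            return None                      # bypassed: mount no proxy at all
    return env.get("HTTPS_PROXY") or env.get("https_proxy")

env = {"HTTPS_PROXY": "http://proxy:3128", "NO_PROXY": "localhost,.internal"}
```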

Refs: #14966
2026-04-24 03:04:42 -07:00
shamork
cbc39a8672 fix(proxy): honor no_proxy for local custom endpoints 2026-04-24 03:04:42 -07:00
Cameron Aragon
dfc5563641 fix(acp): include MCP toolsets in ACP sessions 2026-04-24 03:04:42 -07:00
Teknium
8a1e247c6c fix(discord): honor wildcard '*' in ignored_channels and free_response_channels
Follow-up to the allowed_channels wildcard fix in the preceding commit.
The same '*' literal trap affected two other Discord channel config lists:

- DISCORD_IGNORED_CHANNELS: '*' was stored as the literal string in the
  ignored set, and the intersection check never matched real channel IDs,
  so '*' was a no-op instead of silencing every channel.
- DISCORD_FREE_RESPONSE_CHANNELS: same shape — '*' never matched, so
  the bot still required a mention everywhere.

Add a '*' short-circuit to both checks, matching the allowed_channels
semantics. Extend tests/gateway/test_discord_allowed_channels.py with
regression coverage for all three lists.

Refs: #14920
2026-04-24 03:04:42 -07:00
Mrunmayee Rane
8598746e86 fix(discord): honor wildcard '*' in DISCORD_ALLOWED_CHANNELS
allowed_channels: "*" in config (or DISCORD_ALLOWED_CHANNELS="*" env var)
is meant to allow all channels, but the check was comparing numeric channel
IDs against the literal string set {"*"} via set intersection — always empty,
so every message was silently dropped.

Add a "*" short-circuit before the set intersection, consistent with every
other platform's allowlist handling (Signal, Slack, Telegram all do this).
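A minimal sketch of the short-circuit (set names and helper signature are illustrative, not the actual Discord adapter code):

```python
def channel_allowed(channel_id, allowed_channels):
    """allowed_channels holds channel-ID strings, or '*' to allow all."""
    if not allowed_channels:
        return True                 # no allowlist configured: allow everything
    if "*" in allowed_channels:
        return True                 # wildcard short-circuit: allow every channel
    # Without the short-circuit, this intersection compares numeric IDs
    # against the literal string '*' and is always empty.
    return bool({str(channel_id)} & allowed_channels)
```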

Fixes #14920
2026-04-24 03:04:42 -07:00
Teknium
f58a16f520
fix(auth): apply verify= to Codex OAuth /models probe (#15049)
Follow-up to PR #14533 — applies the same _resolve_requests_verify()
treatment to the one requests.get() site the PR missed (Codex OAuth
chatgpt.com /models probe). Keeps all seven requests.get() callsites
in model_metadata.py consistent so HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE /
SSL_CERT_FILE are honored everywhere.

Co-authored-by: teknium1 <teknium@hermes-agent>
2026-04-24 03:02:24 -07:00
Teknium
621fd348dc chore(release): add ReginaldasR to AUTHOR_MAP 2026-04-24 03:02:16 -07:00
Reginaldas
3e10f339fd fix(providers): send user agent to routermint endpoints 2026-04-24 03:02:16 -07:00
Teknium
5fdba79eb4 chore(release): add keiravoss94 AUTHOR_MAP entry 2026-04-24 03:02:03 -07:00
Keira Voss
2ba9b29f37 docs(plugins): correct pre_gateway_dispatch doc text and add hooks.md section
Follow-up to aeff6dfe:

- Fix semantic error in VALID_HOOKS inline comment ("after core auth" ->
  "before auth"). Hook intentionally runs BEFORE auth so plugins can
  handle unauthorized senders without triggering the pairing flow.
- Fix wrong class name in the same comment (HermesGateway ->
  GatewayRunner, matching gateway/run.py).
- Add a full ### pre_gateway_dispatch section in
  website/docs/user-guide/features/hooks.md (matches the pattern of
  every other plugin hook: signature, params table, fires-where,
  return-value table, use cases, two worked examples) plus a row in
  the quick-reference table.
- Add the anchor link on the plugins.md table row so it matches the
  other hook entries.

No code behavior change.
2026-04-24 03:02:03 -07:00
Keira Voss
1ef1e4c669 feat(plugins): add pre_gateway_dispatch hook
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:

    {"action": "skip",    "reason": "..."}  -> drop (no reply)
    {"action": "rewrite", "text":   "..."}  -> replace event.text
    {"action": "allow"}  /  None             -> normal dispatch

Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.

Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.

Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
  (skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`

Made-with: Cursor
2026-04-24 03:02:03 -07:00
0xbyt4
8aa37a0cf9 fix(auth): honor SSL CA env vars across httpx + requests callsites
- hermes_cli/auth.py: add _default_verify() with macOS Homebrew certifi
  fallback (mirrors weixin 3a0ec1d93). Extend env var chain to include
  REQUESTS_CA_BUNDLE so one env var works across httpx + requests paths.
- agent/model_metadata.py: add _resolve_requests_verify() reading
  HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE in priority
  order. Apply explicit verify= to all 6 requests.get callsites.
- Tests: 18 new unit tests + autouse platform pin on existing
  TestResolveVerifyFallback to keep its "returns True" assertions
  platform-independent.

Empirically verified against self-signed HTTPS server: requests honors
REQUESTS_CA_BUNDLE only; httpx honors SSL_CERT_FILE only. Hermes now
honors all three everywhere.
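The priority chain, in sketch form (the real `_resolve_requests_verify()` in agent/model_metadata.py may differ in detail):

```python
def resolve_requests_verify(env):
    """Return a CA-bundle path for requests' verify=, or True for defaults."""
    for var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
        path = env.get(var)
        if path:
            return path          # first set variable wins, in priority order
    return True                  # fall back to the default trust store
```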

Triggered by Discord reports — Nous OAuth SSL failure on macOS
Homebrew Python; custom provider self-signed cert ignored despite
REQUESTS_CA_BUNDLE set in env.
2026-04-24 03:00:33 -07:00
Teknium
b0cb81a089 fix(auth): route alibaba_coding* aliases through resolve_provider
The aliases were added to hermes_cli/providers.py but auth.py has its own
_PROVIDER_ALIASES table inside resolve_provider() that is consulted before
PROVIDER_REGISTRY lookup. Without this, provider: alibaba_coding in
config.yaml (the exact repro from #14940) raised 'Unknown provider'.

Mirror the three aliases into auth.py so resolve_provider() accepts them.
2026-04-24 02:59:32 -07:00
ygd58
727d1088c4 fix(providers): register alibaba-coding-plan as a first-class provider
The alibaba-coding-plan provider (coding-intl.dashscope.aliyuncs.com/v1)
was not registered in providers.py or auth.py. When users set
provider: alibaba_coding or provider: alibaba-coding-plan in config.yaml,
Hermes could not resolve the credentials and fell back to OpenRouter
or rejected the request with HTTP 401/402 (issue #14940).

Changes:
- providers.py: add HermesOverlay for alibaba-coding-plan with
  ALIBABA_CODING_PLAN_BASE_URL env var support
- providers.py: add aliases alibaba_coding, alibaba-coding,
  alibaba_coding_plan -> alibaba-coding-plan
- auth.py: add ProviderConfig for alibaba-coding-plan with:
  - inference_base_url: https://coding-intl.dashscope.aliyuncs.com/v1
  - api_key_env_vars: ALIBABA_CODING_PLAN_API_KEY, DASHSCOPE_API_KEY

Fixes #14940
2026-04-24 02:59:32 -07:00
Teknium
a9a4416c7c
fix(compress): don't reach into ContextCompressor privates from /compress (#15039)
Manual /compress crashed with 'LCMEngine' object has no attribute
'_align_boundary_forward' when any context-engine plugin was active.
The gateway handler reached into _align_boundary_forward and
_find_tail_cut_by_tokens on tmp_agent.context_compressor, but those
are ContextCompressor-specific — not part of the generic ContextEngine
ABC — so every plugin engine (LCM, etc.) raised AttributeError.

- Add optional has_content_to_compress(messages) to ContextEngine ABC
  with a safe default of True (always attempt).
- Override it in the built-in ContextCompressor using the existing
  private helpers — preserves exact prior behavior for 'compressor'.
- Rewrite gateway /compress preflight to call the ABC method, deleting
  the private-helper reach-in.
- Add focus_topic to the ABC compress() signature. Make _compress_context
  retry without focus_topic on TypeError so older strict-sig plugins
  don't crash on manual /compress <focus>.
- Regression test with a fake ContextEngine subclass that only
  implements the ABC (mirrors LCM's surface).
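The ABC surface and the TypeError retry, as a hedged sketch (class and method names follow the commit; bodies are illustrative stand-ins):

```python
class ContextEngine:
    def has_content_to_compress(self, messages):
        return True                          # safe default: always attempt

    def compress(self, messages, focus_topic=None):
        raise NotImplementedError

def run_compress(engine, messages, focus_topic=None):
    """Mirrors the retry: older strict-signature plugin engines that
    don't accept focus_topic still work on manual /compress <focus>."""
    if not engine.has_content_to_compress(messages):
        return messages
    try:
        return engine.compress(messages, focus_topic=focus_topic)
    except TypeError:
        return engine.compress(messages)     # retry without focus_topic

class StrictEngine(ContextEngine):           # mimics an older plugin engine
    def compress(self, messages):            # note: no focus_topic parameter
        return messages[-1:]
```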

Reported by @selfhostedsoul (Discord, Apr 22).
2026-04-24 02:55:43 -07:00
Teknium
4350668ae4 fix(transcription): fall back to CPU when CUDA runtime libs are missing
faster-whisper's device="auto" picks CUDA when ctranslate2's wheel
ships CUDA shared libs, even on hosts without the NVIDIA runtime
(libcublas.so.12 / libcudnn*). On those hosts the model often loads
fine but transcribe() fails at first dlopen, and the broken model
stays cached in the module-global — every subsequent voice message
in the gateway process fails identically until restart.

- Add _load_local_whisper_model() wrapper: try auto, catch missing-lib
  errors, retry on device=cpu compute_type=int8.
- Wrap transcribe() with the same fallback: evict cached model, reload
  on CPU, retry once. Required because the dlopen failure only surfaces
  at first kernel launch, not at model construction.
- Narrow marker list (libcublas, libcudnn, libcudart, 'cannot be loaded',
  'no kernel image is available', 'no CUDA-capable device', driver
  mismatch). Deliberately excludes 'CUDA out of memory' and similar —
  those are real runtime failures that should surface, not be silently
  retried on CPU.
- Tests for load-time fallback, runtime fallback (with cached-model
  eviction verified), and the OOM non-fallback path.
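The marker matching and single-retry shape, sketched with illustrative helper names (the real loader also evicts the module-global cache):

```python
CUDA_MISSING_MARKERS = (
    "libcublas", "libcudnn", "libcudart",
    "cannot be loaded", "no kernel image is available",
    "no CUDA-capable device",
)

def is_missing_cuda_runtime(exc):
    """True only for missing-library/driver errors, never for real runtime
    failures like 'CUDA out of memory' (those must surface to the caller)."""
    text = str(exc)
    return any(marker in text for marker in CUDA_MISSING_MARKERS)

def transcribe_with_fallback(load_model, transcribe, audio):
    model = load_model("auto")
    try:
        return transcribe(model, audio)
    except Exception as exc:
        if not is_missing_cuda_runtime(exc):
            raise                            # e.g. CUDA OOM: re-raise as-is
        model = load_model("cpu")            # evict broken model, reload on CPU
        return transcribe(model, audio)      # retry exactly once
```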

Reported via Telegram voice-message dumps on WSL2 hosts where libcublas
isn't installed by default.
2026-04-24 02:50:14 -07:00
Teknium
34c3e67109
fix: sanitize tool schemas for llama.cpp backends; restore MCP in TUI (#15032)
Local llama.cpp servers (e.g. ggml-org/llama.cpp:full-cuda) fail the entire
request with HTTP 400 'Unable to generate parser for this template. ...
Unrecognized schema: "object"' when any tool schema contains shapes its
json-schema-to-grammar converter can't handle:

  * 'type': 'object' without 'properties'
  * bare string schema values ('additionalProperties: "object"')
  * 'type': ['X', 'null'] arrays (nullable form)

Cloud providers accept these silently, so they ship from external MCP
servers (Atlassian, GCloud, Datadog) and from a couple of our own tools.

Changes

- tools/schema_sanitizer.py: walks the finalized tool list right before it
  leaves get_tool_definitions() and repairs the hostile shapes in a deep
  copy. No-op on well-formed schemas. Recurses into properties, items,
  additionalProperties, anyOf/oneOf/allOf, and $defs.
- model_tools.get_tool_definitions(): invoke the sanitizer as the last
  step so all paths (built-in, MCP, plugin, dynamically-rebuilt) get
  covered uniformly.
- tools/browser_cdp_tool.py, tools/mcp_tool.py: fix our own bare-object
  schemas so sanitization isn't load-bearing for in-repo tools.
- tui_gateway/server.py: _load_enabled_toolsets() was passing
  include_default_mcp_servers=False at runtime. That's the config-editing
  variant (see PR #3252) — it silently drops every default MCP server
  from the TUI's enabled_toolsets, which is why the TUI didn't hit the
  llama.cpp crash (no MCP tools sent at all). Switch to True so TUI
  matches CLI behavior.
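A hedged sketch of the repair walk for the three hostile shapes listed above (the real tools/schema_sanitizer.py handles more cases and walks the full tool list; each dict level is copied so the caller's schema is untouched):

```python
def sanitize_schema(node):
    if isinstance(node, str):
        # bare string schema value, e.g. additionalProperties: "object"
        return sanitize_schema({"type": node})
    if not isinstance(node, dict):
        return node
    node = dict(node)
    t = node.get("type")
    if isinstance(t, list):                      # ['X', 'null'] nullable form
        non_null = [x for x in t if x != "null"]
        node["type"] = non_null[0] if non_null else "string"
    if node.get("type") == "object" and "properties" not in node:
        node["properties"] = {}                  # 'object' without properties
    for key in ("properties", "$defs"):
        if isinstance(node.get(key), dict):
            node[key] = {k: sanitize_schema(v) for k, v in node[key].items()}
    for key in ("items", "additionalProperties"):
        if key in node:
            node[key] = sanitize_schema(node[key])
    for key in ("anyOf", "oneOf", "allOf"):
        if isinstance(node.get(key), list):
            node[key] = [sanitize_schema(v) for v in node[key]]
    return node
```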

Tests

tests/tools/test_schema_sanitizer.py (17 tests) covers the individual
failure modes, well-formed pass-through, deep-copy isolation, and
required-field pruning.

E2E: loaded the default 'hermes-cli' toolset with MCP discovery and
confirmed all 27 resolved tool schemas pass a llama.cpp-compatibility
walk (no 'object' node missing 'properties', no bare-string schema
values).
2026-04-24 02:44:46 -07:00
brooklyn!
5dda4cab41
Merge pull request #14968 from NousResearch/bb/tui-section-visibility
feat(tui): per-section visibility for the details accordion
2026-04-24 03:02:26 -05:00
Brooklyn Nicholson
6604e94c75 fix(tui): gate messageLine on content-bearing sections, not all sections
Round-2 Copilot review on #14968 caught two leftover spots that didn't
fully respect per-section overrides:

- messageLine.tsx (trail branch): the previous fix gated on
  `SECTION_NAMES.some(...)`, which stayed true whenever any section was
  visible.  With `thinking: 'expanded'` as the new built-in default,
  that meant `display.sections.tools: hidden` left an empty wrapper Box
  alive for trail messages.  Now gates on the actual content-bearing
  sections for a trail message — `tools` OR `activity` — so a
  tools-hidden config drops the wrapper cleanly.

- messageLine.tsx (showDetails): still keyed off the global
  `detailsMode !== 'hidden'`, so per-section overrides like
  `sections.thinking: expanded` couldn't escape global hidden for
  assistant messages with reasoning + tool metadata.  Recomputed via
  resolved per-section modes (`thinkingMode`/`toolsMode`).

- types.ts: rewrote the SectionVisibility doc comment to reflect the
  actual resolution order (explicit override → SECTION_DEFAULTS →
  global), so the docstring stops claiming "missing keys fall back to
  the global mode" when SECTION_DEFAULTS now layers in between.

All three lookups (thinking/tools/activity) are computed once at the
top of MessageLine and shared by every branch.
2026-04-24 03:01:06 -05:00
Brooklyn Nicholson
67bfd4b828 feat(tui): stream thinking + tools expanded by default
Extends SECTION_DEFAULTS so the out-of-the-box TUI shows the turn as
a live transcript (reasoning + tool calls streaming inline) instead of
a wall of `▸` chevrons the user has to click every turn.

Final default matrix:

  - thinking: expanded
  - tools:    expanded
  - activity: hidden    (unchanged from the previous commit)
  - subagents: falls through to details_mode (collapsed by default)

Everything explicit in `display.sections` still wins, so anyone who
already pinned an override keeps their layout.  One-line revert is
`display.sections.<name>: collapsed`.
2026-04-24 02:53:44 -05:00
Brooklyn Nicholson
70925363b6 fix(tui): per-section overrides escape global details_mode: hidden
Copilot review on #14968 caught that the early returns gated on the
global `detailsMode === 'hidden'` short-circuited every render path
before sectionMode() got a chance to apply per-section overrides — so
`details_mode: hidden` + `sections.tools: expanded` was silently a no-op.

Three call sites had the same bug shape; all now key off the resolved
section modes:

- ToolTrail: replace the `detailsMode === 'hidden'` early return with
  an `allHidden = every section resolved to hidden` check.  When that's
  true, fall back to the floating-alert backstop (errors/warnings) so
  quiet-mode users aren't blind to ambient failures, and update the
  comment block to match the actual condition.

- messageLine.tsx: drop the same `detailsMode === 'hidden'` pre-check
  on `msg.kind === 'trail'`; only skip rendering the wrapper when every
  section resolves to hidden (`SECTION_NAMES.some(...) !== 'hidden'`).

- useMainApp.ts: rebuild `showProgressArea` around `anyPanelVisible`
  instead of branching on the global mode.  This also fixes the
  suppressed Copilot concern about an empty wrapper Box rendering above
  the streaming area when ToolTrail returns null.

Regression test in details.test.ts pins the override-escapes-hidden
behaviour for tools/thinking/activity.  271/271 vitest, lints clean.
2026-04-24 02:49:58 -05:00
Brooklyn Nicholson
005cc29e98 refactor(tui): /clean pass on per-section visibility plumbing
- domain/details: extract `norm()`, fold parseDetailsMode + resolveSections
  into terser functional form, reject array values for resolveSections
- slash /details: destructure tokens, factor reset/mode into one dispatch,
  drop DETAIL_MODES set + DetailsMode/SectionName imports (parseDetailsMode
  + isSectionName narrow + return), centralize usage strings
- ToolTrail: collapse 4 separate xxxSection vars into one memoized
  `visible` map; effect deps stabilize on the memo identity instead of
  4 primitives
2026-04-24 02:42:03 -05:00
Brooklyn Nicholson
728767e910 feat(tui): hide the activity panel by default
The activity panel (gateway hints, terminal-parity nudges, background
notifications) is noise for the typical day-to-day user, who only cares
about thinking + tools + streamed content.  Make `hidden` the built-in
default for that section so users land on the quiet mode out of the box.

Tool failures still render inline on the failing tool row, so this
default suppresses the noise feed without losing the signal.

Opt back in with `display.sections.activity: collapsed` (chevron) or
`expanded` (always open) in `~/.hermes/config.yaml`, or live with
`/details activity collapsed`.

Implementation: SECTION_DEFAULTS in domain/details.ts, applied as the
fallback in `sectionMode()` between the explicit override and the
global details_mode.  Existing `display.sections.activity` overrides
take precedence — no migration needed for users who already set it.
2026-04-24 02:37:42 -05:00
Brooklyn Nicholson
78481ac124 feat(tui): per-section visibility for the details accordion
Adds optional per-section overrides on top of the existing global
details_mode (hidden | collapsed | expanded).  Lets users keep the
accordion collapsed by default while auto-expanding tools, or hide the
activity panel entirely without touching thinking/tools/subagents.

Config (~/.hermes/config.yaml):

    display:
      details_mode: collapsed
      sections:
        thinking: expanded
        tools:    expanded
        activity: hidden

Slash command:

  /details                              show current global + overrides
  /details [hidden|collapsed|expanded]  set global mode (existing)
  /details <section> <mode|reset>       per-section override (new)
  /details <section> reset              clear override

Sections: thinking, tools, subagents, activity.

Implementation:

- ui-tui/src/types.ts             SectionName + SectionVisibility
- ui-tui/src/domain/details.ts    parseSectionMode / resolveSections /
                                  sectionMode + SECTION_NAMES
- ui-tui/src/app/uiStore.ts +
  app/interfaces.ts +
  app/useConfigSync.ts            sections threaded into UiState
- ui-tui/src/components/
  thinking.tsx                    ToolTrail consults per-section mode for
                                  hidden/expanded behaviour; expandAll
                                  skips hidden sections; floating-alert
                                  fallback respects activity:hidden
- ui-tui/src/components/
  messageLine.tsx + appLayout.tsx pass sections through render tree
- ui-tui/src/app/slash/
  commands/core.ts                /details <section> <mode|reset> syntax
- tui_gateway/server.py           config.set details_mode.<section>
                                  writes to display.sections.<section>
                                  (empty value clears the override)
- website/docs/user-guide/tui.md  documented

Tests: 14 new (4 domain, 4 useConfigSync, 3 slash, 3 gateway).
Total: 269/269 vitest, all gateway tests pass.
2026-04-24 02:34:32 -05:00
Teknium
6051fba9dc
feat(banner): hyperlink startup banner title to latest GitHub release (#14945)
Wrap the existing version label in the welcome-banner panel title
('Hermes Agent v… · upstream … · local …') with an OSC-8 terminal
hyperlink pointing at the latest git tag's GitHub release page
(https://github.com/NousResearch/hermes-agent/releases/tag/<tag>).

Clickable in modern terminals (iTerm2, WezTerm, Windows Terminal,
GNOME Terminal, Kitty, etc.); degrades to plain text on terminals
without OSC-8 support. No new line added to the banner.

New get_latest_release_tag() helper runs 'git describe --tags
--abbrev=0' in the Hermes checkout (3s timeout, per-process cache,
silent fallback for non-git/pip installs and forks without tags).
2026-04-23 23:28:34 -07:00
Teknium
2acc8783d1
fix(errors): classify OpenRouter privacy-guardrail 404s distinctly (#14943)
OpenRouter returns a 404 with the specific message

  'No endpoints available matching your guardrail restrictions and data
   policy. Configure: https://openrouter.ai/settings/privacy'

when a user's account-level privacy setting excludes the only endpoint
serving a model (e.g. DeepSeek V4 Pro, which today is hosted only by
DeepSeek's own endpoint that may log inputs).

Before this change we classified it as model_not_found, which was
misleading (the model exists) and triggered provider fallback (useless —
the same account setting applies to every OpenRouter call).

Now it classifies as a new FailoverReason.provider_policy_blocked with
retryable=False, should_fallback=False.  The error body already contains
the fix URL, so the user still gets actionable guidance.
2026-04-23 23:26:29 -07:00
brooklyn!
acdcb167fb
fix(tui): harden terminal dimming and multiplexer copy (#14906)
- disable ANSI dim on VTE terminals by default so dark-background reasoning and accents stay readable
- suppress local multiplexer OSC52 echo while preserving remote passthrough and add regression coverage
2026-04-23 22:46:28 -07:00
Teknium
51f4c9827f
fix(context): resolve real Codex OAuth context windows (272k, not 1M) (#14935)
On ChatGPT Codex OAuth every gpt-5.x slug actually caps at 272,000 tokens,
but Hermes was resolving gpt-5.5 / gpt-5.4 to 1,050,000 (from models.dev)
because openai-codex aliases to the openai entry there. At 1.05M the
compressor never fires and requests hard-fail with 'context window
exceeded' around the real 272k boundary.

Verified live against chatgpt.com/backend-api/codex/models:
  gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex,
  gpt-5.2, gpt-5.1-codex-max → context_window = 272000

Changes:
- agent/model_metadata.py:
  * _fetch_codex_oauth_context_lengths() — probe the Codex /models
    endpoint with the OAuth bearer token and read context_window per
    slug (1h in-memory TTL).
  * _resolve_codex_oauth_context_length() — prefer the live probe,
    fall back to hardcoded _CODEX_OAUTH_CONTEXT_FALLBACK (all 272k).
  * Wire into get_model_context_length() when provider=='openai-codex',
    running BEFORE the models.dev lookup (which returns 1.05M). Result
    persists via save_context_length() so subsequent lookups skip the
    probe entirely.
  * Fixed the now-wrong comment on the DEFAULT_CONTEXT_LENGTHS gpt-5.5
    entry (400k was never right for Codex; it's the catch-all for
    providers we can't probe live).
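The resolution order above, sketched with stand-in callables for the live probe and the models.dev lookup (function and table names follow the commit; bodies and the abbreviated fallback table are illustrative):

```python
CODEX_OAUTH_CONTEXT_FALLBACK = {
    "gpt-5.5": 272_000, "gpt-5.4": 272_000, "gpt-5.3-codex": 272_000,
}

def resolve_codex_oauth_context_length(slug, probe, models_dev_lookup):
    """probe(slug) -> live context_window, or None (no token / probe failed)."""
    live = probe(slug)
    if live:
        return live                                  # live /models probe wins
    if slug in CODEX_OAUTH_CONTEXT_FALLBACK:
        return CODEX_OAUTH_CONTEXT_FALLBACK[slug]    # hardcoded 272k table
    return models_dev_lookup(slug)                   # last resort only
```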

Tests (4 new in TestCodexOAuthContextLength):
- fallback table used when no token is available (no models.dev leakage)
- live probe overrides the fallback
- probe failure (non-200) falls back to hardcoded 272k
- non-codex providers (openrouter, direct openai) unaffected

Non-codex context resolution is unchanged — the Codex branch only fires
when provider=='openai-codex'.
2026-04-23 22:39:47 -07:00
Teknium
2e78a2b6b2
feat(models): add deepseek-v4-pro and deepseek-v4-flash (#14934)
- OpenRouter: deepseek/deepseek-v4-pro, deepseek/deepseek-v4-flash
- Nous Portal (fallback list): same two slugs
- Native DeepSeek provider: bare deepseek-v4-pro, deepseek-v4-flash
  alongside existing deepseek-chat/deepseek-reasoner

Context length resolves via existing 'deepseek' substring entry (128K)
in DEFAULT_CONTEXT_LENGTHS.
2026-04-23 22:35:04 -07:00
Teknium
5a1c599412
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)

Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.

Supersedes #12550.

No code changes in this commit.

* feat(browser): add persistent CDP supervisor for dialog + frame detection

Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.

Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.

Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.

Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.

Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.

E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.

No agent-facing tool wiring in this commit (comes next).

* feat(browser): add browser_dialog tool wired to CDP supervisor

Agent-facing response-only tool. Schema:
  action: 'accept' | 'dismiss' (required)
  prompt_text: response for prompt() dialogs (optional)
  dialog_id: disambiguate when multiple dialogs queued (optional)

Handler:
  SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)

check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.

Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.

* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot

Supervisor lifecycle:
  * _get_session_info lazy-starts the supervisor after a session row is
    materialized — covers every backend code path (Browserbase, cdp_url
    override, /browser connect, future providers) with one hook.
  * cleanup_browser(task_id) stops the supervisor for that task first
    (before the backend tears down CDP).
  * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
  * /browser connect eagerly starts the supervisor for task 'default'
    so the first snapshot already shows pending_dialogs.
  * /browser disconnect stops the supervisor.

CDP URL resolution for the supervisor:
  1. BROWSER_CDP_URL / browser.cdp_url override.
  2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).

browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.

Config defaults:
  * browser.dialog_policy: 'must_respond' (new)
  * browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.

Deadlock fix in supervisor event dispatch:
  * _on_dialog_opening and _on_target_attached used to await CDP calls
    while the reader was still processing an event — but only the reader
    can set the response Future, so the call timed out.
  * Both now fire asyncio.create_task(...) so the reader stays pumping.
  * auto_dismiss/auto_accept now actually close the dialog immediately.

Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
  * supervisor start/snapshot
  * main-frame alert detection + dismiss
  * iframe.contentWindow alert
  * prompt() with prompt_text reply
  * respond with no pending dialog -> clean error
  * auto_dismiss clears on event
  * registry idempotency
  * registry stop -> snapshot reports inactive
  * browser_dialog tool no-supervisor error
  * browser_dialog invalid action
  * browser_dialog end-to-end via tool handler

xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.

* docs(browser): document browser_dialog tool + CDP supervisor

- user-guide/features/browser.md: new browser_dialog section with
  workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
  bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
  toolset row with note on pending_dialogs / frame_tree snapshot fields

Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).

* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility

Found via Browserbase E2E test that revealed two production-critical issues:

1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
   CDP proxy tears down our long-lived WebSocket whenever a short-lived
   client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
   Fixed with a reconnecting _run loop that re-attaches with exponential
   backoff on drops. _page_session_id and _child_sessions are reset on each
   reconnect; pending_dialogs and frames are preserved across reconnects.

2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
   Playwright-based CDP proxy dismisses alert/confirm/prompt before our
   Page.handleJavaScriptDialog call can respond. So pending_dialogs is
   empty by the time the agent reads a snapshot on Browserbase.

   Added a recent_dialogs ring buffer (capacity 20) that retains a
   DialogRecord for every dialog that opened, with a closed_by tag:
     * 'agent'       — agent called browser_dialog
     * 'auto_policy' — local auto_dismiss/auto_accept fired
     * 'watchdog'    — must_respond timeout auto-dismissed (300s default)
     * 'remote'      — browser/backend closed it on us (Browserbase)

   Agents on Browserbase now see the dialog history with closed_by='remote'
   so they at least know a dialog fired, even though they couldn't respond.

3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
   'message' field (CDP spec has only 'result' and 'userInput') but our
   _on_dialog_closed was matching on message. Fixed to match by session_id
   + oldest-first, with a safety assumption that only one dialog is in
   flight per session (the JS thread is blocked while a dialog is up).
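The recent_dialogs ring buffer, as a minimal sketch (the real supervisor stores richer DialogRecord objects alongside pending state; this keeps just the capacity and closed_by shape):

```python
from collections import deque

RECENT_DIALOGS_CAPACITY = 20

class DialogHistory:
    def __init__(self):
        self._recent = deque(maxlen=RECENT_DIALOGS_CAPACITY)

    def record(self, dialog_id, message, closed_by):
        # closed_by in {'agent', 'auto_policy', 'watchdog', 'remote'}
        self._recent.append(
            {"dialog_id": dialog_id, "message": message, "closed_by": closed_by}
        )

    def snapshot(self):
        return list(self._recent)    # oldest-first, capped at 20 entries

history = DialogHistory()
for i in range(25):                  # overflow: only the last 20 survive
    history.record(f"d-{i}", "BB-ALERT-MSG", "remote")
```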

Docs + tests updated:
  * browser.md: new availability matrix showing the three backends and
    which mode (pending / recent / response) each supports
  * developer-guide/browser-supervisor.md: three-field snapshot schema
    with closed_by semantics
  * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
    passing against real Chrome)

E2E verified both backends:
  * Local Chrome via /browser connect: detect + respond full workflow
    (smoke_supervisor.py all 7 scenarios pass)
  * Browserbase: detect via recent_dialogs with closed_by='remote'
    (smoke_supervisor_browserbase_v2.py passes)

Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.

* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)

Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.

The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.

Flow when a page calls alert('hi'):
  1. window.alert override intercepts, builds XHR GET to
     http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
  2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
  3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
     it as a pending dialog with bridge_request_id set
  4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
  5. Supervisor calls Fetch.fulfillRequest with JSON body:
     {accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
  6. The injected script parses the body, returns the appropriate value
     from the override (undefined for alert, bool for confirm, string|null
     for prompt)
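
Step 5 can be sketched in Python. `build_fulfill_params` is a hypothetical helper name (not the supervisor's real API); the one CDP-specified detail it leans on is that Fetch.fulfillRequest takes its body as a base64 string:

```python
import base64
import json

def build_fulfill_params(request_id, accept, prompt_text, dialog_id):
    """Hypothetical helper for step 5: encode the JSON body the injected
    override will parse. Fetch.fulfillRequest requires base64; field names
    follow the flow above, the real supervisor may differ."""
    body = json.dumps({'accept': accept, 'prompt_text': prompt_text,
                       'dialog_id': dialog_id}).encode()
    return {
        'requestId': request_id,
        'responseCode': 200,
        'responseHeaders': [{'name': 'Content-Type', 'value': 'application/json'}],
        'body': base64.b64encode(body).decode(),
    }
```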

This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.

Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.

Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).

E2E VERIFIED:
  * Local Chrome: 13/13 pytest tests green (12 original + new
    test_bridge_captures_prompt_and_returns_reply_text that asserts
    window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
  * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
    - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
    - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
      → page.prompt_ret === 'AGENT-REPLY' ✓
    - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
    - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓

Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.

* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)

Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).

Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.

Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.

Agent workflow:
  1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
  2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
                 params={'expression': 'document.title', 'returnByValue': True})
  3. Supervisor dispatches the call on the OOPIF's child session
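
The routing in step 3 can be sketched minimally; `SupervisorStub` and its members are illustrative stand-ins, not the real supervisor class:

```python
import asyncio
import threading

class SupervisorStub:
    """Minimal sketch: one long-lived event loop owns the CDP WebSocket,
    and browser_cdp calls from other threads are marshalled onto it with
    run_coroutine_threadsafe. Names here are illustrative."""

    def __init__(self):
        self.loop = asyncio.new_event_loop()
        threading.Thread(target=self.loop.run_forever, daemon=True).start()
        self.frame_tree = {}  # frame_id -> OOPIF child cdp_session_id

    async def _send_cdp(self, method, params, session_id):
        # Real code would write to the shared WebSocket; echo for the sketch
        return {'sessionId': session_id, 'method': method, 'params': params}

    def dispatch(self, method, params=None, frame_id=None):
        session_id = self.frame_tree.get(frame_id)
        if session_id is None:
            return {'success': False,
                    'error': f'frame {frame_id!r} not in frame_tree'}
        future = asyncio.run_coroutine_threadsafe(
            self._send_cdp(method, params or {}, session_id), self.loop)
        return {'success': True, 'result': future.result(timeout=10)}
```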

Supervisor state fixes needed along the way:
  * _on_frame_detached now skips reason='swap' (frame migrating processes)
  * _on_frame_detached also skips when the frame is an OOPIF with a live
    child session — Browserbase fires spurious remove events when a
    same-origin iframe gets promoted to OOPIF
  * _on_target_detached clears cdp_session_id but KEEPS the frame record
    so the agent still sees the OOPIF in frame_tree during transient
    session flaps

E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
  browser_cdp(method='Runtime.evaluate',
              params={'expression': 'document.title', 'returnByValue': True},
              frame_id=<OOPIF>)
  → {'success': True, 'result': {'value': 'Example Domain'}}

  The iframe is <iframe src='https://example.com/'> inside a top-level
  data: URL page on a real Browserbase session. The agent Runtime.evaluates
  INSIDE the cross-origin iframe and gets example.com's title back.

Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
  * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
    verifies routing via supervisor, Runtime.evaluate returns 1+1=2
  * test_browser_cdp_frame_id_missing_supervisor — clean error when no
    supervisor attached
  * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
    frame_id

Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.

* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process

When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:

  * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
  * Chrome with --site-per-process so the cross-origin iframe becomes a
    real OOPIF in its own process
  * Navigate, find OOPIF in supervisor.frame_tree, call
    browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
    through the supervisor's child session
  * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
    inner page, retrieved via OOPIF eval)

PASSED on 2026-04-23.

Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.

chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.

Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.

* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count

Pre-merge docs audit revealed two gaps:

1. user-guide/configuration.md browser config example was missing the
   two new dialog_* knobs. Added with a short table explaining
   must_respond / auto_dismiss / auto_accept semantics and a link to
   the feature page for the full workflow.

2. reference/tools-reference.md header said '54 built-in tools' — real
   count on main is 54, this branch adds browser_dialog so it's 55.
   Fixed the header.  (browser count was already correctly bumped
   11 -> 12 in the earlier docs commit.)

No code changes.
2026-04-23 22:23:37 -07:00
Teknium
0f6eabb890
docs(website): dedicated page per bundled + optional skill (#14929)
Generates a full dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.

Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.

- website/scripts/generate-skill-docs.py — generator that reads skills/ and
  optional-skills/, writes per-skill pages, regenerates both catalog indexes,
  and rewrites the Skills section of sidebars.ts. Handles MDX escaping
  (outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
  rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
  the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
  with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
  before docusaurus build so CI stays in sync with the source SKILL.md files.
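
The fence-aware escaping can be sketched like this (`escape_mdx` is a hypothetical name; the real generator also handles unsafe HTML-ish tags and link rewriting):

```python
FENCE = '`' * 3   # avoid a literal triple backtick inside this example

def escape_mdx(markdown: str) -> str:
    # Escape curly braces only outside fenced code blocks so code
    # samples survive MDX compilation untouched
    out, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)
        else:
            out.append(line.replace('{', '&#123;').replace('}', '&#125;'))
    return '\n'.join(out)
```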

Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
2026-04-23 22:22:11 -07:00
Austin Pickett
809868e628 feat: refac 2026-04-24 01:04:19 -04:00
Teknium
eb93f88e1d chore(release): add MattMaximo to AUTHOR_MAP for PR #10450 salvage 2026-04-23 22:01:24 -07:00
Matt Maximo
3ccda2aa05 fix(mcp): seed protocol header before HTTP initialize 2026-04-23 22:01:24 -07:00
Austin Pickett
e5d2815b41 feat: add sidebar 2026-04-24 00:56:19 -04:00
Teknium
983bbe2d40
feat(skills): add design-md skill for Google's DESIGN.md spec (#14876)
* feat(config): make tool output truncation limits configurable

Port from anomalyco/opencode#23770: expose a new `tool_output` config
section so users can tune the hardcoded truncation caps that apply to
terminal output and read_file pagination.

Three knobs under `tool_output`:
- max_bytes (default 50_000) — terminal stdout/stderr cap
- max_lines (default 2000) — read_file pagination cap
- max_line_length (default 2000) — per-line cap in line-numbered view

All three keep their existing hardcoded values as defaults, so behaviour
is unchanged when the section is absent. Power users on big-context
models can raise them; small-context local models can lower them.

Implementation:
- New `tools/tool_output_limits.py` reads the section with defensive
  fallback (missing/invalid values → defaults, never raises).
- `tools/terminal_tool.py` MAX_OUTPUT_CHARS now comes from
  get_max_bytes().
- `tools/file_operations.py` normalize_read_pagination() and
  _add_line_numbers() now pull the limits at call time.
- `hermes_cli/config.py` DEFAULT_CONFIG gains the `tool_output` section
  so `hermes setup` writes defaults into fresh configs.
- Docs page `user-guide/configuration.md` gains a "Tool Output
  Truncation Limits" section with large-context and small-context
  example configs.

Tests (18 new in tests/tools/test_tool_output_limits.py):
- Default resolution with missing / malformed / non-dict config.
- Full and partial user overrides.
- Coercion of bad values (None, negative, wrong type, str int).
- Shortcut accessors delegate correctly.
- DEFAULT_CONFIG exposes the section with the right defaults.
- Integration: normalize_read_pagination clamps to the configured
  max_lines.

* feat(skills): add design-md skill for Google's DESIGN.md spec

Built-in skill under skills/creative/ that teaches the agent to author,
lint, diff, and export DESIGN.md files — Google's open-source
(Apache-2.0) format for describing a visual identity to coding agents.

Covers:
- YAML front matter + markdown body anatomy
- Full token schema (colors, typography, rounded, spacing, components)
- Canonical section order + duplicate-heading rejection
- Component property whitelist + variants-as-siblings pattern
- CLI workflow via 'npx @google/design.md' (lint/diff/export/spec)
- Lint rule reference including WCAG contrast checks
- Common YAML pitfalls (quoted hex, negative dimensions, dotted refs)
- Starter template at templates/starter.md

Package verified live on npm (@google/design.md@0.1.1).
2026-04-23 21:51:19 -07:00
Teknium
379b2273d9
fix(mcp): route stdio subprocess stderr to log file, not user TTY (#14901)
MCP stdio servers' stderr was being dumped directly onto the user's
terminal during hermes launch. Servers like FastMCP-based ones print a
large ASCII banner at startup; slack-mcp-server emits JSON logs; etc.
With prompt_toolkit / Rich rendering the TUI concurrently, these
unsolicited writes corrupt the terminal state — hanging the session
~80% of the time for one user with Google Ads Tools + slack-mcp
configured, forcing Ctrl+C and restart loops.

Root cause: `stdio_client(server_params)` in tools/mcp_tool.py was
called without `errlog=`, and the SDK's default is `sys.stderr` —
i.e. the real parent-process stderr, which is the TTY.

Fix: open a shared, append-mode log at $HERMES_HOME/logs/mcp-stderr.log
(created once per process, line-buffered, real fd required by asyncio's
subprocess machinery) and pass it as `errlog` to every stdio_client.
Each server's spawn writes a timestamped header so the shared log stays
readable when multiple servers are running. Falls back to /dev/null if
the log file cannot be opened.

Verified by E2E spawning a subprocess with the log fd as its stderr:
banner lines land in the log file, nothing reaches the calling TTY.
2026-04-23 21:50:25 -07:00
ethernet
7db2703b33
Merge pull request #14895 from NousResearch/tui-resume
fix(tui): keep FloatingOverlays visible when input is blocked
2026-04-24 01:44:50 -03:00
Ari Lotter
7c59e1a871 fix(tui): keep FloatingOverlays visible when input is blocked
FloatingOverlays (SessionPicker, ModelPicker, SkillsHub, pager,
completions) was nested inside the !isBlocked guard in ComposerPane.
When any overlay opened, isBlocked became true, which removed the
entire composer box from the tree — including the overlay that was
trying to render. This made /resume with no args appear to do nothing
(the input line vanished and no picker appeared).

Since 99d859ce (feat: refactor by splitting up app and doing proper
state), isBlocked gated only the text input lines so that
approval/clarify prompts and pickers rendered above a hidden composer.

The regression happened in 408fc893 (fix(tui): tighten composer — status
sits directly above input, overlays anchor to input) when
FloatingOverlays was moved into the input row for anchoring but
accidentally kept inside the !isBlocked guard.

The fix: render FloatingOverlays outside the !isBlocked guard but inside
the same position:relative Box, so overlays stay visible even when the
text input is hidden. Only the actual input buffer lines and TextInput
are gated now.

Fixes: /resume, /history, /logs, /model, /skills, and completion
dropdowns when blocked overlays are active.
2026-04-23 23:44:52 -04:00
brooklyn!
6fdbf2f2d7
Merge pull request #14820 from NousResearch/bb/tui-at-fuzzy-match
fix(tui): @<name> fuzzy-matches filenames across the repo
2026-04-23 19:40:43 -05:00
Brooklyn Nicholson
0a679cb7ad fix(tui): restore voice/panic handlers + scope fuzzy paths to cwd
Two fixes on top of the fuzzy-@ branch:

(1) Rebase artefact: re-apply only the fuzzy additions on top of
    fresh `tui_gateway/server.py`. The earlier commit was cut from a
    base 58 commits behind main and clobbered ~170 lines of
    voice.toggle / voice.record handlers and the gateway crash hooks
    (`_panic_hook`, `_thread_panic_hook`). Reset server.py to
    origin/main and re-add only:
      - `_FUZZY_*` constants + `_list_repo_files` + `_fuzzy_basename_rank`
      - the new fuzzy branch in the `complete.path` handler

(2) Path scoping (Copilot review): `git ls-files` returns repo-root-
    relative paths, but completions need to resolve under the gateway's
    cwd. When hermes is launched from a subdirectory, the previous
    code surfaced `@file:apps/web/src/foo.tsx` even though the agent
    would resolve that relative to `apps/web/` and miss. Fix:
      - `git -C root rev-parse --show-toplevel` to get repo top
      - `git -C top ls-files …` for the listing
      - `os.path.relpath(os.path.join(top, p), root)` per result, dropping anything
        starting with `../` so the picker stays scoped to cwd-and-below
        (matches Cmd-P workspace semantics)
    `apps/web/src/foo.tsx` ends up as `@file:src/foo.tsx` from inside
    `apps/web/`, and sibling subtrees + parent-of-cwd files don't leak.
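
The re-rooting step can be sketched as a pure-path function (illustrative name; posixpath is used here for determinism where the real code uses os.path):

```python
import posixpath

def scope_to_cwd(top: str, root: str, repo_paths):
    """Sketch of the scoping fix: git ls-files yields repo-root-relative
    paths; re-root them under the gateway cwd and drop anything that
    escapes it, so the picker stays scoped to cwd-and-below."""
    scoped = []
    for p in repo_paths:
        rel = posixpath.relpath(posixpath.join(top, p), root)
        if not rel.startswith('../'):
            scoped.append(rel)
    return scoped
```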

New test `test_fuzzy_paths_relative_to_cwd_inside_subdir` builds a
3-package mono-repo, runs from `apps/web/`, and verifies completion
paths are subtree-relative + outside-of-cwd files don't appear.

Copilot review threads addressed: #3134675504 (path scoping),
#3134675532 (`voice.toggle` regression), #3134675541 (`voice.record`
regression — both were stale-base artefacts, not behavioural changes).
2026-04-23 19:38:33 -05:00
Brooklyn Nicholson
41b4d69167 Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/tui-at-fuzzy-match 2026-04-23 19:35:18 -05:00
brooklyn!
3f343cf7cf
Merge pull request #14822 from NousResearch/bb/tui-inline-diff-segment-anchor
fix(tui): anchor inline_diff to the segment where the edit happened
2026-04-23 19:32:21 -05:00
Brooklyn Nicholson
4ae5b58cb1 fix(tui): restore voice handlers + address copilot review
Rebase-artefact cleanup on this branch:

- Restore `voice.status` and `voice.transcript` cases in
  createGatewayEventHandler plus the `voice` / `submission` /
  `composer.setInput` ctx destructuring. They were added to main in
  the 58-commit gap that this branch was originally cut behind;
  dropping them was unintentional.
- Rebase the test ctx shape to match main (voice.* fakes,
  submission.submitRef, composer.setInput) and apply the same
  segment-anchor test rewrites on top.
- Drop the `#14XXX` placeholder from the tool.complete comment;
  replace with a plain-English rationale.
- Rewrite the broken mid-word "pushInlineDiff- Segment" in
  turnController's dedupe comment to refer to
  pushInlineDiffSegment and `kind: 'diff'` plainly.
- Collapse the filter predicate in recordMessageComplete from a
  4-line if/return into one boolean expression — same semantics,
  reads left-to-right as a single predicate.

Copilot review threads resolved: #3134668789, #3134668805,
#3134668822.
2026-04-23 19:22:41 -05:00
Brooklyn Nicholson
2258a181f0 fix(tui): give inline_diff segments blank-line breathing room
Visual polish on top of the segment-anchor change: diff blocks were
butting up against the narration around them. Tag diff-only segments
with `kind: 'diff'` (extended on Msg) and give them `marginTop={1}` +
`marginBottom={1}` in MessageLine, matching the spacing we already
use for user messages. Also swaps the regex-based `diffSegmentBody`
check for an explicit `kind === 'diff'` guard so the dedupe path is
clearer.
2026-04-23 19:11:59 -05:00
Brooklyn Nicholson
11b2942f16 fix(tui): anchor inline_diff to the segment where the edit happened
Revisits #13729. That PR buffered each `tool.complete`'s inline_diff
and merged them into the final assistant message body as a fenced
```diff block. The merge-at-end placement reads as "the agent wrote
this after the summary", even when the edit fired mid-turn — which
is both misleading and (per blitz feedback) feels like noise tacked
onto the end of every task.

Segment-anchored placement instead:

- On tool.complete with inline_diff, `pushInlineDiffSegment` calls
  `flushStreamingSegment` first (so any in-progress narration lands
  as its own segment), then pushes the ```diff block as its own
  segment into segmentMessages. The diff is now anchored BETWEEN the
  narration that preceded the edit and whatever the agent streams
  afterwards, which is where the edit actually happened.
- `recordMessageComplete` no longer merges buffered diffs. The only
  remaining dedupe is "drop diff-only segments whose body the final
  assistant text narrates verbatim (or whose diff fence the final
  text already contains)" — same tradeoff as before, kept so an
  agent that narrates its own diff doesn't render two stacked copies.
- Drops `pendingInlineDiffs` and `queueInlineDiff` — the buffer +
  end-merge machinery is gone; segmentMessages is now the only
  source of truth.

Side benefit: Ctrl+C interrupt (`interruptTurn`) iterates
segmentMessages, so diff segments are now preserved in the
transcript when the user cancels after an edit. Previously the
pending buffer was silently dropped on interrupt.

Reported by Teknium during blitz usage: "no diffs are ever at the
end because it didn't make this file edit after the final message".
2026-04-23 19:02:44 -05:00
Brooklyn Nicholson
b08cbc7a79 fix(tui): @<name> fuzzy-matches filenames across the repo
Typing `@appChrome` in the composer should surface
`ui-tui/src/components/appChrome.tsx` without requiring the user to
first type the full directory path — matches the Cmd-P behaviour
users expect from modern editors.

The gateway's `complete.path` handler was doing a plain
`os.listdir(".")` + `startswith` prefix match, so basenames only
resolved inside the current working directory. This reworks it to:

- enumerate repo files via `git ls-files -z --cached --others
  --exclude-standard` (fast, honours `.gitignore`); fall back to a
  bounded `os.walk` that skips common vendor / build dirs when the
  working dir isn't a git repo. Results cached per-root with a 5s
  TTL so rapid keystrokes don't respawn git processes.
- rank basenames with a 5-tier scorer: exact → prefix → camelCase
  / word-boundary → substring → subsequence. Shorter basenames win
  ties; shorter rel paths break basename-length ties.
- only take the fuzzy branch when the query is bare (no `/`), is a
  context reference (`@...`), and isn't `@folder:` — path-ish
  queries and folder tags fall through to the existing
  directory-listing path so explicit navigation intent is
  preserved.
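
The tier order can be sketched as a scoring function; `rank` and `_word_starts` are illustrative names, not the branch's actual `_fuzzy_basename_rank`:

```python
def _word_starts(name: str) -> str:
    # first char, chars after -_. separators, and interior capitals
    return ''.join(
        ch for i, ch in enumerate(name)
        if i == 0 or name[i - 1] in '-_.'
        or (ch.isupper() and not name[i - 1].isupper())
    ).lower()

def rank(query: str, basename: str):
    """Sketch of the 5-tier scorer (lower is better, None = no match).
    Ties would then break on len(basename), then on rel-path length."""
    q, b = query.lower(), basename.lower()
    if q == b or q == b.rsplit('.', 1)[0]:
        return 0                                  # exact (with or without ext)
    if b.startswith(q):
        return 1                                  # prefix
    if _word_starts(basename).startswith(q):
        return 2                                  # camelCase / word boundary
    if q in b:
        return 3                                  # substring
    it = iter(b)
    if all(c in it for c in q):                   # consuming-iterator trick
        return 4                                  # subsequence
    return None
```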

Completion rows now carry `display = basename`,
`meta = directory`, so the picker renders
`appChrome.tsx  ui-tui/src/components` on one row (basename bold,
directory dim) — the meta column was previously "dir" / "" and is
a more useful signal for fuzzy hits.

Reported by Ben Barclay during the TUI v2 blitz test.
2026-04-23 19:01:27 -05:00
ethernet
c95c6bdb7c
Merge pull request #14818 from NousResearch/ink-perf
perf(ink): cache text measurements across yoga flex re-passes
2026-04-23 20:58:54 -03:00
Ari Lotter
bd929ea514 perf(ink): cache text measurements across yoga flex re-passes
Adds a per-ink-text measurement cache keyed by width|widthMode to avoid
re-squashing and re-wrapping the same text when yoga calls measureFunc
multiple times per frame with different widths during flex layout re-pass.
2026-04-23 19:45:10 -04:00
Teknium
6a20e187dd test,chore: cover stringified array/object coercion + AUTHOR_MAP entry
Follow-up to the cherry-picked coercion commit: adds 9 regression tests
covering array/object parsing, invalid-JSON passthrough, wrong-shape
preservation, and the issue #3947 gmail-mcp scenario end-to-end.  Adds
dan@danlynn.com -> danklynn to scripts/release.py AUTHOR_MAP so the
salvage PR's contributor attribution doesn't break CI.
2026-04-23 16:38:38 -07:00
Dan Lynn
9ff21437a0 fix(mcp): coerce stringified arrays/objects in tool args
When a tool schema declares `type: array` or `type: object` and the model
emits the value as a JSON string (common with complex oneOf discriminated
unions), the MCP server rejects it with -32602 "expected array, received
string".  Extend `_coerce_value` to attempt `json.loads` for these types
and replace the string with the parsed value before dispatch.

Root cause confirmed via live testing: `add_reminders.reminders` uses a
oneOf discriminated union (relative/absolute/location) that triggers model
output drift.  Sending a real array passes validation; sending a string
reproduces the exact error.
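
The coercion can be sketched as follows (illustrative standalone version of the `_coerce_value` extension):

```python
import json

def coerce_value(value, schema_type):
    """Sketch: only attempt repair when the schema wants array/object but
    the model emitted a string; anything unparsable or wrong-shaped passes
    through untouched so the server's own validation still applies."""
    if schema_type not in ('array', 'object') or not isinstance(value, str):
        return value
    try:
        parsed = json.loads(value)
    except ValueError:
        return value                     # invalid JSON: pass through
    expected = list if schema_type == 'array' else dict
    return parsed if isinstance(parsed, expected) else value
```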

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 16:38:38 -07:00
0xbyt4
44a0cbe525 fix(tui): voice mode starts OFF each launch (CLI parity)
The voice.toggle handler was persisting display.voice_enabled /
display.voice_tts to config.yaml, so a TUI session that ever turned
voice on would re-open with it already on (and the mic badge lit) on
every subsequent launch.  cli.py treats voice strictly as runtime
state: _voice_mode = False at __init__, only /voice on flips it, and
nothing writes it back to disk.

Drop the _write_config_key calls in voice.toggle on/off/tts and the
config.yaml fallback in _voice_mode_enabled / _voice_tts_enabled.
State is now env-var-only (HERMES_VOICE / HERMES_VOICE_TTS), scoped to
the live gateway subprocess — the next launch starts clean.
2026-04-23 16:18:15 -07:00
0xbyt4
2af0848f3c fix(tui): ignore SIGPIPE so stderr back-pressure can't kill the gateway
Crash-log stack trace (tui_gateway_crash.log) from the user's session
pinned the regression: SIGPIPE arrived while main thread was blocked on
for-raw-in-sys.stdin — i.e., a background thread (debug print to stderr,
most likely from HERMES_VOICE_DEBUG=1) wrote to a pipe whose buffer the
TUI hadn't drained yet, and SIG_DFL promptly killed the process.

Two fixes that together restore CLI parity:

- entry.py: SIGPIPE → SIG_IGN instead of the _log_signal handler that
  then exited. With SIG_IGN, Python raises BrokenPipeError on the
  offending write, which write_json already handles with a clean exit
  via _log_exit. SIGTERM / SIGHUP still route through _log_signal so
  real termination signals remain diagnosable.

- hermes_cli/voice.py:_debug: wrap the stderr print in a BrokenPipeError
  / OSError try/except. This runs from daemon threads (silence callback,
  TTS playback, beep), so a broken stderr must not escape and ride up
  into the main event loop.
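
Both halves can be sketched minimally (hypothetical function names; the real handlers live in entry.py and hermes_cli/voice.py):

```python
import signal
import sys

def install_sigpipe_guard():
    """Sketch of the entry.py change: with SIG_IGN, a write to a dead pipe
    raises BrokenPipeError at the write site (which the writer can handle
    cleanly) instead of SIG_DFL killing the process outright."""
    if hasattr(signal, 'SIGPIPE'):          # POSIX only; no-op on Windows
        signal.signal(signal.SIGPIPE, signal.SIG_IGN)

def safe_debug(msg: str) -> bool:
    """Sketch of the daemon-thread-safe stderr print: a broken stderr
    must never escape and ride up into the main event loop."""
    try:
        print(msg, file=sys.stderr)
        return True
    except (BrokenPipeError, OSError):
        return False
```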

Verified by spawning the gateway subprocess locally:
  voice.toggle status → 200 OK, process stays alive, clean exit on
  stdin close logs "reason=stdin EOF" instead of a silent reap.
2026-04-23 16:18:15 -07:00
0xbyt4
7baf370d3d chore(tui): capture signal-triggered gateway exits in crash log
SIG_DFL for SIGPIPE means the kernel reaps the gateway subprocess the
instant a background thread (TTS playback, silence callback, voice
status emitter) writes to a stdout the TUI stopped reading — before
the Python interpreter can run excepthook, threading.excepthook,
atexit, or the entry.py post-loop _log_exit.

Replace the three SIG_DFL / SIG_IGN bindings with a _log_signal
handler that:

- records which signal (SIGPIPE / SIGTERM / SIGHUP) fired and when;
- dumps the main-thread stack at signal delivery AND every live
  thread's stack via sys._current_frames — the background-thread
  write that provoked SIGPIPE is almost always visible here;
- writes everything to ~/.hermes/logs/tui_gateway_crash.log and prints
  a [gateway-signal] breadcrumb to stderr so the TUI Activity surfaces
  it as well.

SIGINT stays ignored (TUI handles Ctrl+C for the user).
2026-04-23 16:18:15 -07:00
0xbyt4
eeda18a9b7 chore(tui): record gateway exit reason in crash log
Gateway exits weren't reaching the panic hook because entry.py calls
sys.exit(0) on broken stdout — clean termination, no exception.  That
left "gateway exited" in the TUI with zero forensic trail when pipe
breaks happened mid-turn.

Entry.py now tags each exit path — startup-write failure, parse-error-
response write failure, per-method response write failure, stdin EOF —
with a one-line entry in ~/.hermes/logs/tui_gateway_crash.log and a
gateway.stderr breadcrumb.  Includes the JSON-RPC method name on the
dispatch path, which is the only way to tell "died right after handling
voice.toggle on" from "died emitting the second message.complete".
2026-04-23 16:18:15 -07:00
0xbyt4
3a9598337f chore(tui): dump gateway crash traces to ~/.hermes/logs/tui_gateway_crash.log
When the gateway subprocess raises an unhandled exception during a
voice-mode turn, nothing survives: stdout is the JSON-RPC pipe, stderr
flushes but the process is already exiting, and no log file catches
Python's default traceback print.  The user is left with an
undiagnosable "gateway exited" banner.

Install:

- sys.excepthook → write full traceback to tui_gateway_crash.log +
  echo the first line to stderr (which the TUI pumps into
  Activity as a gateway.stderr event).  Chains to the default hook so
  the process still terminates.
- threading.excepthook → same, tagged with the thread name so it's
  clear when the crash came from a daemon thread (beep playback, TTS,
  silence callback, etc.).
- Turn-dispatcher except block now also appends a traceback to the
  crash log before emitting the user-visible error event — str(e)
  alone was too terse to identify where in the voice pipeline the
  failure happened.
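
The hook installation can be sketched like this (the log path and function names are stand-ins for the real ~/.hermes/logs wiring):

```python
import sys
import threading
import traceback

CRASH_LOG = '/tmp/tui_gateway_crash_demo.log'   # stand-in path for the sketch

def _append_trace(tag, exc_type, exc, tb):
    with open(CRASH_LOG, 'a') as f:
        f.write(f'--- {tag} ---\n')
        traceback.print_exception(exc_type, exc, tb, file=f)

def install_crash_hooks():
    default_hook = sys.excepthook

    def excepthook(exc_type, exc, tb):
        _append_trace('main thread', exc_type, exc, tb)
        default_hook(exc_type, exc, tb)     # chain: process still terminates

    def thread_hook(args):
        # tag with the thread name so daemon-thread crashes are attributable
        _append_trace(f'thread {args.thread.name}', args.exc_type,
                      args.exc_value, args.exc_traceback)

    sys.excepthook = excepthook
    threading.excepthook = thread_hook
```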

Zero behavioural change on the happy path; purely forensics.
2026-04-23 16:18:15 -07:00
0xbyt4
98418afd5d fix(tui): break TTS→STT feedback loop + colorize REC badge
TTS feedback loop (hermes_cli/voice.py)

The VAD loop kept the microphone live while speak_text played the
agent's reply over the speakers, so the reply itself was picked up,
transcribed, and submitted — the agent then replied to its own echo
("Ha, looks like we're in a loop").

Ported cli.py:_voice_tts_done synchronisation:

- _tts_playing: threading.Event (initially set = "not playing").
- speak_text cancels the active recorder before opening the speakers,
  clears _tts_playing, and on exit waits 300 ms before re-starting the
  recorder — long enough for the OS audio device to settle so afplay
  and sounddevice don't race for it.
- _continuous_on_silence now waits on _tts_playing (up to 60 s) before
  re-arming the mic with another 300 ms gap, mirroring
  cli.py:10619-10621.  If the user flips voice off during the wait the
  loop exits cleanly instead of fighting for the device.

Without both halves the loop races: if the silence callback fires
before TTS starts it re-arms immediately; if TTS is already playing
the pause-and-resume path catches it.
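
The Event handshake can be sketched minimally (class and method names hypothetical; the real code also cancels/restarts the recorder and adds the 300 ms settle gaps):

```python
import threading

class TtsGate:
    """Sketch of the _tts_playing synchronisation: the event is set when
    nothing is playing, so the silence callback waits on it before
    re-arming the microphone."""

    def __init__(self):
        self._idle = threading.Event()
        self._idle.set()                    # initially set = "not playing"

    def speaking(self):
        self._idle.clear()                  # TTS opened the speakers

    def done(self):
        self._idle.set()                    # playback finished

    def wait_until_idle(self, timeout=60.0) -> bool:
        # silence callback blocks here (up to 60 s) before re-arming
        return self._idle.wait(timeout)
```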

Red REC badge (ui-tui appChrome + useMainApp)

Classic CLI (cli.py:_get_voice_status_fragments) renders "● REC" in
red and "◉ STT" in amber.  TUI was showing a dim "REC" with no dot,
making it hard to spot at a glance.  voiceLabel now emits the same
glyphs and appChrome colours them via t.color.error / t.color.warn,
falling back to dim for the idle label.
2026-04-23 16:18:15 -07:00
0xbyt4
42ff785771 fix(tui): voice TTS speak-back + transcript-key bug + auto-submit
Three issues surfaced during end-to-end testing of the CLI-parity voice
loop and are fixed together because they all blocked "speak → agent
responds → TTS reads it back" from working at all:

1. Wrong result key (hermes_cli/voice.py)

   transcribe_recording() returns {"success": bool, "transcript": str},
   matching cli.py:_voice_stop_and_transcribe. The wrapper was reading
   result.get("text"), which is None, so every successful Groq / local
   STT response was thrown away and the 3-strikes halt fired after
   three silent-looking cycles. Fixed by reading "transcript" and also
   honouring "success" like the CLI does. Updated the loop simulation
   tests to return the correct shape.

2. TTS speak-back was missing (tui_gateway/server.py + hermes_cli/voice.py)

   The TUI had a voice.toggle "tts" subcommand but nothing downstream
   actually read the flag — agent replies never spoke. Mirrored
   cli.py:8747-8754's dispatch: on message.complete with status ==
   "complete", if _voice_tts_enabled() is true, spawn a daemon thread
   running speak_text(response). Rewrote speak_text as a full port of
   cli.py:_voice_speak_response — same markdown-strip regex pipeline
   (code blocks, links, bold/italic, inline code, headers, list bullets,
   horizontal rules, excessive newlines), same 4000-char cap, same
   explicit mp3 output path, same MP3-over-OGG playback choice (afplay
   misbehaves on OGG), same cleanup of both extensions. Keeps TUI TTS
   audible output byte-for-byte identical to the classic CLI.

3. Auto-submit swallowed on non-empty composer (createGatewayEventHandler.ts)

   The voice.transcript handler branched on prev input via a setInput
   updater and fired submitRef.current inside the updater when prev was
   empty. React strict mode double-invokes state updaters, which would
   queue the submit twice; and when the composer had any content the
   transcript was merely appended — the agent never saw it. CLI
   _pending_input.put(transcript) unconditionally feeds the transcript
   as the next turn, so match that: always clear the composer and
   setTimeout(() => submitRef.current(text), 0) outside any updater.
   The side effect can't run twice this way, and occasionally losing a
   half-typed draft is a fair trade vs. silently dropping the turn.

Also added peak_rms to the rec.stop debug line so "recording too quiet"
is diagnosable at a glance when HERMES_VOICE_DEBUG=1.
2026-04-23 16:18:15 -07:00
0xbyt4
04c489b587 feat(tui): match CLI's voice slash + VAD-continuous recording model
The TUI had drifted from the CLI's voice model in two ways:

- /voice on was lighting up the microphone immediately and Ctrl+B was
  interpreted as a mode toggle.  The CLI separates the two: /voice on
  just flips the umbrella bit, recording only starts once the user
  presses Ctrl+B, which also sets _voice_continuous so the VAD loop
  auto-restarts until the user presses Ctrl+B again or three silent
  cycles pass.
- /voice tts was missing entirely, so users couldn't turn agent reply
  speech on/off from inside the TUI.

This commit brings the TUI to parity.

Python

- hermes_cli/voice.py: continuous-mode API (start_continuous,
  stop_continuous, is_continuous_active) layered on the existing PTT
  wrappers. The silence callback transcribes, fires on_transcript,
  tracks consecutive no-speech cycles, and auto-restarts — mirroring
  cli.py:_voice_stop_and_transcribe + _restart_recording.
- tui_gateway/server.py:
  - voice.toggle now supports on / off / tts / status.  The umbrella
    bit lives in HERMES_VOICE + display.voice_enabled; tts lives in
    HERMES_VOICE_TTS + display.voice_tts.  /voice off also tears down
    any active continuous loop so a toggle-off really releases the
    microphone.
  - voice.record start/stop now drives start_continuous/stop_continuous.
    start is refused with a clear error when the mode is off, matching
    cli.py:handle_voice_record's early return on `not _voice_mode`.
  - New voice.transcript / voice.status events emit through
    _voice_emit (remembers the sid that last enabled the mode so
    events land in the right session).
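The continuous-recording flow layered into hermes_cli/voice.py can be sketched like this. A hedged illustration only: the API names and the three-silent-cycle limit follow the text above, while the class shape, injected callbacks, and threading details are assumptions.

```python
import threading

class ContinuousVoiceLoop:
    """Illustrative sketch of start_continuous / stop_continuous, not the
    shipped module: record/transcribe/callback plumbing is stubbed."""

    NO_SPEECH_LIMIT = 3  # auto-stop after three silent cycles

    def __init__(self, record_once, transcribe, on_transcript, on_limit):
        self._record_once = record_once    # blocks until VAD detects silence
        self._transcribe = transcribe      # audio -> text, or None for no speech
        self._on_transcript = on_transcript
        self._on_limit = on_limit          # e.g. emit the no_speech_limit event
        self._active = threading.Event()
        self._thread = None

    def start_continuous(self):
        if self._active.is_set():
            return                         # idempotent: repeat Ctrl+B is safe
        self._active.set()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop_continuous(self):
        self._active.clear()

    def is_continuous_active(self):
        return self._active.is_set()

    def _loop(self):
        silent_cycles = 0
        while self._active.is_set():
            audio = self._record_once()
            text = self._transcribe(audio)
            if text:
                silent_cycles = 0
                self._on_transcript(text)  # fires the voice.transcript path
            else:
                silent_cycles += 1
                if silent_cycles >= self.NO_SPEECH_LIMIT:
                    self._active.clear()
                    self._on_limit()       # flips voice mode off upstream
```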

TypeScript

- gatewayTypes.ts: voice.status + voice.transcript event
  discriminants; VoiceToggleResponse gains tts; VoiceRecordResponse
  gains status for the new "started/stopped" responses.
- interfaces.ts: GatewayEventHandlerContext gains composer.setInput +
  submission.submitRef + voice.{setRecording, setProcessing,
  setVoiceEnabled}; InputHandlerContext.voice gains enabled +
  setVoiceEnabled for the mode-aware Ctrl+B handler.
- createGatewayEventHandler.ts: voice.status drives REC/STT badges;
  voice.transcript auto-submits when the composer is empty (CLI
  _pending_input.put parity) and appends when a draft is in flight.
  no_speech_limit flips voice off + sys line.
- useInputHandlers.ts: Ctrl+B now calls voice.record (start/stop),
  not voice.toggle, and nudges the user with a sys line when the
  mode is off instead of silently flipping it on.
- useMainApp.ts: wires the new event-handler context fields.
- slash/commands/session.ts: /voice handles on / off / tts / status
  with CLI-matching output ("voice: mode on · tts off").

Backward compat preserved for voice.record (was always PTT shape;
gateway still honours start/stop with mode-gating added).
2026-04-23 16:18:15 -07:00
0xbyt4
0bb460b070 fix(tui): add missing hermes_cli.voice wrapper for gateway RPC
tui_gateway/server.py:3486/3491/3509 imports start_recording,
stop_and_transcribe, and speak_text from hermes_cli.voice, but the
module never existed (not in git history — never shipped, never
deleted). Every voice.record / voice.tts RPC call hit the ImportError
branch and the TUI surfaced it as "voice module not available — install
audio dependencies" even on boxes with sounddevice / faster-whisper /
numpy installed.

Adds a thin wrapper on top of tools.voice_mode (recording +
transcription) and tools.tts_tool (text-to-speech):

- start_recording() — idempotent; stores the active AudioRecorder in a
  module-global guarded by a Lock so repeat Ctrl+B presses don't fight
  over the mic.
- stop_and_transcribe() — returns None for no-op / no-speech /
  Whisper-hallucination cases so the TUI's existing "no speech detected"
  path keeps working unchanged.
- speak_text(text) — lazily imports tts_tool (optional provider SDKs
  stay unloaded until the first /voice tts call), parses the tool's
  JSON result, and plays the audio via play_audio_file.
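The locking / no-op contract of the wrapper can be sketched as below. A hedged illustration of the described behavior, not the shipped module: the recorder factory and transcriber are injected here so the sketch stays self-contained, whereas the real code wraps tools.voice_mode and tools.tts_tool.

```python
import threading

_recorder = None                  # module-global active recorder
_recorder_lock = threading.Lock()

def start_recording(make_recorder):
    """Idempotent: a second Ctrl+B while already recording is a no-op."""
    global _recorder
    with _recorder_lock:
        if _recorder is not None:
            return False          # already recording; don't fight over the mic
        _recorder = make_recorder()
        _recorder.start()
        return True

def stop_and_transcribe(transcribe):
    """Return None for no-op / no-speech so the caller's existing
    'no speech detected' path keeps working unchanged."""
    global _recorder
    with _recorder_lock:
        if _recorder is None:
            return None           # nothing was recording
        audio = _recorder.stop()
        _recorder = None
    text = transcribe(audio)
    return text or None           # empty / hallucinated output collapses to None
```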

Paired with the Ctrl+B keybinding fix in the prior commit, the TUI
voice pipeline now works end-to-end for the first time.
2026-04-23 16:18:15 -07:00
0xbyt4
3504bd401b fix(tui): route Ctrl+B to voice toggle, not composer input
When the user runs /voice and then presses Ctrl+B in the TUI, three
handlers collaborate to consume the chord and none of them dispatch
voice.record:

- isAction() is platform-aware — on macOS it requires Cmd (meta/super),
  so Ctrl+B fails the match in useInputHandlers and never triggers
  voiceStart/voiceStop.
- TextInput's Ctrl+B pass-through list doesn't include 'b', so the
  keystroke falls through to the wordMod backward-word branch on Linux
  and to the printable-char insertion branch on macOS — the latter is
  exactly what timmie reported ("enters a b into the tui").
- /voice emits "voice: on" with no hint, so the user has no way to
  know Ctrl+B is the recording toggle.

Introduces isVoiceToggleKey(key, ch) in lib/platform.ts that matches
raw Ctrl+B on every platform (mirrors tips.py and config.yaml's
voice.record_key default) and additionally accepts Cmd+B on macOS so
existing muscle memory keeps working. Wires it into useInputHandlers,
adds Ctrl+B to TextInput's pass-through list so the global handler
actually receives the chord, and appends "press Ctrl+B to record" to
the /voice on message.

Empirically verified with hermes --tui: Ctrl+B no longer leaks 'b'
into the composer and now dispatches the voice.record RPC (the
downstream ImportError for hermes_cli.voice is a separate upstream
bug — follow-up patch).
2026-04-23 16:18:15 -07:00
Teknium
50d97edbe1
feat(delegation): bump default child_timeout_seconds to 600s (#14809)
The 300s default was too tight for high-reasoning models on non-trivial
delegated tasks — e.g. gpt-5.5 xhigh reviewing 12 files would burn >5min
on reasoning tokens before issuing its first tool call, tripping the
hard wall-clock timeout with 0 api_calls logged.

- tools/delegate_tool.py: DEFAULT_CHILD_TIMEOUT 300 -> 600
- hermes_cli/config.py: surface delegation.child_timeout_seconds in
  DEFAULT_CONFIG so it's discoverable (previously the key was read by
  _get_child_timeout() but absent from the default config schema)

Users can still override via config.yaml delegation.child_timeout_seconds
or DELEGATION_CHILD_TIMEOUT_SECONDS env var (floor 30s, no ceiling).
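A hedged sketch of that resolution order. Which source wins between config.yaml and the env var is an assumption here; the key names, the 600s default, and the 30s floor come from the commit text.

```python
import os

DEFAULT_CHILD_TIMEOUT = 600  # the new default from this commit

def get_child_timeout(config: dict) -> int:
    # Env var checked first here; the actual precedence is an assumption.
    raw = os.getenv("DELEGATION_CHILD_TIMEOUT_SECONDS")
    if raw is None:
        raw = (config.get("delegation") or {}).get("child_timeout_seconds")
    try:
        value = int(raw) if raw is not None else DEFAULT_CHILD_TIMEOUT
    except (TypeError, ValueError):
        value = DEFAULT_CHILD_TIMEOUT
    return max(30, value)  # floor 30s, no ceiling
```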
2026-04-23 16:14:55 -07:00
Teknium
e26c4f0e34
fix(kimi,mcp): Moonshot schema sanitizer + MCP schema robustness (#14805)
Fixes a broader class of 'tools.function.parameters is not a valid
moonshot flavored json schema' errors on Nous / OpenRouter aggregators
routing to moonshotai/kimi-k2.6 with MCP tools loaded.

## Moonshot sanitizer (agent/moonshot_schema.py, new)

Model-name-routed (not base-URL-routed) so Nous / OpenRouter users are
covered alongside api.moonshot.ai.  Applied in
ChatCompletionsTransport.build_kwargs when is_moonshot_model(model).

Two repairs:
1. Fill missing 'type' on every property / items / anyOf-child schema
   node (structural walk — only schema-position dicts are touched, not
   container maps like properties/$defs).
2. Strip 'type' at anyOf parents; Moonshot rejects it.
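A minimal sketch of the two repairs, assuming a simplified walk: the real agent/moonshot_schema.py covers more node positions, and the default type filled in ("string" vs "object") is an assumption here.

```python
import copy

def sanitize_for_moonshot(schema: dict) -> dict:
    """Illustrative sketch of the two repairs; non-mutating like the original."""
    schema = copy.deepcopy(schema)

    def walk(node):
        if not isinstance(node, dict):
            return
        if "anyOf" in node:
            node.pop("type", None)   # repair 2: Moonshot rejects type at anyOf parents
            for child in node["anyOf"]:
                walk(child)
        elif "type" not in node:
            # repair 1: fill missing 'type' on schema-position nodes
            # (default choice here is an assumption)
            node["type"] = "object" if "properties" in node else "string"
        for prop in (node.get("properties") or {}).values():
            walk(prop)               # descend through the container map only
        if isinstance(node.get("items"), dict):
            walk(node["items"])

    walk(schema)
    return schema
```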

## MCP normalizer hardened (tools/mcp_tool.py)

Draft-07 $ref rewrite from PR #14802 now also does:
- coerce missing / null 'type' on object-shaped nodes (salvages #4897)
- prune 'required' arrays to names that exist in 'properties'
  (salvages #4651; Gemini 400s on dangling required)
- apply recursively, not just top-level

These repairs are provider-agnostic so the same MCP schema is valid on
OpenAI, Anthropic, Gemini, and Moonshot in one pass.
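The required-pruning repair, for instance, can be sketched as below. Hedged: the helper name and recursion scope are illustrative, not the shipped normalizer in tools/mcp_tool.py.

```python
def prune_dangling_required(schema):
    """Drop 'required' names with no matching property, recursively."""
    if not isinstance(schema, dict):
        return schema
    props = schema.get("properties")
    if isinstance(props, dict):
        if isinstance(schema.get("required"), list):
            # Gemini 400s on names that don't exist in 'properties'
            schema["required"] = [n for n in schema["required"] if n in props]
        for sub in props.values():
            prune_dangling_required(sub)   # apply recursively, not just top-level
    if isinstance(schema.get("items"), dict):
        prune_dangling_required(schema["items"])
    return schema
```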

## Crash fix: safe getattr for Tool.inputSchema

_convert_mcp_schema now uses getattr(t, 'inputSchema', None) so MCP
servers whose Tool objects omit the attribute entirely no longer abort
registration (salvages #3882).

## Validation

- tests/agent/test_moonshot_schema.py: 27 new tests (model detection,
  missing-type fill, anyOf-parent strip, non-mutation, real-world MCP
  shape)
- tests/tools/test_mcp_tool.py: 7 new tests (missing / null type,
  required pruning, nested repair, safe getattr)
- tests/agent/transports/test_chat_completions.py: 2 new integration
  tests (Moonshot route sanitizes, non-Moonshot route doesn't)
- Targeted suite: 49 passed
- E2E via execute_code with a realistic MCP tool carrying all three
  Moonshot rejection modes + dangling required + draft-07 refs:
  sanitizer produces a schema valid on Moonshot and Gemini
2026-04-23 16:11:57 -07:00
helix4u
24f139e16a fix(mcp): rewrite definitions refs in input schemas 2026-04-23 15:56:57 -07:00
Teknium
ef5eaf8d87
feat(cron): honor hermes tools config for the cron platform (#14798)
Cron now resolves its toolset from the same per-platform config the
gateway uses — `_get_platform_tools(cfg, 'cron')` — instead of blindly
loading every default toolset.  Existing cron jobs without a per-job
override automatically lose `moa`, `homeassistant`, and `rl` (the
`_DEFAULT_OFF_TOOLSETS` set), which stops the "surprise $4.63
mixture_of_agents run" class of bug (Norbert, Discord).

Precedence inside `run_job`:
  1. per-job `enabled_toolsets` (PR #14767 / #6130) — wins if set
  2. `_get_platform_tools(cfg, 'cron')` — new, the blanket gate
  3. `None` fallback (legacy) — only on resolver exception
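That precedence can be sketched as below. Hedged: `platform_tools` stands in for `_get_platform_tools`, and only the three-step order comes from the list above.

```python
def resolve_cron_enabled_toolsets(job: dict, cfg: dict, platform_tools):
    # 1. per-job enabled_toolsets wins if set
    per_job = job.get("enabled_toolsets")
    if per_job:
        return per_job
    # 2. the blanket platform gate (stand-in for _get_platform_tools(cfg, 'cron'))
    try:
        return platform_tools(cfg, "cron")
    except Exception:
        # 3. legacy None fallback, only on resolver exception
        return None
```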

Changes:
- hermes_cli/platforms.py: register 'cron' with default_toolset
  'hermes-cron'
- toolsets.py: add 'hermes-cron' toolset (mirrors 'hermes-cli';
  `_get_platform_tools` then filters via `_DEFAULT_OFF_TOOLSETS`)
- cron/scheduler.py: add `_resolve_cron_enabled_toolsets(job, cfg)`,
  call it at the `AIAgent(...)` kwargs site
- tests/cron/test_scheduler.py: replace the 'None when not set' test
  (outdated contract) with an invariant ('moa not in default cron
  toolset') + new per-job-wins precedence test
- tests/hermes_cli/test_tools_config.py: mark 'cron' as non-messaging
  in the gateway-toolset-coverage test
2026-04-23 15:48:50 -07:00
480 changed files with 76296 additions and 4152 deletions


@ -53,6 +53,9 @@ jobs:
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Regenerate per-skill docs pages + catalogs
run: python3 website/scripts/generate-skill-docs.py
- name: Build skills index (if not already present)
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}


@ -36,6 +36,9 @@ jobs:
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Regenerate per-skill docs pages + catalogs
run: python3 website/scripts/generate-skill-docs.py
- name: Lint docs diagrams
run: npm run lint:diagrams
working-directory: website


@ -240,6 +240,19 @@ npm run fmt # prettier
npm test # vitest
```
### TUI in the Dashboard (`hermes dashboard` → `/chat`)
The dashboard embeds the real `hermes --tui`, **not** a rewrite. See `hermes_cli/pty_bridge.py` + the `@app.websocket("/api/pty")` endpoint in `hermes_cli/web_server.py`.
- Browser loads `web/src/pages/ChatPage.tsx`, which mounts xterm.js's `Terminal` with the WebGL renderer, `@xterm/addon-fit` for container-driven resize, and `@xterm/addon-unicode11` for modern wide-character widths.
- `/api/pty?token=…` upgrades to a WebSocket; auth uses the same ephemeral `_SESSION_TOKEN` as REST, via query param (browsers can't set `Authorization` on WS upgrade).
- The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
- Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.
**Do not re-implement the primary chat experience in React.** The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded `hermes --tui` — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.
**Structured React UI around the TUI is allowed when it is not a second chat surface.** Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. `ChatSidebar`, `ModelPickerDialog`, `ToolCall`) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.
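The resize-marker interception can be sketched as below, assuming the `\x1b[RESIZE:<cols>;<rows>]` format from the bullet above; the WebSocket plumbing around it is omitted, and this is an illustration rather than the code in `hermes_cli/pty_bridge.py`.

```python
import fcntl
import re
import struct
import termios

# Marker format from the protocol note above: cols first, then rows.
RESIZE_RE = re.compile(rb"\x1b\[RESIZE:(\d+);(\d+)\]")

def handle_ws_frame(data: bytes, pty_fd: int) -> bytes:
    """Strip RESIZE markers, apply each via TIOCSWINSZ, return remaining bytes."""
    def apply(match):
        cols, rows = int(match.group(1)), int(match.group(2))
        # struct winsize: ws_row, ws_col, ws_xpixel, ws_ypixel
        winsz = struct.pack("HHHH", rows, cols, 0, 0)
        fcntl.ioctl(pty_fd, termios.TIOCSWINSZ, winsz)
        return b""
    return RESIZE_RE.sub(apply, data)
```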
---
## Adding New Tools


@ -10,9 +10,11 @@ ENV PYTHONUNBUFFERED=1
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# Install system dependencies in one layer, clear APT cache
# tini reaps orphaned zombie processes (MCP stdio subprocesses, git, bun, etc.)
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli && \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
rm -rf /var/lib/apt/lists/*
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
@ -41,9 +43,15 @@ COPY --chown=hermes:hermes . .
# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
RUN cd web && npm run build
# ---------- Permissions ----------
# Make install dir world-readable so any HERMES_UID can read it at runtime.
# The venv needs to be traversable too.
USER root
RUN chmod -R a+rX /opt/hermes
# Start as root so the entrypoint can usermod/groupmod + gosu.
# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
# ---------- Python virtualenv ----------
RUN chown hermes:hermes /opt/hermes
USER hermes
RUN uv venv && \
uv pip install --no-cache-dir -e ".[all]"
@ -52,4 +60,4 @@ ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
ENV PATH="/opt/data/.local/bin:${PATH}"
VOLUME [ "/opt/data" ]
ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]


@ -60,7 +60,7 @@ from acp_adapter.events import (
make_tool_progress_cb,
)
from acp_adapter.permissions import make_approval_callback
from acp_adapter.session import SessionManager, SessionState
from acp_adapter.session import SessionManager, SessionState, _expand_acp_enabled_toolsets
logger = logging.getLogger(__name__)
@ -287,7 +287,11 @@ class HermesACPAgent(acp.Agent):
try:
from model_tools import get_tool_definitions
enabled_toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
enabled_toolsets = _expand_acp_enabled_toolsets(
getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"],
mcp_server_names=[server.name for server in mcp_servers],
)
state.agent.enabled_toolsets = enabled_toolsets
disabled_toolsets = getattr(state.agent, "disabled_toolsets", None)
state.agent.tools = get_tool_definitions(
enabled_toolsets=enabled_toolsets,
@ -754,7 +758,9 @@ class HermesACPAgent(acp.Agent):
def _cmd_tools(self, args: str, state: SessionState) -> str:
try:
from model_tools import get_tool_definitions
toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
toolsets = _expand_acp_enabled_toolsets(
getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
)
tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
if not tools:
return "No tools available."


@ -106,6 +106,24 @@ def _register_task_cwd(task_id: str, cwd: str) -> None:
logger.debug("Failed to register ACP task cwd override", exc_info=True)
def _expand_acp_enabled_toolsets(
toolsets: List[str] | None = None,
mcp_server_names: List[str] | None = None,
) -> List[str]:
"""Return ACP toolsets plus explicit MCP server toolsets for this session."""
expanded: List[str] = []
for name in list(toolsets or ["hermes-acp"]):
if name and name not in expanded:
expanded.append(name)
for server_name in list(mcp_server_names or []):
toolset_name = f"mcp-{server_name}"
if server_name and toolset_name not in expanded:
expanded.append(toolset_name)
return expanded
def _clear_task_cwd(task_id: str) -> None:
"""Remove task-specific cwd overrides for an ACP session."""
if not task_id:
@ -537,9 +555,18 @@ class SessionManager:
elif isinstance(model_cfg, str) and model_cfg.strip():
default_model = model_cfg.strip()
configured_mcp_servers = [
name
for name, cfg in (config.get("mcp_servers") or {}).items()
if not isinstance(cfg, dict) or cfg.get("enabled", True) is not False
]
kwargs = {
"platform": "acp",
"enabled_toolsets": ["hermes-acp"],
"enabled_toolsets": _expand_acp_enabled_toolsets(
["hermes-acp"],
mcp_server_names=configured_mcp_servers,
),
"quiet_mode": True,
"session_id": session_id,
"model": model or default_model,


@ -14,6 +14,8 @@ import copy
import json
import logging
import os
import platform
import subprocess
from pathlib import Path
from hermes_constants import get_hermes_home
@ -277,8 +279,9 @@ def _is_oauth_token(key: str) -> bool:
Positively identifies Anthropic OAuth tokens by their key format:
- ``sk-ant-`` prefix (but NOT ``sk-ant-api``) setup tokens, managed keys
- ``eyJ`` prefix JWTs from the Anthropic OAuth flow
- ``cc-`` prefix Claude Code OAuth access tokens (from CLAUDE_CODE_OAUTH_TOKEN)
Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match either pattern
Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match any pattern
and correctly return False.
"""
if not key:
@ -292,6 +295,9 @@ def _is_oauth_token(key: str) -> bool:
# JWTs from Anthropic OAuth flow
if key.startswith("eyJ"):
return True
# Claude Code OAuth access tokens (opaque, from CLAUDE_CODE_OAUTH_TOKEN)
if key.startswith("cc-"):
return True
return False
@ -461,8 +467,72 @@ def build_anthropic_bedrock_client(region: str):
)
def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
"""Read Claude Code OAuth credentials from the macOS Keychain.
Claude Code >=2.1.114 stores credentials in the macOS Keychain under the
service name "Claude Code-credentials" rather than (or in addition to)
the JSON file at ~/.claude/.credentials.json.
The password field contains a JSON string with the same claudeAiOauth
structure as the JSON file.
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
import platform
import subprocess
if platform.system() != "Darwin":
return None
try:
# Read the "Claude Code-credentials" generic password entry
result = subprocess.run(
["security", "find-generic-password",
"-s", "Claude Code-credentials",
"-w"],
capture_output=True,
text=True,
timeout=5,
)
except (OSError, subprocess.TimeoutExpired):
logger.debug("Keychain: security command not available or timed out")
return None
if result.returncode != 0:
logger.debug("Keychain: no entry found for 'Claude Code-credentials'")
return None
raw = result.stdout.strip()
if not raw:
return None
try:
data = json.loads(raw)
except json.JSONDecodeError:
logger.debug("Keychain: credentials payload is not valid JSON")
return None
oauth_data = data.get("claudeAiOauth")
if oauth_data and isinstance(oauth_data, dict):
access_token = oauth_data.get("accessToken", "")
if access_token:
return {
"accessToken": access_token,
"refreshToken": oauth_data.get("refreshToken", ""),
"expiresAt": oauth_data.get("expiresAt", 0),
"source": "macos_keychain",
}
return None
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.
"""Read refreshable Claude Code OAuth credentials.
Checks two sources in order:
1. macOS Keychain (Darwin only) "Claude Code-credentials" entry
2. ~/.claude/.credentials.json file
This intentionally excludes ~/.claude.json primaryApiKey. Opencode's
subscription flow is OAuth/setup-token based with refreshable credentials,
@ -471,6 +541,12 @@ def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
# Try macOS Keychain first (covers Claude Code >=2.1.114)
kc_creds = _read_claude_code_credentials_from_keychain()
if kc_creds:
return kc_creds
# Fall back to JSON file
cred_path = Path.home() / ".claude" / ".credentials.json"
if cred_path.exists():
try:
@ -641,7 +717,9 @@ def _write_claude_code_credentials(
existing["claudeAiOauth"] = oauth_data
cred_path.parent.mkdir(parents=True, exist_ok=True)
cred_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
_tmp_cred = cred_path.with_suffix(".tmp")
_tmp_cred.write_text(json.dumps(existing, indent=2), encoding="utf-8")
_tmp_cred.replace(cred_path)
# Restrict permissions (credentials file)
cred_path.chmod(0o600)
except (OSError, IOError) as e:
@ -908,6 +986,26 @@ def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
# ---------------------------------------------------------------------------
def _is_bedrock_model_id(model: str) -> bool:
"""Detect AWS Bedrock model IDs that use dots as namespace separators.
Bedrock model IDs come in two forms:
- Bare: ``anthropic.claude-opus-4-7``
- Regional (inference profiles): ``us.anthropic.claude-sonnet-4-5-v1:0``
In both cases the dots separate namespace components, not version
numbers, and must be preserved verbatim for the Bedrock API.
"""
lower = model.lower()
# Regional inference-profile prefixes
if any(lower.startswith(p) for p in ("global.", "us.", "eu.", "ap.", "jp.")):
return True
# Bare Bedrock model IDs: provider.model-family
if lower.startswith("anthropic."):
return True
return False
def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
"""Normalize a model name for the Anthropic API.
@ -915,11 +1013,19 @@ def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
- Converts dots to hyphens in version numbers (OpenRouter uses dots,
Anthropic uses hyphens: claude-opus-4.6 → claude-opus-4-6), unless
preserve_dots is True (e.g. for Alibaba/DashScope: qwen3.5-plus).
- Preserves Bedrock model IDs (``anthropic.claude-opus-4-7``) and
regional inference profiles (``us.anthropic.claude-*``) whose dots
are namespace separators, not version separators.
"""
lower = model.lower()
if lower.startswith("anthropic/"):
model = model[len("anthropic/"):]
if not preserve_dots:
# Bedrock model IDs use dots as namespace separators
# (e.g. "anthropic.claude-opus-4-7", "us.anthropic.claude-*").
# These must not be converted to hyphens. See issue #12295.
if _is_bedrock_model_id(model):
return model
# OpenRouter uses dots for version separators (claude-opus-4.6),
# Anthropic uses hyphens (claude-opus-4-6). Convert dots to hyphens.
model = model.replace(".", "-")
@ -1598,4 +1704,3 @@ def build_anthropic_kwargs(
return kwargs


@ -74,6 +74,12 @@ _PROVIDER_ALIASES = {
"minimax_cn": "minimax-cn",
"claude": "anthropic",
"claude-code": "anthropic",
"github": "copilot",
"github-copilot": "copilot",
"github-model": "copilot",
"github-models": "copilot",
"github-copilot-acp": "copilot-acp",
"copilot-acp-agent": "copilot-acp",
}
@ -89,10 +95,11 @@ def _normalize_aux_provider(provider: Optional[str]) -> str:
if normalized == "main":
# Resolve to the user's actual main provider so named custom providers
# and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
main_prov = _read_main_provider()
main_prov = (_read_main_provider() or "").strip().lower()
if main_prov and main_prov not in ("auto", "main", ""):
return main_prov
return "custom"
normalized = main_prov
else:
return "custom"
return _PROVIDER_ALIASES.get(normalized, normalized)
@ -1342,6 +1349,68 @@ def _is_auth_error(exc: Exception) -> bool:
return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()
def _evict_cached_clients(provider: str) -> None:
"""Drop cached auxiliary clients for a provider so fresh creds are used."""
normalized = _normalize_aux_provider(provider)
with _client_cache_lock:
stale_keys = [
key for key in _client_cache
if _normalize_aux_provider(str(key[0])) == normalized
]
for key in stale_keys:
client = _client_cache.get(key, (None, None, None))[0]
if client is not None:
_force_close_async_httpx(client)
try:
close_fn = getattr(client, "close", None)
if callable(close_fn):
close_fn()
except Exception:
pass
_client_cache.pop(key, None)
def _refresh_provider_credentials(provider: str) -> bool:
"""Refresh short-lived credentials for OAuth-backed auxiliary providers."""
normalized = _normalize_aux_provider(provider)
try:
if normalized == "openai-codex":
from hermes_cli.auth import resolve_codex_runtime_credentials
creds = resolve_codex_runtime_credentials(force_refresh=True)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
if normalized == "nous":
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
force_mint=True,
)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
if normalized == "anthropic":
from agent.anthropic_adapter import read_claude_code_credentials, _refresh_oauth_token, resolve_anthropic_token
creds = read_claude_code_credentials()
token = _refresh_oauth_token(creds) if isinstance(creds, dict) and creds.get("refreshToken") else None
if not str(token or "").strip():
token = resolve_anthropic_token()
if not str(token or "").strip():
return False
_evict_cached_clients(normalized)
return True
except Exception as exc:
logger.debug("Auxiliary provider credential refresh failed for %s: %s", normalized, exc)
return False
return False
def _try_payment_fallback(
failed_provider: str,
task: str = None,
@ -1736,7 +1805,7 @@ def resolve_provider_client(
"but no endpoint credentials found")
return None, None
# ── Named custom providers (config.yaml custom_providers list) ───
# ── Named custom providers (config.yaml providers dict / custom_providers list) ───
try:
from hermes_cli.runtime_provider import _get_named_custom_provider
custom_entry = _get_named_custom_provider(provider)
@ -1747,16 +1816,51 @@ def resolve_provider_client(
if not custom_key and custom_key_env:
custom_key = os.getenv(custom_key_env, "").strip()
custom_key = custom_key or "no-key-required"
# An explicit per-task api_mode override (from _resolve_task_provider_model)
# wins; otherwise fall back to what the provider entry declared.
entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
if custom_base:
final_model = _normalize_resolved_model(
model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
provider,
)
client = OpenAI(api_key=custom_key, base_url=custom_base)
client = _wrap_if_needed(client, final_model, custom_base)
logger.debug(
"resolve_provider_client: named custom provider %r (%s)",
provider, final_model)
"resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
provider, final_model, entry_api_mode or "chat_completions")
# anthropic_messages: route through the Anthropic Messages API
# via AnthropicAuxiliaryClient. Mirrors the anonymous-custom
# branch in _try_custom_endpoint(). See #15033.
if entry_api_mode == "anthropic_messages":
try:
from agent.anthropic_adapter import build_anthropic_client
real_client = build_anthropic_client(custom_key, custom_base)
except ImportError:
logger.warning(
"Named custom provider %r declares api_mode="
"anthropic_messages but the anthropic SDK is not "
"installed — falling back to OpenAI-wire.",
provider,
)
client = OpenAI(api_key=custom_key, base_url=custom_base)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
real_client, final_model, custom_key, custom_base, is_oauth=False,
)
if async_mode:
return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
return sync_anthropic, final_model
client = OpenAI(api_key=custom_key, base_url=custom_base)
# codex_responses or inherited auto-detect (via _wrap_if_needed).
# _wrap_if_needed reads the closed-over `api_mode` (the task-level
# override). Named-provider entry api_mode=codex_responses also
# flows through here.
if entry_api_mode == "codex_responses" and not isinstance(
client, CodexAuxiliaryClient
):
client = CodexAuxiliaryClient(client, final_model)
else:
client = _wrap_if_needed(client, final_model, custom_base)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
logger.warning(
@ -1889,6 +1993,39 @@ def resolve_provider_client(
"directly supported", provider)
return None, None
elif pconfig.auth_type == "aws_sdk":
# AWS SDK providers (Bedrock) — use the Anthropic Bedrock client via
# boto3's credential chain (IAM roles, SSO, env vars, instance metadata).
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_bedrock_region
from agent.anthropic_adapter import build_anthropic_bedrock_client
except ImportError:
logger.warning("resolve_provider_client: bedrock requested but "
"boto3 or anthropic SDK not installed")
return None, None
if not has_aws_credentials():
logger.debug("resolve_provider_client: bedrock requested but "
"no AWS credentials found")
return None, None
region = resolve_bedrock_region()
default_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
final_model = _normalize_resolved_model(model or default_model, provider)
try:
real_client = build_anthropic_bedrock_client(region)
except ImportError as exc:
logger.warning("resolve_provider_client: cannot create Bedrock "
"client: %s", exc)
return None, None
client = AnthropicAuxiliaryClient(
real_client, final_model, api_key="aws-sdk",
base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
)
logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
return (_to_async_client(client, final_model) if async_mode
else (client, final_model))
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
# OAuth providers — route through their specific try functions
if provider == "nous":
@ -2857,6 +2994,49 @@ def call_llm(
return _validate_llm_response(
refreshed_client.chat.completions.create(**kwargs), task)
# ── Auth refresh retry ───────────────────────────────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s: refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
retry_client, retry_model = (
resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=False,
)[1:]
if task == "vision"
else _get_cached_client(
resolved_provider,
resolved_model,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
)
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
retry_client.chat.completions.create(**retry_kwargs), task)
# ── Payment / credit exhaustion fallback ──────────────────────
# When the resolved provider returns 402 or a credit-related error,
# try alternative providers instead of giving up. This handles the
@ -3077,6 +3257,48 @@ async def async_call_llm(
return _validate_llm_response(
await refreshed_client.chat.completions.create(**kwargs), task)
# ── Auth refresh retry (mirrors sync call_llm) ───────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s (async): refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
if task == "vision":
_, retry_client, retry_model = resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=True,
)
else:
retry_client, retry_model = _get_cached_client(
resolved_provider,
resolved_model,
async_mode=True,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
await retry_client.chat.completions.create(**retry_kwargs), task)
# ── Payment / connection fallback (mirrors sync call_llm) ─────
should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
is_auto = resolved_provider in ("auto", "", None)


@ -87,6 +87,114 @@ def reset_client_cache():
_bedrock_control_client_cache.clear()
def invalidate_runtime_client(region: str) -> bool:
"""Evict the cached ``bedrock-runtime`` client for a single region.
Per-region counterpart to :func:`reset_client_cache`. Used by the converse
call wrappers to discard clients whose underlying HTTP connection has
gone stale, so the next call allocates a fresh client (with a fresh
connection pool) instead of reusing a dead socket.
Returns True if a cached entry was evicted, False if the region was not
cached.
"""
existed = region in _bedrock_runtime_client_cache
_bedrock_runtime_client_cache.pop(region, None)
return existed
# ---------------------------------------------------------------------------
# Stale-connection detection
# ---------------------------------------------------------------------------
#
# boto3 caches its HTTPS connection pool inside the client object. When a
# pooled connection is killed out from under us (NAT timeout, VPN flap,
# server-side TCP RST, proxy idle cull, etc.), the next use surfaces as
# one of a handful of low-level exceptions — most commonly
# ``botocore.exceptions.ConnectionClosedError`` or
# ``urllib3.exceptions.ProtocolError``. urllib3 also trips an internal
# ``assert`` in a couple of paths (connection pool state checks, chunked
# response readers) which bubbles up as a bare ``AssertionError`` with an
# empty ``str(exc)``.
#
# In all of these cases the client is the problem, not the request: retrying
# with the same cached client reproduces the failure until the process
# restarts. The fix is to evict the region's cached client so the next
# attempt builds a new one.
_STALE_LIB_MODULE_PREFIXES = (
"urllib3.",
"botocore.",
"boto3.",
)
def _traceback_frames_modules(exc: BaseException):
"""Yield ``__name__``-style module strings for each frame in exc's traceback."""
tb = getattr(exc, "__traceback__", None)
while tb is not None:
frame = tb.tb_frame
module = frame.f_globals.get("__name__", "")
yield module or ""
tb = tb.tb_next
def is_stale_connection_error(exc: BaseException) -> bool:
"""Return True if ``exc`` indicates a dead/stale Bedrock HTTP connection.
Matches:
* ``botocore.exceptions.ConnectionError`` and subclasses
(``ConnectionClosedError``, ``EndpointConnectionError``,
``ReadTimeoutError``, ``ConnectTimeoutError``).
* ``urllib3.exceptions.ProtocolError`` / ``NewConnectionError`` /
``ConnectionError`` (best-effort import; urllib3 is a transitive
dependency of botocore, so it is always available in practice).
* Bare ``AssertionError`` raised from a frame inside urllib3, botocore,
or boto3. These are internal-invariant failures (typically triggered
by corrupted connection-pool state after a dropped socket) and are
recoverable by swapping the client.
Non-library ``AssertionError``s (from application code or tests) are
intentionally not matched; only library-internal asserts signal stale
connection state.
"""
# botocore: the canonical signal — HTTPClientError is the umbrella for
# ConnectionClosedError, ReadTimeoutError, EndpointConnectionError,
# ConnectTimeoutError, and ProxyConnectionError. ConnectionError covers
# the same family via a different branch of the hierarchy.
try:
from botocore.exceptions import (
ConnectionError as BotoConnectionError,
HTTPClientError,
)
botocore_errors: tuple = (BotoConnectionError, HTTPClientError)
except ImportError: # pragma: no cover — botocore always present with boto3
botocore_errors = ()
if botocore_errors and isinstance(exc, botocore_errors):
return True
# urllib3: low-level transport failures
try:
from urllib3.exceptions import (
ProtocolError,
NewConnectionError,
ConnectionError as Urllib3ConnectionError,
)
urllib3_errors = (ProtocolError, NewConnectionError, Urllib3ConnectionError)
except ImportError: # pragma: no cover
urllib3_errors = ()
if urllib3_errors and isinstance(exc, urllib3_errors):
return True
# Library-internal AssertionError (urllib3 / botocore / boto3)
if isinstance(exc, AssertionError):
for module in _traceback_frames_modules(exc):
if any(module.startswith(prefix) for prefix in _STALE_LIB_MODULE_PREFIXES):
return True
return False
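The traceback walk behind the library-internal ``AssertionError`` branch can be exercised on its own. A minimal sketch, assuming the prefix tuple from the diff; the ``boom`` helper and the fake ``urllib3.connectionpool`` module name are test-only simulations, not real library code:

```python
# Walk exc.__traceback__ and match each frame's module __name__ against
# the stale-library prefixes, mirroring the diff's helper pair.
_STALE_LIB_MODULE_PREFIXES = ("urllib3.", "botocore.", "boto3.")

def _traceback_frames_modules(exc):
    tb = getattr(exc, "__traceback__", None)
    while tb is not None:
        yield tb.tb_frame.f_globals.get("__name__", "") or ""
        tb = tb.tb_next

def is_library_assert(exc):
    if not isinstance(exc, AssertionError):
        return False
    return any(
        module.startswith(prefix)
        for module in _traceback_frames_modules(exc)
        for prefix in _STALE_LIB_MODULE_PREFIXES
    )

# Simulate an assert raised from inside urllib3 by executing a function
# whose globals carry a urllib3-style __name__ (hypothetical setup).
fake_module = {"__name__": "urllib3.connectionpool"}
exec("def boom():\n    assert False", fake_module)
try:
    fake_module["boom"]()
except AssertionError as exc:
    library_hit = is_library_assert(exc)

try:
    assert False  # application-level assert: must NOT match
except AssertionError as exc:
    app_hit = is_library_assert(exc)
```

The asymmetry is the point: the same exception type is routed differently purely on where in the stack it originated.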
# ---------------------------------------------------------------------------
# AWS credential detection
# ---------------------------------------------------------------------------
@ -787,7 +895,17 @@ def call_converse(
guardrail_config=guardrail_config,
)
response = client.converse(**kwargs)
try:
response = client.converse(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse(region=%s, model=%s): "
"%s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
return normalize_converse_response(response)
@ -819,7 +937,17 @@ def call_converse_stream(
guardrail_config=guardrail_config,
)
response = client.converse_stream(**kwargs)
try:
response = client.converse_stream(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse_stream(region=%s, "
"model=%s): %s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
return normalize_converse_stream_events(response)


@ -23,6 +23,23 @@ from agent.prompt_builder import DEFAULT_AGENT_IDENTITY
logger = logging.getLogger(__name__)
# Matches Codex/Harmony tool-call serialization that occasionally leaks into
# assistant-message content when the model fails to emit a structured
# ``function_call`` item. Accepts the common forms:
#
# to=functions.exec_command
# assistant to=functions.exec_command
# <|channel|>commentary to=functions.exec_command
#
# ``to=functions.<name>`` is the stable marker — the optional ``assistant`` or
# Harmony channel prefix varies by degeneration mode. Case-insensitive to
# cover lowercase/uppercase ``assistant`` variants.
_TOOL_CALL_LEAK_PATTERN = re.compile(
r"(?:^|[\s>|])to=functions\.[A-Za-z_][\w.]*",
re.IGNORECASE,
)
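The pattern's three accepted forms (and the word-boundary guard) can be checked directly; the sample strings below are illustrative, not real model output:

```python
import re

# The leak pattern from the diff: ``to=functions.<name>`` preceded by
# start-of-string, whitespace, '>', or '|'.
_TOOL_CALL_LEAK_PATTERN = re.compile(
    r"(?:^|[\s>|])to=functions\.[A-Za-z_][\w.]*",
    re.IGNORECASE,
)

samples = [
    'to=functions.exec_command {"cmd": "ls"}',             # bare form
    "assistant to=functions.exec_command {}",              # assistant prefix
    "<|channel|>commentary to=functions.exec_command {}",  # Harmony channel
    "set auto=functions.exec in the config",               # embedded: no match
]
hits = [bool(_TOOL_CALL_LEAK_PATTERN.search(s)) for s in samples]
```

The last sample shows why the ``(?:^|[\s>|])`` prefix matters: ``auto=functions.exec`` contains the marker substring but is not a leaked call.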
# ---------------------------------------------------------------------------
# Multimodal content helpers
# ---------------------------------------------------------------------------
@ -787,6 +804,37 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if isinstance(out_text, str):
final_text = out_text.strip()
# ── Tool-call leak recovery ──────────────────────────────────
# gpt-5.x on the Codex Responses API sometimes degenerates and emits
# what should be a structured `function_call` item as plain assistant
# text using the Harmony/Codex serialization (``to=functions.foo
# {json}`` or ``assistant to=functions.foo {json}``). The model
# intended to call a tool, but the intent never made it into
# ``response.output`` as a ``function_call`` item, so ``tool_calls``
# is empty here. If we pass this through, the parent sees a
# confident-looking summary with no audit trail (empty ``tool_trace``)
# and no tools actually ran — the Taiwan-embassy-email incident.
#
# Detection: leaked tokens always contain ``to=functions.<name>`` and
# the assistant message has no real tool calls. Treat it as incomplete
# so the existing Codex-incomplete continuation path (3 retries,
# handled in run_agent.py) gets a chance to re-elicit a proper
# ``function_call`` item. The existing loop already handles message
# append, dedup, and retry budget.
leaked_tool_call_text = False
if final_text and not tool_calls and _TOOL_CALL_LEAK_PATTERN.search(final_text):
leaked_tool_call_text = True
logger.warning(
"Codex response contains leaked tool-call text in assistant content "
"(no structured function_call items). Treating as incomplete so the "
"continuation path can re-elicit a proper tool call. Leaked snippet: %r",
final_text[:300],
)
# Clear the text so downstream code doesn't surface the garbage as
# a summary. The encrypted reasoning items (if any) are preserved
# so the model keeps its chain-of-thought on the retry.
final_text = ""
assistant_message = SimpleNamespace(
content=final_text,
tool_calls=tool_calls,
@ -798,6 +846,8 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if tool_calls:
finish_reason = "tool_calls"
elif leaked_tool_call_text:
finish_reason = "incomplete"
elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
finish_reason = "incomplete"
elif reasoning_items_raw and not final_text:


@ -294,6 +294,7 @@ class ContextCompressor(ContextEngine):
self._context_probed = False
self._context_probe_persistable = False
self._previous_summary = None
self._last_summary_error = None
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@ -389,6 +390,7 @@ class ContextCompressor(ContextEngine):
self._last_compression_savings_pct: float = 100.0
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
self._last_summary_error: Optional[str] = None
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@ -812,10 +814,12 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._previous_summary = summary
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
self._last_summary_error = None
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
@ -853,6 +857,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
err_text = str(e).strip() or e.__class__.__name__
if len(err_text) > 220:
err_text = err_text[:217].rstrip() + "..."
self._last_summary_error = err_text
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
@ -1099,6 +1107,21 @@ The user has requested that this compaction PRIORITISE preserving all informatio
return max(cut_idx, head_end + 1)
# ------------------------------------------------------------------
# ContextEngine: manual /compress preflight
# ------------------------------------------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
"""Return True if there is a non-empty middle region to compact.
Overrides the ABC default so the gateway ``/compress`` guard can
skip the LLM call when the transcript is still entirely inside
the protected head/tail.
"""
compress_start = self._align_boundary_forward(messages, self.protect_first_n)
compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
return compress_start < compress_end
# ------------------------------------------------------------------
# Main compression entry point
# ------------------------------------------------------------------


@ -78,6 +78,7 @@ class ContextEngine(ABC):
self,
messages: List[Dict[str, Any]],
current_tokens: int = None,
focus_topic: str = None,
) -> List[Dict[str, Any]]:
"""Compact the message list and return the new message list.
@ -86,6 +87,12 @@ class ContextEngine(ABC):
context budget. The implementation is free to summarize, build a
DAG, or do anything else as long as the returned list is a valid
OpenAI-format message sequence.
Args:
focus_topic: Optional topic string from manual ``/compress <focus>``.
Engines that support guided compression should prioritise
preserving information related to this topic. Engines that
don't support it may simply ignore this argument.
"""
# -- Optional: pre-flight check ----------------------------------------
@ -98,6 +105,21 @@ class ContextEngine(ABC):
"""
return False
# -- Optional: manual /compress preflight ------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
"""Quick check: is there anything in ``messages`` that can be compacted?
Used by the gateway ``/compress`` command as a preflight guard;
returning False lets the gateway report "nothing to compress yet"
without making an LLM call.
Default returns True (always attempt). Engines with a cheap way
to introspect their own head/tail boundaries should override this
to return False when the transcript is still entirely protected.
"""
return True
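How a gateway might consume this hook, sketched with a toy engine; ``ToyEngine`` and ``handle_compress`` are hypothetical stand-ins, assuming only the preflight contract above (False means skip the LLM call):

```python
class ToyEngine:
    # Stand-in engine that protects the first two messages.
    protect_first_n = 2

    def has_content_to_compress(self, messages):
        return len(messages) > self.protect_first_n

    def compress(self, messages, focus_topic=None):
        head = messages[: self.protect_first_n]
        dropped = len(messages) - self.protect_first_n
        summary = {"role": "system", "content": f"[summary of {dropped} turns]"}
        return head + [summary]

def handle_compress(engine, messages):
    # Preflight guard: no LLM call when the transcript is all protected.
    if not engine.has_content_to_compress(messages):
        return "nothing to compress yet"
    return engine.compress(messages)

engine = ToyEngine()
short = [{"role": "system", "content": "sys"}, {"role": "user", "content": "hi"}]
longer = short + [{"role": "assistant", "content": "x"}] * 3
```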
# -- Optional: session lifecycle ---------------------------------------
def on_session_start(self, session_id: str, **kwargs) -> None:


@ -46,6 +46,47 @@ def _resolve_args() -> list[str]:
return shlex.split(raw)
def _resolve_home_dir() -> str:
"""Return a stable HOME for child ACP processes."""
try:
from hermes_constants import get_subprocess_home
profile_home = get_subprocess_home()
if profile_home:
return profile_home
except Exception:
pass
home = os.environ.get("HOME", "").strip()
if home:
return home
expanded = os.path.expanduser("~")
if expanded and expanded != "~":
return expanded
try:
import pwd
resolved = pwd.getpwuid(os.getuid()).pw_dir.strip()
if resolved:
return resolved
except Exception:
pass
# Last resort: /tmp (writable on any POSIX system). Avoids crashing the
# subprocess with no HOME; callers can set HERMES_HOME explicitly if they
# need a different writable dir.
return "/tmp"
def _build_subprocess_env() -> dict[str, str]:
env = os.environ.copy()
env["HOME"] = _resolve_home_dir()
return env
def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
return {
"jsonrpc": "2.0",
@ -382,6 +423,7 @@ class CopilotACPClient:
text=True,
bufsize=1,
cwd=self._acp_cwd,
env=_build_subprocess_env(),
)
except FileNotFoundError as exc:
raise RuntimeError(


@ -455,6 +455,61 @@ class CredentialPool:
logger.debug("Failed to sync from credentials file: %s", exc)
return entry
def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Nous pool entry from auth.json if tokens differ.
Nous OAuth refresh tokens are single-use. When another process
(e.g. a concurrent cron) refreshes the token via
``resolve_nous_runtime_credentials``, it writes fresh tokens to
auth.json under ``_auth_store_lock``. The pool entry's tokens
become stale. This method detects that and adopts the newer pair,
avoiding a "refresh token reuse" revocation on the Nous Portal.
"""
if self.provider != "nous" or entry.source != "device_code":
return entry
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "nous")
if not state:
return entry
store_refresh = state.get("refresh_token", "")
store_access = state.get("access_token", "")
if store_refresh and store_refresh != entry.refresh_token:
logger.debug(
"Pool entry %s: syncing tokens from auth.json (Nous refresh token changed)",
entry.id,
)
field_updates: Dict[str, Any] = {
"access_token": store_access,
"refresh_token": store_refresh,
"last_status": None,
"last_status_at": None,
"last_error_code": None,
}
if state.get("expires_at"):
field_updates["expires_at"] = state["expires_at"]
if state.get("agent_key"):
field_updates["agent_key"] = state["agent_key"]
if state.get("agent_key_expires_at"):
field_updates["agent_key_expires_at"] = state["agent_key_expires_at"]
if state.get("inference_base_url"):
field_updates["inference_base_url"] = state["inference_base_url"]
extra_updates = dict(entry.extra)
for extra_key in ("obtained_at", "expires_in", "agent_key_id",
"agent_key_expires_in", "agent_key_reused",
"agent_key_obtained_at"):
val = state.get(extra_key)
if val is not None:
extra_updates[extra_key] = val
updated = replace(entry, extra=extra_updates, **field_updates)
self._replace_entry(entry, updated)
self._persist()
return updated
except Exception as exc:
logger.debug("Failed to sync Nous entry from auth.json: %s", exc)
return entry
def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
"""Write refreshed pool entry tokens back to auth.json providers.
@ -561,6 +616,9 @@ class CredentialPool:
last_refresh=refreshed.get("last_refresh"),
)
elif self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
nous_state = {
"access_token": entry.access_token,
"refresh_token": entry.refresh_token,
@ -635,6 +693,26 @@ class CredentialPool:
# Credentials file had a valid (non-expired) token — use it directly
logger.debug("Credentials file has valid token, using without refresh")
return synced
# For nous: another process may have consumed the refresh token
# between our proactive sync and the HTTP call. Re-sync from
# auth.json and adopt the fresh tokens if available.
if self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced.refresh_token != entry.refresh_token:
logger.debug("Nous refresh failed but auth.json has newer tokens — adopting")
updated = replace(
synced,
last_status=STATUS_OK,
last_status_at=None,
last_error_code=None,
last_error_reason=None,
last_error_message=None,
last_error_reset_at=None,
)
self._replace_entry(synced, updated)
self._persist()
self._sync_device_code_entry_to_auth_store(updated)
return updated
self._mark_exhausted(entry, None)
return None
@ -698,6 +776,17 @@ class CredentialPool:
if synced is not entry:
entry = synced
cleared_any = True
# For nous entries, sync from auth.json before status checks.
# Another process may have successfully refreshed via
# resolve_nous_runtime_credentials(), making this entry's
# exhausted status stale.
if (self.provider == "nous"
and entry.source == "device_code"
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@ -739,8 +828,11 @@ class CredentialPool:
if self._strategy == STRATEGY_LEAST_USED and len(available) > 1:
entry = min(available, key=lambda e: e.request_count)
# Increment usage counter so subsequent selections distribute load
updated = replace(entry, request_count=entry.request_count + 1)
self._replace_entry(entry, updated)
self._current_id = entry.id
return entry
return updated
if self._strategy == STRATEGY_ROUND_ROBIN and len(available) > 1:
entry = available[0]
@ -1056,6 +1148,18 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"inference_base_url": state.get("inference_base_url"),
"agent_key": state.get("agent_key"),
"agent_key_expires_at": state.get("agent_key_expires_at"),
# Carry the mint/refresh timestamps into the pool so
# freshness-sensitive consumers (self-heal hooks, pool
# pruning by age) can distinguish just-minted credentials
# from stale ones. Without these, fresh device_code
# entries get obtained_at=None and look older than they
# are (#15099).
"obtained_at": state.get("obtained_at"),
"expires_in": state.get("expires_in"),
"agent_key_id": state.get("agent_key_id"),
"agent_key_expires_in": state.get("agent_key_expires_in"),
"agent_key_reused": state.get("agent_key_reused"),
"agent_key_obtained_at": state.get("agent_key_obtained_at"),
"tls": state.get("tls") if isinstance(state.get("tls"), dict) else None,
"label": seeded_label,
},
@ -1066,9 +1170,10 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
# env vars (COPILOT_GITHUB_TOKEN / GH_TOKEN). They don't live in
# the auth store or credential pool, so we resolve them here.
try:
from hermes_cli.copilot_auth import resolve_copilot_token
from hermes_cli.copilot_auth import resolve_copilot_token, get_copilot_api_token
token, source = resolve_copilot_token()
if token:
api_token = get_copilot_api_token(token)
source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
if not _is_suppressed(provider, source_name):
active_sources.add(source_name)
@ -1080,7 +1185,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
{
"source": source_name,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": token,
"access_token": api_token,
"base_url": pconfig.inference_base_url if pconfig else "",
"label": source,
},


@ -45,6 +45,7 @@ class FailoverReason(enum.Enum):
# Model
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
provider_policy_blocked = "provider_policy_blocked" # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy
# Request format
format_error = "format_error" # 400 bad request — abort or strip + retry
@ -194,6 +195,29 @@ _MODEL_NOT_FOUND_PATTERNS = [
"unsupported model",
]
# OpenRouter aggregator policy-block patterns.
#
# When a user's OpenRouter account privacy setting (or a per-request
# `provider.data_collection: deny` preference) excludes the only endpoint
# serving a model, OpenRouter returns 404 with a *specific* message that is
# distinct from "model not found":
#
# "No endpoints available matching your guardrail restrictions and
# data policy. Configure: https://openrouter.ai/settings/privacy"
#
# We classify this as `provider_policy_blocked` rather than
# `model_not_found` because:
# - The model *exists* — model_not_found is misleading in logs
# - Provider fallback won't help: the account-level setting applies to
# every call on the same OpenRouter account
# - The error body already contains the fix URL, so the user gets
# actionable guidance without us rewriting the message
_PROVIDER_POLICY_BLOCKED_PATTERNS = [
"no endpoints available matching your guardrail",
"no endpoints available matching your data policy",
"no endpoints found matching your data policy",
]
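The classifier does plain lowercase substring matching against these patterns; a minimal sketch (``is_policy_blocked`` is an illustrative helper, and the sample message paraphrases the OpenRouter error quoted above):

```python
_PROVIDER_POLICY_BLOCKED_PATTERNS = [
    "no endpoints available matching your guardrail",
    "no endpoints available matching your data policy",
    "no endpoints found matching your data policy",
]

def is_policy_blocked(error_msg):
    # The classifier lowercases the message before pattern matching.
    msg = (error_msg or "").lower()
    return any(p in msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS)

openrouter_404 = (
    "No endpoints available matching your guardrail restrictions and "
    "data policy. Configure: https://openrouter.ai/settings/privacy"
)
```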
# Auth patterns (non-status-code signals)
_AUTH_PATTERNS = [
"invalid api key",
@ -319,6 +343,11 @@ def classify_api_error(
"""
status_code = _extract_status_code(error)
error_type = type(error).__name__
# Copilot/GitHub Models RateLimitError may not set .status_code; force 429
# so downstream rate-limit handling (classifier reason, pool rotation,
# fallback gating) fires correctly instead of misclassifying as generic.
if status_code is None and error_type == "RateLimitError":
status_code = 429
body = _extract_error_body(error)
error_code = _extract_error_code(body)
@ -523,6 +552,17 @@ def _classify_by_status(
return _classify_402(error_msg, result_fn)
if status_code == 404:
# OpenRouter policy-block 404 — distinct from "model not found".
# The model exists; the user's account privacy setting excludes the
# only endpoint serving it. Falling back to another provider won't
# help (same account setting applies). The error body already
# contains the fix URL, so just surface it.
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(
FailoverReason.model_not_found,
@ -640,6 +680,12 @@ def _classify_400(
)
# Some providers return model-not-found as 400 instead of 404 (e.g. OpenRouter).
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(
FailoverReason.model_not_found,
@ -812,6 +858,15 @@ def _classify_by_message(
should_fallback=True,
)
# Provider policy-block (aggregator-side guardrail) — check before
# model_not_found so we don't mis-label as a missing model.
if any(p in error_msg for p in _PROVIDER_POLICY_BLOCKED_PATTERNS):
return result_fn(
FailoverReason.provider_policy_blocked,
retryable=False,
should_fallback=False,
)
# Model not found patterns
if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
return result_fn(


@ -44,6 +44,97 @@ def is_native_gemini_base_url(base_url: str) -> bool:
return not normalized.endswith("/openai")
def probe_gemini_tier(
api_key: str,
base_url: str = DEFAULT_GEMINI_BASE_URL,
*,
model: str = "gemini-2.5-flash",
timeout: float = 10.0,
) -> str:
"""Probe a Google AI Studio API key and return its tier.
Returns one of:
- ``"free"`` -- key is on the free tier (unusable with Hermes)
- ``"paid"`` -- key is on a paid tier
- ``"unknown"`` -- probe failed; callers should proceed without blocking.
"""
key = (api_key or "").strip()
if not key:
return "unknown"
normalized_base = str(base_url or DEFAULT_GEMINI_BASE_URL).strip().rstrip("/")
if not normalized_base:
normalized_base = DEFAULT_GEMINI_BASE_URL
if normalized_base.lower().endswith("/openai"):
normalized_base = normalized_base[: -len("/openai")]
url = f"{normalized_base}/models/{model}:generateContent"
payload = {
"contents": [{"role": "user", "parts": [{"text": "hi"}]}],
"generationConfig": {"maxOutputTokens": 1},
}
try:
with httpx.Client(timeout=timeout) as client:
resp = client.post(
url,
params={"key": key},
json=payload,
headers={"Content-Type": "application/json"},
)
except Exception as exc:
logger.debug("probe_gemini_tier: network error: %s", exc)
return "unknown"
headers_lower = {k.lower(): v for k, v in resp.headers.items()}
rpd_header = headers_lower.get("x-ratelimit-limit-requests-per-day")
if rpd_header:
try:
rpd_val = int(rpd_header)
except (TypeError, ValueError):
rpd_val = None
# Published free-tier daily caps (Dec 2025):
# gemini-2.5-pro: 100, gemini-2.5-flash: 250, flash-lite: 1000
# Tier 1 starts at ~1500+ for Flash. We treat <= 1000 as free.
if rpd_val is not None and rpd_val <= 1000:
return "free"
if rpd_val is not None and rpd_val > 1000:
return "paid"
if resp.status_code == 429:
body_text = ""
try:
body_text = resp.text or ""
except Exception:
body_text = ""
if "free_tier" in body_text.lower():
return "free"
return "paid"
if 200 <= resp.status_code < 300:
return "paid"
return "unknown"
def is_free_tier_quota_error(error_message: str) -> bool:
"""Return True when a Gemini 429 message indicates free-tier exhaustion."""
if not error_message:
return False
return "free_tier" in error_message.lower()
_FREE_TIER_GUIDANCE = (
"\n\nYour Google API key is on the free tier (<= 250 requests/day for "
"gemini-2.5-flash). Hermes typically makes 3-10 API calls per user turn, "
"so the free tier is exhausted in a handful of messages and cannot sustain "
"an agent session. Enable billing on your Google Cloud project and "
"regenerate the key in a billing-enabled project: "
"https://aistudio.google.com/apikey"
)
class GeminiAPIError(Exception):
"""Error shape compatible with Hermes retry/error classification."""
@ -650,6 +741,12 @@ def gemini_http_error(response: httpx.Response) -> GeminiAPIError:
else:
message = f"Gemini returned HTTP {status}: {body_text[:500]}"
# Free-tier quota exhaustion -> append actionable guidance so users who
# bypassed the setup wizard (direct GOOGLE_API_KEY in .env) still learn
# that the free tier cannot sustain an agent session.
if status == 429 and is_free_tier_quota_error(err_message or body_text):
message = message + _FREE_TIER_GUIDANCE
return GeminiAPIError(
message,
code=code,
@ -704,6 +801,13 @@ class GeminiNativeClient:
http_client: Optional[httpx.Client] = None,
**_: Any,
) -> None:
if not (api_key or "").strip():
raise RuntimeError(
"Gemini native client requires an API key, but none was provided. "
"Set GOOGLE_API_KEY or GEMINI_API_KEY in your environment / ~/.hermes/.env "
"(get one at https://aistudio.google.com/app/apikey), or run `hermes setup` "
"to configure the Google provider."
)
self.api_key = api_key
normalized_base = (base_url or DEFAULT_GEMINI_BASE_URL).rstrip("/")
if normalized_base.endswith("/openai"):


@ -73,6 +73,20 @@ def sanitize_gemini_schema(schema: Any) -> Dict[str, Any]:
]
continue
cleaned[key] = value
# Gemini's Schema validator requires every ``enum`` entry to be a string,
# even when the parent ``type`` is ``integer`` / ``number`` / ``boolean``.
# OpenAI / OpenRouter / Anthropic accept typed enums (e.g. Discord's
# ``auto_archive_duration: {type: integer, enum: [60, 1440, 4320, 10080]}``),
# so we only drop the ``enum`` when it would collide with Gemini's rule.
# Keeping ``type: integer`` plus the human-readable description gives the
# model enough guidance; the tool handler still validates the value.
enum_val = cleaned.get("enum")
type_val = cleaned.get("type")
if isinstance(enum_val, list) and type_val in {"integer", "number", "boolean"}:
if any(not isinstance(item, str) for item in enum_val):
cleaned.pop("enum", None)
return cleaned
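The effect on a Discord-style integer enum can be shown in isolation; ``strip_typed_enum`` is a standalone sketch of the guard above, and the property dicts are illustrative:

```python
def strip_typed_enum(prop):
    # Drop non-string enums on integer/number/boolean properties, per
    # Gemini's all-strings enum rule; type + description survive.
    cleaned = dict(prop)
    enum_val = cleaned.get("enum")
    type_val = cleaned.get("type")
    if isinstance(enum_val, list) and type_val in {"integer", "number", "boolean"}:
        if any(not isinstance(item, str) for item in enum_val):
            cleaned.pop("enum", None)
    return cleaned

int_prop = {
    "type": "integer",
    "enum": [60, 1440, 4320, 10080],
    "description": "Auto-archive duration in minutes.",
}
str_prop = {"type": "string", "enum": ["red", "green"]}
```

String enums pass through untouched; only the typed-enum collision case is rewritten.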


@ -31,6 +31,7 @@ from __future__ import annotations
import json
import logging
import re
import inspect
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
@ -312,7 +313,39 @@ class MemoryManager:
)
return "\n\n".join(parts)
def on_memory_write(self, action: str, target: str, content: str) -> None:
@staticmethod
def _provider_memory_write_metadata_mode(provider: MemoryProvider) -> str:
"""Return how to pass metadata to a provider's memory-write hook."""
try:
signature = inspect.signature(provider.on_memory_write)
except (TypeError, ValueError):
return "keyword"
params = list(signature.parameters.values())
if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
return "keyword"
if "metadata" in signature.parameters:
return "keyword"
accepted = [
p for p in params
if p.kind in (
inspect.Parameter.POSITIONAL_ONLY,
inspect.Parameter.POSITIONAL_OR_KEYWORD,
inspect.Parameter.KEYWORD_ONLY,
)
]
if len(accepted) >= 4:
return "positional"
return "legacy"
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Notify external providers when the built-in memory tool writes.
Skips the builtin provider itself (it's the source of the write).
@ -321,7 +354,15 @@ class MemoryManager:
if provider.name == "builtin":
continue
try:
provider.on_memory_write(action, target, content)
metadata_mode = self._provider_memory_write_metadata_mode(provider)
if metadata_mode == "keyword":
provider.on_memory_write(
action, target, content, metadata=dict(metadata or {})
)
elif metadata_mode == "positional":
provider.on_memory_write(action, target, content, dict(metadata or {}))
else:
provider.on_memory_write(action, target, content)
except Exception as e:
logger.debug(
"Memory provider '%s' on_memory_write failed: %s",


@ -26,7 +26,7 @@ Optional hooks (override to opt in):
on_turn_start(turn, message, **kwargs) per-turn tick with runtime context
on_session_end(messages) end-of-session extraction
on_pre_compress(messages) -> str extract before context compression
on_memory_write(action, target, content) mirror built-in memory writes
on_memory_write(action, target, content, metadata=None) mirror built-in memory writes
on_delegation(task, result, **kwargs) parent-side observation of subagent work
"""
@ -34,7 +34,7 @@ from __future__ import annotations
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
@ -220,12 +220,21 @@ class MemoryProvider(ABC):
should all have ``env_var`` set and this method stays no-op).
"""
def on_memory_write(self, action: str, target: str, content: str) -> None:
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Called when the built-in memory tool writes an entry.
action: 'add', 'replace', or 'remove'
target: 'memory' or 'user'
content: the entry content
metadata: structured provenance for the write, when available. Common
keys include ``write_origin``, ``execution_context``, ``session_id``,
``parent_session_id``, ``platform``, and ``tool_name``.
Use to mirror built-in memory writes to your backend.
"""
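A minimal external provider overriding the new hook might look like the sketch below. `JSONLMirrorProvider` is a hypothetical example, not part of the codebase; it only illustrates consuming the `metadata` keyword:

```python
import json
from typing import Any, Dict, Optional

class JSONLMirrorProvider:  # hypothetical provider for illustration
    """Mirror built-in memory writes to a local JSONL file."""

    name = "jsonl-mirror"

    def __init__(self, path: str) -> None:
        self.path = path

    def on_memory_write(
        self,
        action: str,
        target: str,
        content: str,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Append one record per write, preserving provenance metadata
        # (write_origin, session_id, etc.) when the caller supplies it.
        record = {
            "action": action,
            "target": target,
            "content": content,
            "metadata": metadata or {},
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
```

Because the signature includes `metadata=None`, the dispatcher classifies it as "keyword" and always passes structured provenance.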


@ -6,6 +6,7 @@ and run_agent.py for pre-flight context checks.
import ipaddress
import logging
import os
import re
import time
from pathlib import Path
@ -21,6 +22,25 @@ from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
def _resolve_requests_verify() -> bool | str:
"""Resolve SSL verify setting for `requests` calls from env vars.
The `requests` library only honours REQUESTS_CA_BUNDLE / CURL_CA_BUNDLE
by default. Hermes also honours HERMES_CA_BUNDLE (its own convention)
and SSL_CERT_FILE (used by the stdlib `ssl` module and by httpx), so
that a single env var can cover both `requests` and `httpx` callsites
inside the same process.
Returns either a filesystem path to a CA bundle, or True to defer to
the requests default (certifi).
"""
for env_var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
val = os.getenv(env_var)
if val and os.path.isfile(val):
return val
return True
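The precedence can be exercised directly. A standalone sketch of the same resolution (the helper name `resolve_verify` is illustrative): the first env var that names an existing file wins, otherwise `True` defers to requests' bundled certifi store.

```python
import os
import tempfile

def resolve_verify():
    # First existing CA-bundle file wins; True means "use requests default".
    for env_var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
        val = os.getenv(env_var)
        if val and os.path.isfile(val):
            return val
    return True

# Clear all three so the demo is deterministic.
for var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
    os.environ.pop(var, None)
assert resolve_verify() is True

with tempfile.NamedTemporaryFile(suffix=".pem") as bundle:
    os.environ["HERMES_CA_BUNDLE"] = bundle.name
    assert resolve_verify() == bundle.name
os.environ.pop("HERMES_CA_BUNDLE", None)
```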
# Provider names that can appear as a "provider:" prefix before a model ID.
# Only these are stripped — Ollama-style "model:tag" colons (e.g. "qwen3.5:27b")
# are preserved so the full model name reaches cache lookups and server queries.
@ -123,8 +143,9 @@ DEFAULT_CONTEXT_LENGTHS = {
"claude": 200000,
# OpenAI — GPT-5 family (most have 400k; specific overrides first)
# Source: https://developers.openai.com/api/docs/models
# GPT-5.5 (launched Apr 23 2026). Verified via live ChatGPT codex/models
# endpoint: bare slug `gpt-5.5`, no -pro/-mini variants. 400k context on Codex.
# GPT-5.5 (launched Apr 23 2026). 400k is the fallback for providers we
# can't probe live. ChatGPT Codex OAuth actually caps lower (272k as of
# Apr 2026) and is resolved via _resolve_codex_oauth_context_length().
"gpt-5.5": 400000,
"gpt-5.4-nano": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4-mini": 400000, # 400k (not 1.05M like full 5.4)
@ -494,7 +515,7 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
return _model_metadata_cache
try:
response = requests.get(OPENROUTER_MODELS_URL, timeout=10)
response = requests.get(OPENROUTER_MODELS_URL, timeout=10, verify=_resolve_requests_verify())
response.raise_for_status()
data = response.json()
@ -561,6 +582,7 @@ def fetch_endpoint_model_metadata(
server_url.rstrip("/") + "/api/v1/models",
headers=headers,
timeout=10,
verify=_resolve_requests_verify(),
)
response.raise_for_status()
payload = response.json()
@ -609,7 +631,7 @@ def fetch_endpoint_model_metadata(
for candidate in candidates:
url = candidate.rstrip("/") + "/models"
try:
response = requests.get(url, headers=headers, timeout=10)
response = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
response.raise_for_status()
payload = response.json()
cache: Dict[str, Dict[str, Any]] = {}
@ -640,9 +662,10 @@ def fetch_endpoint_model_metadata(
try:
# Try /v1/props first (current llama.cpp); fall back to /props for older builds
base = candidate.rstrip("/").replace("/v1", "")
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5)
_verify = _resolve_requests_verify()
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5, verify=_verify)
if not props_resp.ok:
props_resp = requests.get(base + "/props", headers=headers, timeout=5)
props_resp = requests.get(base + "/props", headers=headers, timeout=5, verify=_verify)
if props_resp.ok:
props = props_resp.json()
gen_settings = props.get("default_generation_settings", {})
@ -714,6 +737,22 @@ def get_cached_context_length(model: str, base_url: str) -> Optional[int]:
return cache.get(key)
def _invalidate_cached_context_length(model: str, base_url: str) -> None:
"""Drop a stale cache entry so it gets re-resolved on the next lookup."""
key = f"{model}@{base_url}"
cache = _load_context_cache()
if key not in cache:
return
del cache[key]
path = _get_context_cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
except Exception as e:
logger.debug("Failed to invalidate context length cache entry %s: %s", key, e)
def get_next_probe_tier(current_length: int) -> Optional[int]:
"""Return the next lower probe tier, or None if already at minimum."""
for tier in CONTEXT_PROBE_TIERS:
@ -991,7 +1030,7 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
"x-api-key": api_key,
"anthropic-version": "2023-06-01",
}
resp = requests.get(url, headers=headers, timeout=10)
resp = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
if resp.status_code != 200:
return None
data = resp.json()
@ -1005,6 +1044,116 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
return None
# Known ChatGPT Codex OAuth context windows (observed via live
# chatgpt.com/backend-api/codex/models probe, Apr 2026). These are the
# `context_window` values, which are what Codex actually enforces — the
# direct OpenAI API has larger limits for the same slugs, but Codex OAuth
# caps lower (e.g. gpt-5.5 is 1.05M on the API, 272K on Codex).
#
# Used as a fallback when the live probe fails (no token, network error).
# Longest keys first so substring match picks the most specific entry.
_CODEX_OAUTH_CONTEXT_FALLBACK: Dict[str, int] = {
"gpt-5.1-codex-max": 272_000,
"gpt-5.1-codex-mini": 272_000,
"gpt-5.3-codex": 272_000,
"gpt-5.2-codex": 272_000,
"gpt-5.4-mini": 272_000,
"gpt-5.5": 272_000,
"gpt-5.4": 272_000,
"gpt-5.2": 272_000,
"gpt-5": 272_000,
}
_codex_oauth_context_cache: Dict[str, int] = {}
_codex_oauth_context_cache_time: float = 0.0
_CODEX_OAUTH_CONTEXT_CACHE_TTL = 3600 # 1 hour
def _fetch_codex_oauth_context_lengths(access_token: str) -> Dict[str, int]:
"""Probe the ChatGPT Codex /models endpoint for per-slug context windows.
Codex OAuth imposes its own context limits that differ from the direct
OpenAI API (e.g. gpt-5.5 is 1.05M on the API, 272K on Codex). The
`context_window` field in each model entry is the authoritative source.
Returns a ``{slug: context_window}`` dict. Empty on failure.
"""
global _codex_oauth_context_cache, _codex_oauth_context_cache_time
now = time.time()
if (
_codex_oauth_context_cache
and now - _codex_oauth_context_cache_time < _CODEX_OAUTH_CONTEXT_CACHE_TTL
):
return _codex_oauth_context_cache
try:
resp = requests.get(
"https://chatgpt.com/backend-api/codex/models?client_version=1.0.0",
headers={"Authorization": f"Bearer {access_token}"},
timeout=10,
verify=_resolve_requests_verify(),
)
if resp.status_code != 200:
logger.debug(
"Codex /models probe returned HTTP %s; falling back to hardcoded defaults",
resp.status_code,
)
return {}
data = resp.json()
except Exception as exc:
logger.debug("Codex /models probe failed: %s", exc)
return {}
entries = data.get("models", []) if isinstance(data, dict) else []
result: Dict[str, int] = {}
for item in entries:
if not isinstance(item, dict):
continue
slug = item.get("slug")
ctx = item.get("context_window")
if isinstance(slug, str) and isinstance(ctx, int) and ctx > 0:
result[slug.strip()] = ctx
if result:
_codex_oauth_context_cache = result
_codex_oauth_context_cache_time = now
return result
def _resolve_codex_oauth_context_length(
model: str, access_token: str = ""
) -> Optional[int]:
"""Resolve a Codex OAuth model's real context window.
Prefers a live probe of chatgpt.com/backend-api/codex/models (when we
have a bearer token), then falls back to ``_CODEX_OAUTH_CONTEXT_FALLBACK``.
"""
model_bare = _strip_provider_prefix(model).strip()
if not model_bare:
return None
if access_token:
live = _fetch_codex_oauth_context_lengths(access_token)
if model_bare in live:
return live[model_bare]
# Case-insensitive match in case casing drifts
model_lower = model_bare.lower()
for slug, ctx in live.items():
if slug.lower() == model_lower:
return ctx
# Fallback: longest-key-first substring match over hardcoded defaults.
model_lower = model_bare.lower()
for slug, ctx in sorted(
_CODEX_OAUTH_CONTEXT_FALLBACK.items(), key=lambda x: len(x[0]), reverse=True
):
if slug in model_lower:
return ctx
return None
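The longest-key-first ordering in the fallback branch matters because shorter slugs like `gpt-5` are substrings of more specific ones. A sketch with illustrative values (the real table maps everything to 272K; the differing numbers here only make the winning key visible):

```python
from typing import Dict, Optional

# Illustrative values, NOT the real Codex limits.
FALLBACK: Dict[str, int] = {
    "gpt-5": 100,
    "gpt-5.4-mini": 200,
}

def match(model: str) -> Optional[int]:
    model_lower = model.lower()
    # Longest key first, so "gpt-5.4-mini" beats its "gpt-5" prefix.
    for slug, ctx in sorted(
        FALLBACK.items(), key=lambda x: len(x[0]), reverse=True
    ):
        if slug in model_lower:
            return ctx
    return None

assert match("openai/GPT-5.4-mini-2026") == 200  # specific key wins
assert match("gpt-5.5") == 100                   # falls back to prefix
assert match("claude-opus") is None
```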
def _resolve_nous_context_length(model: str) -> Optional[int]:
"""Resolve Nous Portal model context length via OpenRouter metadata.
@ -1050,6 +1199,7 @@ def get_model_context_length(
Resolution order:
0. Explicit config override (model.context_length or custom_providers per-model)
1. Persistent cache (previously discovered via probing)
1b. AWS Bedrock static table (must precede custom-endpoint probe)
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
@ -1072,7 +1222,41 @@ def get_model_context_length(
if base_url:
cached = get_cached_context_length(model, base_url)
if cached is not None:
return cached
# Invalidate stale Codex OAuth cache entries: pre-PR #14935 builds
# resolved gpt-5.x to the direct-API value (e.g. 1.05M) via
# models.dev and persisted it. Codex OAuth caps at 272K for every
# slug, so any cached Codex entry at or above 400K is a leftover
# from the old resolution path. Drop it and fall through to the
# live /models probe in step 5 below.
if provider == "openai-codex" and cached >= 400_000:
logger.info(
"Dropping stale Codex cache entry %s@%s -> %s (pre-fix value); "
"re-resolving via live /models probe",
model, base_url, f"{cached:,}",
)
_invalidate_cached_context_length(model, base_url)
else:
return cached
# 1b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels API doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py that reflects
# AWS-imposed limits (e.g. 200K for Claude models vs 1M on the native
# Anthropic API). This must run BEFORE the custom-endpoint probe at
# step 2 — bedrock-runtime.<region>.amazonaws.com is not in
# _URL_TO_PROVIDER, so it would otherwise be treated as a custom endpoint,
# fail the /models probe (Bedrock doesn't expose that shape), and fall
# back to the 128K default before reaching the original step 4b branch.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 2. Active endpoint metadata for truly custom/unknown endpoints.
# Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
@ -1119,19 +1303,7 @@ def get_model_context_length(
if ctx:
return ctx
# 4b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 4b. (Bedrock handled earlier at step 1b — before custom-endpoint probe.)
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
@ -1145,10 +1317,32 @@ def get_model_context_length(
if inferred:
effective_provider = inferred
# 5a. Copilot live /models API — max_prompt_tokens from the user's account.
# This catches account-specific models (e.g. claude-opus-4.6-1m) that
# don't exist in models.dev. For models that ARE in models.dev, this
# returns the provider-enforced limit which is what users can actually use.
if effective_provider in ("copilot", "copilot-acp", "github-copilot"):
try:
from hermes_cli.models import get_copilot_model_context
ctx = get_copilot_model_context(model, api_key=api_key)
if ctx:
return ctx
except Exception:
pass # Fall through to models.dev
if effective_provider == "nous":
ctx = _resolve_nous_context_length(model)
if ctx:
return ctx
if effective_provider == "openai-codex":
# Codex OAuth enforces lower context limits than the direct OpenAI
# API for the same slug (e.g. gpt-5.5 is 1.05M on the API but 272K
# on Codex). Authoritative source is Codex's own /models endpoint.
codex_ctx = _resolve_codex_oauth_context_length(model, access_token=api_key or "")
if codex_ctx:
if base_url:
save_context_length(model, base_url, codex_ctx)
return codex_ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)

agent/moonshot_schema.py (new file, 190 lines)

@ -0,0 +1,190 @@
"""Helpers for translating OpenAI-style tool schemas to Moonshot's schema subset.
Moonshot (Kimi) accepts a stricter subset of JSON Schema than standard OpenAI
tool calling. Requests that violate it fail with HTTP 400:
tools.function.parameters is not a valid moonshot flavored json schema,
details: <...>
Known rejection modes documented at
https://forum.moonshot.ai/t/tool-calling-specification-violation-on-moonshot-api/102
and MoonshotAI/kimi-cli#1595:
1. Every property schema must carry a ``type``. Standard JSON Schema allows
type to be omitted (the value is then unconstrained); Moonshot refuses.
2. When ``anyOf`` is used, ``type`` must be on the ``anyOf`` children, not
the parent. Presence of both causes "type should be defined in anyOf
items instead of the parent schema".
The ``#/definitions/...`` → ``#/$defs/...`` rewrite for draft-07 refs is
handled separately in ``tools/mcp_tool._normalize_mcp_input_schema`` so it
applies at MCP registration time for all providers.
"""
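The two rejection modes can be shown with a minimal standalone repair, assuming the helper name `repair` below (illustrative, not this module's API):

```python
def repair(schema: dict) -> dict:
    out = dict(schema)
    if "anyOf" in out:
        # Rule 2: type belongs on the anyOf children, not the parent.
        out.pop("type", None)
        out["anyOf"] = [repair(s) for s in out["anyOf"]]
        return out
    if "type" not in out and "$ref" not in out:
        # Rule 1: every property schema must carry a type; $ref nodes
        # get theirs from the referenced definition.
        out["type"] = "object" if "properties" in out else "string"
    return out

before = {"type": "object", "anyOf": [{"type": "string"}, {"type": "null"}]}
after = repair(before)
assert "type" not in after                       # rule 2: parent type dropped
assert repair({"description": "free"})["type"] == "string"  # rule 1
```

The module's real sanitizer additionally walks nested `properties`, `items`, `$defs`, and so on, as described below.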
from __future__ import annotations
import copy
from typing import Any, Dict, List
# Keys whose values are maps of name → schema (not schemas themselves).
# When we recurse, we walk the values of these maps as schemas, but we do
# NOT apply the missing-type repair to the map itself.
_SCHEMA_MAP_KEYS = frozenset({"properties", "patternProperties", "$defs", "definitions"})
# Keys whose values are lists of schemas.
_SCHEMA_LIST_KEYS = frozenset({"anyOf", "oneOf", "allOf", "prefixItems"})
# Keys whose values are a single nested schema.
_SCHEMA_NODE_KEYS = frozenset({"items", "contains", "not", "additionalProperties", "propertyNames"})
def _repair_schema(node: Any, is_schema: bool = True) -> Any:
"""Recursively apply Moonshot repairs to a schema node.
``is_schema=True`` means this dict is a JSON Schema node and gets the
missing-type + anyOf-parent repairs applied. ``is_schema=False`` means
it's a container map (e.g. the value of ``properties``) and we only
recurse into its values.
"""
if isinstance(node, list):
# Lists only show up under schema-list keys (anyOf/oneOf/allOf), so
# every element is itself a schema.
return [_repair_schema(item, is_schema=True) for item in node]
if not isinstance(node, dict):
return node
# Walk the dict, deciding per-key whether recursion is into a schema
# node, a container map, or a scalar.
repaired: Dict[str, Any] = {}
for key, value in node.items():
if key in _SCHEMA_MAP_KEYS and isinstance(value, dict):
# Map of name → schema. Don't treat the map itself as a schema
# (it has no type / properties of its own), but each value is.
repaired[key] = {
sub_key: _repair_schema(sub_val, is_schema=True)
for sub_key, sub_val in value.items()
}
elif key in _SCHEMA_LIST_KEYS and isinstance(value, list):
repaired[key] = [_repair_schema(v, is_schema=True) for v in value]
elif key in _SCHEMA_NODE_KEYS:
# items / not / additionalProperties: single nested schema.
# additionalProperties can also be a bool — leave those alone.
if isinstance(value, dict):
repaired[key] = _repair_schema(value, is_schema=True)
else:
repaired[key] = value
else:
# Scalars (description, title, format, enum values, etc.) pass through.
repaired[key] = value
if not is_schema:
return repaired
# Rule 2: when anyOf is present, type belongs only on the children.
if "anyOf" in repaired and isinstance(repaired["anyOf"], list):
repaired.pop("type", None)
return repaired
# Rule 1: property schemas without type need one. $ref nodes are exempt
# — their type comes from the referenced definition.
if "$ref" in repaired:
return repaired
return _fill_missing_type(repaired)
def _fill_missing_type(node: Dict[str, Any]) -> Dict[str, Any]:
"""Infer a reasonable ``type`` if this schema node has none."""
if "type" in node and node["type"] not in (None, ""):
return node
# Heuristic: presence of ``properties`` → object, ``items`` → array, ``enum``
# → type of first enum value, else fall back to ``string`` (safest scalar).
if "properties" in node or "required" in node or "additionalProperties" in node:
inferred = "object"
elif "items" in node or "prefixItems" in node:
inferred = "array"
elif "enum" in node and isinstance(node["enum"], list) and node["enum"]:
sample = node["enum"][0]
if isinstance(sample, bool):
inferred = "boolean"
elif isinstance(sample, int):
inferred = "integer"
elif isinstance(sample, float):
inferred = "number"
else:
inferred = "string"
else:
inferred = "string"
return {**node, "type": inferred}
def sanitize_moonshot_tool_parameters(parameters: Any) -> Dict[str, Any]:
"""Normalize tool parameters to a Moonshot-compatible object schema.
Returns a deep-copied schema with the two flavored-JSON-Schema repairs
applied. Input is not mutated.
"""
if not isinstance(parameters, dict):
return {"type": "object", "properties": {}}
repaired = _repair_schema(copy.deepcopy(parameters), is_schema=True)
if not isinstance(repaired, dict):
return {"type": "object", "properties": {}}
# Top-level must be an object schema
if repaired.get("type") != "object":
repaired["type"] = "object"
if "properties" not in repaired:
repaired["properties"] = {}
return repaired
def sanitize_moonshot_tools(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Apply ``sanitize_moonshot_tool_parameters`` to every tool's parameters."""
if not tools:
return tools
sanitized: List[Dict[str, Any]] = []
any_change = False
for tool in tools:
if not isinstance(tool, dict):
sanitized.append(tool)
continue
fn = tool.get("function")
if not isinstance(fn, dict):
sanitized.append(tool)
continue
params = fn.get("parameters")
repaired = sanitize_moonshot_tool_parameters(params)
if repaired is not params:
any_change = True
new_fn = {**fn, "parameters": repaired}
sanitized.append({**tool, "function": new_fn})
else:
sanitized.append(tool)
return sanitized if any_change else tools
def is_moonshot_model(model: str | None) -> bool:
"""True for any Kimi / Moonshot model slug, regardless of aggregator prefix.
Matches bare names (``kimi-k2.6``, ``moonshotai/Kimi-K2.6``) and aggregator-
prefixed slugs (``nous/moonshotai/kimi-k2.6``, ``openrouter/moonshotai/...``).
Detection by model name covers Nous / OpenRouter / other aggregators that
route to Moonshot's inference, where the base URL is the aggregator's, not
``api.moonshot.ai``.
"""
if not model:
return False
bare = model.strip().lower()
# Last path segment (covers aggregator-prefixed slugs)
tail = bare.rsplit("/", 1)[-1]
if tail.startswith("kimi-") or tail == "kimi":
return True
# Vendor-prefixed forms commonly used on aggregators
if "moonshot" in bare or "/kimi" in bare or bare.startswith("kimi"):
return True
return False


@ -1,154 +1,29 @@
"""Shared slash command helpers for skills and built-in prompt-style modes.
"""Shared slash command helpers for skills.
Shared between CLI (cli.py) and gateway (gateway/run.py) so both surfaces
can invoke skills via /skill-name commands and prompt-only built-ins like
/plan.
can invoke skills via /skill-name commands.
"""
import json
import logging
import re
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import display_hermes_home
from agent.skill_preprocessing import (
expand_inline_shell as _expand_inline_shell,
load_skills_config as _load_skills_config,
substitute_template_vars as _substitute_template_vars,
)
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
_PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")
# Patterns for sanitizing skill names into clean hyphen-separated slugs.
_SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
_SKILL_MULTI_HYPHEN = re.compile(r"-{2,}")
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only — no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def _load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def _substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available --
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def _run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return "[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "…[truncated]"
return output
def _expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return _run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def build_plan_path(
user_instruction: str = "",
*,
now: datetime | None = None,
) -> Path:
"""Return the default workspace-relative markdown path for a /plan invocation.
Relative paths are intentional: file tools are task/backend-aware and resolve
them against the active working directory for local, docker, ssh, modal,
daytona, and similar terminal backends. That keeps the plan with the active
workspace instead of the Hermes host's global home directory.
"""
first_lines = (user_instruction or "").strip().splitlines()
slug_source = first_lines[0] if first_lines else ""
slug = _PLAN_SLUG_RE.sub("-", slug_source.lower()).strip("-")
if slug:
slug = "-".join(part for part in slug.split("-")[:8] if part)[:48].strip("-")
slug = slug or "conversation-plan"
timestamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
return Path(".hermes") / "plans" / f"{timestamp}-{slug}.md"
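The slug + path behavior can be sketched end to end. This is an illustrative standalone copy of the logic above, assuming a fixed timestamp:

```python
import re
from datetime import datetime
from pathlib import Path

SLUG_RE = re.compile(r"[^a-z0-9]+")

def plan_path(instruction: str, now: datetime) -> Path:
    stripped = instruction.strip()
    source = stripped.splitlines()[0] if stripped else ""
    slug = SLUG_RE.sub("-", source.lower()).strip("-")
    if slug:
        # Keep at most 8 words and 48 chars so filenames stay readable.
        slug = "-".join(p for p in slug.split("-")[:8] if p)[:48].strip("-")
    slug = slug or "conversation-plan"
    stamp = now.strftime("%Y-%m-%d_%H%M%S")
    return Path(".hermes") / "plans" / f"{stamp}-{slug}.md"

p = plan_path("Refactor the auth flow!", datetime(2026, 4, 25, 6, 14, 32))
assert p.as_posix() == ".hermes/plans/2026-04-25_061432-refactor-the-auth-flow.md"
```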
def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tuple[dict[str, Any], Path | None, str] | None:
"""Load a skill by name/path and return (loaded_payload, skill_dir, display_name)."""
raw_identifier = (skill_identifier or "").strip()
@ -167,7 +42,9 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
else:
normalized = raw_identifier.lstrip("/")
loaded_skill = json.loads(skill_view(normalized, task_id=task_id))
loaded_skill = json.loads(
skill_view(normalized, task_id=task_id, preprocess=False)
)
except Exception:
return None


@ -0,0 +1,131 @@
"""Shared SKILL.md preprocessing helpers."""
import logging
import re
import subprocess
from pathlib import Path
logger = logging.getLogger(__name__)
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only -- no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available --
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return "[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "...[truncated]"
return output
def expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
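A condensed sketch of the expansion, assuming `bash` is on PATH (error handling and the output cap from the helpers above are omitted):

```python
import re
import subprocess

INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")

def expand(content: str) -> str:
    # Replace each single-line !`cmd` snippet with the command's stdout.
    def _replace(m: re.Match) -> str:
        out = subprocess.run(
            ["bash", "-c", m.group(1).strip()],
            capture_output=True, text=True, timeout=10, check=False,
        )
        return (out.stdout or "").rstrip("\n")
    return INLINE_SHELL_RE.sub(_replace, content)

assert expand("Today is !`echo 2026-04-25`.") == "Today is 2026-04-25."
```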
def preprocess_skill_content(
content: str,
skill_dir: Path | None,
session_id: str | None = None,
skills_cfg: dict | None = None,
) -> str:
"""Apply configured SKILL.md template and inline-shell preprocessing."""
if not content:
return content
cfg = skills_cfg if isinstance(skills_cfg, dict) else load_skills_config()
if cfg.get("template_vars", True):
content = substitute_template_vars(content, skill_dir, session_id)
if cfg.get("inline_shell", False):
timeout = int(cfg.get("inline_shell_timeout", 10) or 10)
content = expand_inline_shell(content, skill_dir, timeout)
return content


@ -12,6 +12,7 @@ reasoning configuration, temperature handling, and extra_body assembly.
import copy
from typing import Any, Dict, List, Optional
from agent.moonshot_schema import is_moonshot_model, sanitize_moonshot_tools
from agent.prompt_builder import DEVELOPER_ROLE_MODELS
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall, Usage
@ -172,6 +173,11 @@ class ChatCompletionsTransport(ProviderTransport):
# Tools
if tools:
# Moonshot/Kimi uses a stricter flavored JSON Schema. Rewriting
# tool parameters here keeps aggregator routes (Nous, OpenRouter,
# etc.) compatible, in addition to direct moonshot.ai endpoints.
if is_moonshot_model(model):
tools = sanitize_moonshot_tools(tools)
api_kwargs["tools"] = tools
# max_tokens resolution — priority: ephemeral > user > provider default


@ -951,13 +951,9 @@ class BatchRunner:
root_logger.setLevel(original_level)
# Aggregate all batch statistics and update checkpoint
all_completed_prompts = list(completed_prompts_set)
total_reasoning_stats = {"total_assistant_turns": 0, "turns_with_reasoning": 0, "turns_without_reasoning": 0}
for batch_result in results:
# Add newly completed prompts
all_completed_prompts.extend(batch_result.get("completed_prompts", []))
# Aggregate tool stats
for tool_name, stats in batch_result.get("tool_stats", {}).items():
if tool_name not in total_tool_stats:
@ -977,7 +973,7 @@ class BatchRunner:
# Save final checkpoint (best-effort; incremental writes already happened)
try:
checkpoint_data["completed_prompts"] = all_completed_prompts
checkpoint_data["completed_prompts"] = sorted(completed_prompts_set)
self._save_checkpoint(checkpoint_data, lock=checkpoint_lock)
except Exception as ckpt_err:
print(f"⚠️ Warning: Failed to save final checkpoint: {ckpt_err}")


@ -326,6 +326,16 @@ compression:
# To pin a specific model/provider for compression summaries, use the
# auxiliary section below (auxiliary.compression.provider / model).
# =============================================================================
# Anthropic prompt caching TTL
# =============================================================================
# When prompt caching is active (Claude via OpenRouter or native Anthropic),
# Anthropic supports two TTL tiers for cached prefixes: "5m" (default) and
# "1h". Other values are ignored and "5m" is used.
#
prompt_caching:
cache_ttl: "5m" # use "1h" for long sessions with pauses between turns
# =============================================================================
# Auxiliary Models (Advanced — Experimental)
# =============================================================================

cli.py (359 lines changed)

@ -1688,7 +1688,6 @@ def _looks_like_slash_command(text: str) -> bool:
from agent.skill_commands import (
scan_skill_commands,
build_skill_invocation_message,
build_plan_path,
build_preloaded_skills_prompt,
)
@ -3084,6 +3083,8 @@ class HermesCLI:
format_runtime_provider_error,
)
_primary_exc = None
runtime = None
try:
runtime = resolve_runtime_provider(
requested=self.requested_provider,
@ -3091,7 +3092,34 @@ class HermesCLI:
explicit_base_url=self._explicit_base_url,
)
except Exception as exc:
message = format_runtime_provider_error(exc)
_primary_exc = exc
# Primary provider auth failed — try fallback providers before giving up.
if runtime is None and _primary_exc is not None:
from hermes_cli.auth import AuthError
if isinstance(_primary_exc, AuthError):
_fb_chain = self._fallback_model if isinstance(self._fallback_model, list) else []
for _fb in _fb_chain:
_fb_provider = (_fb.get("provider") or "").strip().lower()
_fb_model = (_fb.get("model") or "").strip()
if not _fb_provider or not _fb_model:
continue
try:
runtime = resolve_runtime_provider(requested=_fb_provider)
logger.warning(
"Primary provider auth failed (%s). Falling through to fallback: %s/%s",
_primary_exc, _fb_provider, _fb_model,
)
_cprint(f"⚠️ Primary auth failed — switching to fallback: {_fb_provider} / {_fb_model}")
self.requested_provider = _fb_provider
self.model = _fb_model
_primary_exc = None
break
except Exception:
continue
if runtime is None:
message = format_runtime_provider_error(_primary_exc) if _primary_exc else "Provider resolution failed."
ChatConsole().print(f"[bold red]{message}[/]")
return False
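The fallback walk above boils down to: on an auth failure from the primary provider, iterate the configured chain, skip malformed entries, and take the first provider that resolves. A standalone sketch of that pattern, where `resolve_provider` stands in for any resolver callable and the entry shape (`provider`/`model` keys) is taken from the loop above:

```python
class AuthError(Exception):
    """Stand-in for hermes_cli.auth.AuthError."""

def resolve_with_fallbacks(primary, fallbacks, resolve_provider):
    """Try the primary provider; on AuthError, walk the fallback chain.

    Returns (runtime, provider_name). Re-raises the primary auth error
    when every fallback entry is malformed or also fails to resolve.
    """
    try:
        return resolve_provider(primary), primary
    except AuthError as primary_exc:
        for entry in fallbacks or []:
            name = (entry.get("provider") or "").strip().lower()
            if not name or not (entry.get("model") or "").strip():
                continue  # skip malformed entries, same as the CLI loop
            try:
                return resolve_provider(name), name
            except Exception:
                continue  # try the next fallback in the chain
        raise primary_exc
```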
@ -3254,6 +3282,23 @@ class HermesCLI:
_cprint(f"\033[1;31mSession not found: {self.session_id}{_RST}")
_cprint(f"{_DIM}Use a session ID from a previous CLI run (hermes sessions list).{_RST}")
return False
# If the requested session is the (empty) head of a compression
# chain, walk to the descendant that actually holds the messages.
# See #15000 and SessionDB.resolve_resume_session_id.
try:
resolved_id = self._session_db.resolve_resume_session_id(self.session_id)
except Exception:
resolved_id = self.session_id
if resolved_id and resolved_id != self.session_id:
ChatConsole().print(
f"[{_DIM}]Session {_escape(self.session_id)} was compressed into "
f"{_escape(resolved_id)}; resuming the descendant with your "
f"transcript.[/]"
)
self.session_id = resolved_id
resolved_meta = self._session_db.get_session(self.session_id)
if resolved_meta:
session_meta = resolved_meta
restored = self._session_db.get_messages_as_conversation(self.session_id)
if restored:
restored = [m for m in restored if m.get("role") != "session_meta"]
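`SessionDB.resolve_resume_session_id` itself is outside this diff; the walk it performs can be sketched as following compression-child pointers until a session that actually holds messages is reached. The `child_of` mapping, `has_messages` predicate, and cycle guard here are illustrative assumptions, not the real schema:

```python
def resolve_resume_session_id(session_id, child_of, has_messages):
    """Walk a compression-chain head down to the live descendant.

    `child_of` maps an ended (compressed) session id to the continuation
    session created by compression; `has_messages` reports whether a
    session holds any transcript. A seen-set guards against malformed
    chains that loop.
    """
    seen = set()
    current = session_id
    while current in child_of and not has_messages(current):
        if current in seen:
            break  # defensive: cyclic chain, stop walking
        seen.add(current)
        current = child_of[current]
    return current
```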
@ -3472,6 +3517,22 @@ class HermesCLI:
)
return False
# If the requested session is the (empty) head of a compression chain,
# walk to the descendant that actually holds the messages. See #15000.
try:
resolved_id = self._session_db.resolve_resume_session_id(self.session_id)
except Exception:
resolved_id = self.session_id
if resolved_id and resolved_id != self.session_id:
self._console_print(
f"[dim]Session {self.session_id} was compressed into "
f"{resolved_id}; resuming the descendant with your transcript.[/]"
)
self.session_id = resolved_id
resolved_meta = self._session_db.get_session(self.session_id)
if resolved_meta:
session_meta = resolved_meta
restored = self._session_db.get_messages_as_conversation(self.session_id)
if restored:
restored = [m for m in restored if m.get("role") != "session_meta"]
@ -4686,6 +4747,22 @@ class HermesCLI:
_cprint(" Use /history or `hermes sessions list` to see available sessions.")
return
# If the target is the empty head of a compression chain, redirect to
# the descendant that actually holds the transcript. See #15000.
try:
resolved_id = self._session_db.resolve_resume_session_id(target_id)
except Exception:
resolved_id = target_id
if resolved_id and resolved_id != target_id:
_cprint(
f" Session {target_id} was compressed into {resolved_id}; "
f"resuming the descendant with your transcript."
)
target_id = resolved_id
resolved_meta = self._session_db.get_session(target_id)
if resolved_meta:
session_meta = resolved_meta
if target_id == self.session_id:
_cprint(" Already on that session.")
return
@ -5297,29 +5374,26 @@ class HermesCLI:
_cprint(f" ✓ Model switched: {result.new_model}")
_cprint(f" Provider: {provider_label}")
# Rich metadata from models.dev
# Context: always resolve via the provider-aware chain so Codex OAuth,
# Copilot, and Nous-enforced caps win over the raw models.dev entry
# (e.g. gpt-5.5 is 1.05M on openai but 272K on Codex OAuth).
mi = result.model_info
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or self.base_url or "",
api_key=result.api_key or self.api_key or "",
model_info=mi,
)
if ctx:
_cprint(f" Context: {ctx:,} tokens")
if mi:
if mi.context_window:
_cprint(f" Context: {mi.context_window:,} tokens")
if mi.max_output:
_cprint(f" Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
_cprint(f" Cost: {mi.format_cost()}")
_cprint(f" Capabilities: {mi.format_capabilities()}")
else:
# Fallback to old context length lookup
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
result.new_model,
base_url=result.base_url or self.base_url,
api_key=result.api_key or self.api_key,
provider=result.target_provider,
)
_cprint(f" Context: {ctx:,} tokens")
except Exception:
pass
# Cache notice
cache_enabled = (
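The ordering the new display path enforces, provider-aware cap first and raw models.dev second, can be sketched as follows. This is a sketch of the helper's contract, not its actual body; only `get_model_context_length` is a name taken from the diff:

```python
def resolve_display_context_length(model, provider, *, base_url="", api_key="",
                                   model_info=None):
    """Prefer the provider-enforced context cap; fall back to models.dev.

    The provider-aware lookup (which knows about Codex OAuth, Copilot,
    and Nous caps) wins whenever it returns a value, so /model and the
    compressor's real budget report the same number.
    """
    try:
        from agent.model_metadata import get_model_context_length
        enforced = get_model_context_length(
            model, base_url=base_url, api_key=api_key, provider=provider,
        )
        if enforced:
            return enforced
    except Exception:
        pass  # lookup unavailable: fall back to the models.dev entry
    return getattr(model_info, "context_window", None)
```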
@ -5378,79 +5452,6 @@ class HermesCLI:
except Exception:
return False
def _show_model_and_providers(self):
"""Show current model + provider and list all authenticated providers.
Shows current model + provider, then lists all authenticated
providers with their available models.
"""
from hermes_cli.models import (
curated_models_for_provider, list_available_providers,
normalize_provider, _PROVIDER_LABELS,
get_pricing_for_provider, format_model_pricing_table,
)
from hermes_cli.auth import resolve_provider as _resolve_provider
# Resolve current provider
raw_provider = normalize_provider(self.provider)
if raw_provider == "auto":
try:
current = _resolve_provider(
self.requested_provider,
explicit_api_key=self._explicit_api_key,
explicit_base_url=self._explicit_base_url,
)
except Exception:
current = "openrouter"
else:
current = raw_provider
current_label = _PROVIDER_LABELS.get(current, current)
print(f"\n Current: {self.model} via {current_label}")
print()
# Show all authenticated providers with their models
providers = list_available_providers()
authed = [p for p in providers if p["authenticated"]]
unauthed = [p for p in providers if not p["authenticated"]]
if authed:
print(" Authenticated providers & models:")
for p in authed:
is_active = p["id"] == current
marker = " ← active" if is_active else ""
print(f" [{p['id']}]{marker}")
curated = curated_models_for_provider(p["id"])
# Fetch pricing for providers that support it (openrouter, nous)
pricing_map = get_pricing_for_provider(p["id"]) if p["id"] in ("openrouter", "nous") else {}
if curated and pricing_map:
cur_model = self.model if is_active else ""
for line in format_model_pricing_table(curated, pricing_map, current_model=cur_model):
print(line)
elif curated:
for mid, desc in curated:
current_marker = " ← current" if (is_active and mid == self.model) else ""
print(f" {mid}{current_marker}")
elif p["id"] == "custom":
from hermes_cli.models import _get_custom_base_url
custom_url = _get_custom_base_url()
if custom_url:
print(f" endpoint: {custom_url}")
if is_active:
print(f" model: {self.model} ← current")
print(" (use hermes model to change)")
else:
print(" (use hermes model to change)")
print()
if unauthed:
names = ", ".join(p["label"] for p in unauthed)
print(f" Not configured: {names}")
print(" Run: hermes setup")
print()
print(" To change model or provider, use: hermes model")
def _output_console(self):
"""Use prompt_toolkit-safe Rich rendering once the TUI is live."""
if getattr(self, "_app", None):
@ -6026,16 +6027,12 @@ class HermesCLI:
self._handle_resume_command(cmd_original)
elif canonical == "model":
self._handle_model_switch(cmd_original)
elif canonical == "provider":
self._show_model_and_providers()
elif canonical == "gquota":
self._handle_gquota_command(cmd_original)
elif canonical == "personality":
# Use original case (handler lowercases the personality name itself)
self._handle_personality_command(cmd_original)
elif canonical == "plan":
self._handle_plan_command(cmd_original)
elif canonical == "retry":
retry_msg = self.retry_last()
if retry_msg and hasattr(self, '_pending_input'):
@ -6165,6 +6162,8 @@ class HermesCLI:
self._handle_skin_command(cmd_original)
elif canonical == "voice":
self._handle_voice_command(cmd_original)
elif canonical == "busy":
self._handle_busy_command(cmd_original)
else:
# Check for user-defined quick commands (bypass agent loop, no LLM call)
base_cmd = cmd_lower.split()[0]
@ -6270,32 +6269,6 @@ class HermesCLI:
return True
def _handle_plan_command(self, cmd: str):
"""Handle /plan [request] — load the bundled plan skill."""
parts = cmd.strip().split(maxsplit=1)
user_instruction = parts[1].strip() if len(parts) > 1 else ""
plan_path = build_plan_path(user_instruction)
msg = build_skill_invocation_message(
"/plan",
user_instruction,
task_id=self.session_id,
runtime_note=(
"Save the markdown plan with write_file to this exact relative path "
f"inside the active workspace/backend cwd: {plan_path}"
),
)
if not msg:
ChatConsole().print("[bold red]Failed to load the bundled /plan skill[/]")
return
_cprint(f" 📝 Plan mode queued via skill. Markdown plan target: {plan_path}")
if hasattr(self, '_pending_input'):
self._pending_input.put(msg)
else:
ChatConsole().print("[bold red]Plan mode unavailable: input queue not initialized[/]")
def _handle_background_command(self, cmd: str):
"""Handle /background <prompt> — run a prompt in a separate background session.
@ -6685,6 +6658,13 @@ class HermesCLI:
print(f" ⚠ Port {_port} is not reachable at {cdp_url}")
os.environ["BROWSER_CDP_URL"] = cdp_url
# Eagerly start the CDP supervisor so pending_dialogs + frame_tree
# show up in the next browser_snapshot. No-op if already started.
try:
from tools.browser_tool import _ensure_cdp_supervisor # type: ignore[import-not-found]
_ensure_cdp_supervisor("default")
except Exception:
pass
print()
print("🌐 Browser connected to live Chrome via CDP")
print(f" Endpoint: {cdp_url}")
@ -6706,7 +6686,8 @@ class HermesCLI:
if current:
os.environ.pop("BROWSER_CDP_URL", None)
try:
from tools.browser_tool import cleanup_all_browsers
from tools.browser_tool import cleanup_all_browsers, _stop_cdp_supervisor
_stop_cdp_supervisor("default")
cleanup_all_browsers()
except Exception:
pass
@ -6919,6 +6900,36 @@ class HermesCLI:
else:
_cprint(f" {_ACCENT}✓ Reasoning effort set to '{arg}' (session only){_RST}")
def _handle_busy_command(self, cmd: str):
"""Handle /busy — control what Enter does while Hermes is working.
Usage:
/busy Show current busy input mode
/busy status Show current busy input mode
/busy queue Queue input for the next turn instead of interrupting
/busy interrupt Interrupt the current run on Enter (default)
"""
parts = cmd.strip().split(maxsplit=1)
if len(parts) < 2 or parts[1].strip().lower() == "status":
_cprint(f" {_ACCENT}Busy input mode: {self.busy_input_mode}{_RST}")
_cprint(f" {_DIM}Enter while busy: {'queues for next turn' if self.busy_input_mode == 'queue' else 'interrupts current run'}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
return
arg = parts[1].strip().lower()
if arg not in {"queue", "interrupt"}:
_cprint(f" {_DIM}(._.) Unknown argument: {arg}{_RST}")
_cprint(f" {_DIM}Usage: /busy [queue|interrupt|status]{_RST}")
return
self.busy_input_mode = arg
if save_config_value("display.busy_input_mode", arg):
behavior = "Enter will queue follow-up input while Hermes is busy." if arg == "queue" else "Enter will interrupt the current run while Hermes is busy."
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (saved to config){_RST}")
_cprint(f" {_DIM}{behavior}{_RST}")
else:
_cprint(f" {_ACCENT}✓ Busy input mode set to '{arg}' (session only){_RST}")
def _handle_fast_command(self, cmd: str):
"""Handle /fast — toggle fast mode (OpenAI Priority Processing / Anthropic Fast Mode)."""
if not self._fast_command_available():
@ -6997,51 +7008,52 @@ class HermesCLI:
focus_topic = parts[1].strip()
original_count = len(self.conversation_history)
try:
from agent.model_metadata import estimate_messages_tokens_rough
from agent.manual_compression_feedback import summarize_manual_compression
original_history = list(self.conversation_history)
approx_tokens = estimate_messages_tokens_rough(original_history)
if focus_topic:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
f"focus: \"{focus_topic}\"...")
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
with self._busy_command("Compressing context..."):
try:
from agent.model_metadata import estimate_messages_tokens_rough
from agent.manual_compression_feedback import summarize_manual_compression
original_history = list(self.conversation_history)
approx_tokens = estimate_messages_tokens_rough(original_history)
if focus_topic:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens), "
f"focus: \"{focus_topic}\"...")
else:
print(f"🗜️ Compressing {original_count} messages (~{approx_tokens:,} tokens)...")
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
self.conversation_history = compressed
# _compress_context ends the old session and creates a new child
# session on the agent (run_agent.py::_compress_context). Sync the
# CLI's session_id so /status, /resume, exit summary, and title
# generation all point at the live continuation session, not the
# ended parent. Without this, subsequent end_session() calls target
# the already-closed parent and the child is orphaned.
if (
getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self.session_id = self.agent.session_id
self._pending_title = None
new_tokens = estimate_messages_tokens_rough(self.conversation_history)
summary = summarize_manual_compression(
original_history,
self.conversation_history,
approx_tokens,
new_tokens,
)
icon = "🗜️" if summary["noop"] else ""
print(f" {icon} {summary['headline']}")
print(f" {summary['token_line']}")
if summary["note"]:
print(f" {summary['note']}")
compressed, _ = self.agent._compress_context(
original_history,
self.agent._cached_system_prompt or "",
approx_tokens=approx_tokens,
focus_topic=focus_topic or None,
)
self.conversation_history = compressed
# _compress_context ends the old session and creates a new child
# session on the agent (run_agent.py::_compress_context). Sync the
# CLI's session_id so /status, /resume, exit summary, and title
# generation all point at the live continuation session, not the
# ended parent. Without this, subsequent end_session() calls target
# the already-closed parent and the child is orphaned.
if (
getattr(self.agent, "session_id", None)
and self.agent.session_id != self.session_id
):
self.session_id = self.agent.session_id
self._pending_title = None
new_tokens = estimate_messages_tokens_rough(self.conversation_history)
summary = summarize_manual_compression(
original_history,
self.conversation_history,
approx_tokens,
new_tokens,
)
icon = "🗜️" if summary["noop"] else ""
print(f" {icon} {summary['headline']}")
print(f" {summary['token_line']}")
if summary["note"]:
print(f" {summary['note']}")
except Exception as e:
print(f" ❌ Compression failed: {e}")
except Exception as e:
print(f" ❌ Compression failed: {e}")
def _handle_debug_command(self):
"""Handle /debug — upload debug report + logs and print paste URLs."""
@ -9543,9 +9555,20 @@ class HermesCLI:
@kb.add('c-d')
def handle_ctrl_d(event):
"""Handle Ctrl+D - exit."""
self._should_exit = True
event.app.exit()
"""Ctrl+D: delete char under cursor (standard readline behaviour).
Only exit when the input is empty, same as bash/zsh. Pending
attached images count as input and block the EOF exit so the
user doesn't lose them silently.
"""
buf = event.app.current_buffer
if buf.text:
buf.delete()
elif self._attached_images:
# Empty text but pending attachments — no-op, don't exit.
return
else:
self._should_exit = True
event.app.exit()
_modal_prompt_active = Condition(
lambda: bool(self._secret_state or self._sudo_state)

View file

@ -371,6 +371,39 @@ def save_jobs(jobs: List[Dict[str, Any]]):
raise
def _normalize_workdir(workdir: Optional[str]) -> Optional[str]:
"""Normalize and validate a cron job workdir.
Rules:
- Empty / None → None (feature off, preserves old behaviour).
- ``~`` is expanded. Relative paths are rejected: cron jobs run detached
from any shell cwd, so relative paths have no stable meaning.
- The path must exist and be a directory at create/update time. We do
NOT re-check at run time (a user might briefly unmount the dir; the
scheduler will just fall back to old behaviour with a logged warning).
Returns the absolute path string, or None when disabled.
Raises ValueError on invalid input.
"""
if workdir is None:
return None
raw = str(workdir).strip()
if not raw:
return None
expanded = Path(raw).expanduser()
if not expanded.is_absolute():
raise ValueError(
f"Cron workdir must be an absolute path (got {raw!r}). "
f"Cron jobs run detached from any shell cwd, so relative paths are ambiguous."
)
resolved = expanded.resolve()
if not resolved.exists():
raise ValueError(f"Cron workdir does not exist: {resolved}")
if not resolved.is_dir():
raise ValueError(f"Cron workdir is not a directory: {resolved}")
return str(resolved)
def create_job(
prompt: str,
schedule: str,
@ -385,6 +418,7 @@ def create_job(
base_url: Optional[str] = None,
script: Optional[str] = None,
enabled_toolsets: Optional[List[str]] = None,
workdir: Optional[str] = None,
) -> Dict[str, Any]:
"""
Create a new cron job.
@ -407,6 +441,12 @@ def create_job(
enabled_toolsets: Optional list of toolset names to restrict the agent to.
When set, only tools from these toolsets are loaded, reducing
token overhead. When omitted, all default tools are loaded.
workdir: Optional absolute path. When set, the job runs as if launched
from that directory: AGENTS.md / CLAUDE.md / .cursorrules from
that directory are injected into the system prompt, and the
terminal/file/code_exec tools use it as their working directory
(via TERMINAL_CWD). When unset, the old behaviour is preserved
(no context files injected, tools use the scheduler's cwd).
Returns:
The created job dict
@ -439,6 +479,7 @@ def create_job(
normalized_script = normalized_script or None
normalized_toolsets = [str(t).strip() for t in enabled_toolsets if str(t).strip()] if enabled_toolsets else None
normalized_toolsets = normalized_toolsets or None
normalized_workdir = _normalize_workdir(workdir)
label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
job = {
@ -471,6 +512,7 @@ def create_job(
"deliver": deliver,
"origin": origin, # Tracks where job was created for "origin" delivery
"enabled_toolsets": normalized_toolsets,
"workdir": normalized_workdir,
}
jobs = load_jobs()
@ -504,6 +546,15 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
if job["id"] != job_id:
continue
# Validate / normalize workdir if present in updates. Empty string or
# None both mean "clear the field" (restore old behaviour).
if "workdir" in updates:
_wd = updates["workdir"]
if _wd in (None, "", False):
updates["workdir"] = None
else:
updates["workdir"] = _normalize_workdir(_wd)
updated = _apply_skill_fields({**job, **updates})
schedule_changed = "schedule" in updates

View file

@ -40,6 +40,37 @@ from hermes_time import now as _hermes_now
logger = logging.getLogger(__name__)
def _resolve_cron_enabled_toolsets(job: dict, cfg: dict) -> list[str] | None:
"""Resolve the toolset list for a cron job.
Precedence:
1. Per-job ``enabled_toolsets`` (set via ``cronjob`` tool on create/update).
Keeps the agent's job-scoped toolset override intact — #6130.
2. Per-platform ``hermes tools`` config for the ``cron`` platform.
Mirrors gateway behavior (``_get_platform_tools(cfg, platform_key)``)
so users can gate cron toolsets globally without recreating every job.
3. ``None`` on any lookup failure: AIAgent loads the full default set
(legacy behavior before this change, preserved as the safety net).
_DEFAULT_OFF_TOOLSETS ({moa, homeassistant, rl}) are removed by
``_get_platform_tools`` for unconfigured platforms, so fresh installs
get cron WITHOUT ``moa`` by default (issue reported by Norbert:
a surprise $4.63 run).
"""
per_job = job.get("enabled_toolsets")
if per_job:
return per_job
try:
from hermes_cli.tools_config import _get_platform_tools # lazy: avoid heavy import at cron module load
return sorted(_get_platform_tools(cfg or {}, "cron"))
except Exception as exc:
logger.warning(
"Cron toolset resolution failed, falling back to full default toolset: %s",
exc,
)
return None
# Valid delivery platforms — used to validate user-supplied platform names
# in cron delivery targets, preventing env var enumeration via crafted names.
_KNOWN_DELIVERY_PLATFORMS = frozenset({
@ -764,6 +795,30 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
chat_name=origin.get("chat_name", "") if origin else "",
)
# Per-job working directory. When set (and validated at create/update
# time), we point TERMINAL_CWD at it so:
# - build_context_files_prompt() picks up AGENTS.md / CLAUDE.md /
# .cursorrules from the job's project dir, AND
# - the terminal, file, and code-exec tools run commands from there.
#
# tick() serializes workdir-jobs outside the parallel pool, so mutating
# os.environ["TERMINAL_CWD"] here is safe for those jobs. For workdir-less
# jobs we leave TERMINAL_CWD untouched — preserves the original behaviour
# (skip_context_files=True, tools use whatever cwd the scheduler has).
_job_workdir = (job.get("workdir") or "").strip() or None
if _job_workdir and not Path(_job_workdir).is_dir():
# Directory was removed between create-time validation and now. Log
# and drop back to old behaviour rather than crashing the job.
logger.warning(
"Job '%s': configured workdir %r no longer exists — running without it",
job_id, _job_workdir,
)
_job_workdir = None
_prior_terminal_cwd = os.environ.get("TERMINAL_CWD", "_UNSET_")
if _job_workdir:
os.environ["TERMINAL_CWD"] = _job_workdir
logger.info("Job '%s': using workdir %s", job_id, _job_workdir)
try:
# Re-read .env and config.yaml fresh every run so provider/key
# changes take effect without a gateway restart.
@ -840,6 +895,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
resolve_runtime_provider,
format_runtime_provider_error,
)
from hermes_cli.auth import AuthError
try:
runtime_kwargs = {
"requested": job.get("provider") or os.getenv("HERMES_INFERENCE_PROVIDER"),
@ -847,6 +903,28 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
if job.get("base_url"):
runtime_kwargs["explicit_base_url"] = job.get("base_url")
runtime = resolve_runtime_provider(**runtime_kwargs)
except AuthError as auth_exc:
# Primary provider auth failed — try fallback chain before giving up.
logger.warning("Job '%s': primary auth failed (%s), trying fallback", job_id, auth_exc)
fb = _cfg.get("fallback_providers") or _cfg.get("fallback_model")
fb_list = (fb if isinstance(fb, list) else [fb]) if fb else []
runtime = None
for entry in fb_list:
if not isinstance(entry, dict):
continue
try:
fb_kwargs = {"requested": entry.get("provider")}
if entry.get("base_url"):
fb_kwargs["explicit_base_url"] = entry["base_url"]
if entry.get("api_key"):
fb_kwargs["explicit_api_key"] = entry["api_key"]
runtime = resolve_runtime_provider(**fb_kwargs)
logger.info("Job '%s': fallback resolved to %s", job_id, runtime.get("provider"))
break
except Exception as fb_exc:
logger.debug("Job '%s': fallback %s failed: %s", job_id, entry.get("provider"), fb_exc)
if runtime is None:
raise RuntimeError(format_runtime_provider_error(auth_exc)) from auth_exc
except Exception as exc:
message = format_runtime_provider_error(exc)
raise RuntimeError(message) from exc
@ -886,10 +964,13 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
providers_ignored=pr.get("ignore"),
providers_order=pr.get("order"),
provider_sort=pr.get("sort"),
enabled_toolsets=job.get("enabled_toolsets") or None,
enabled_toolsets=_resolve_cron_enabled_toolsets(job, _cfg),
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
skip_context_files=True, # Don't inject SOUL.md/AGENTS.md from scheduler cwd
# When a workdir is configured, inject AGENTS.md / CLAUDE.md /
# .cursorrules from that directory; otherwise preserve the old
# behaviour (don't inject SOUL.md/AGENTS.md from the scheduler cwd).
skip_context_files=not bool(_job_workdir),
skip_memory=True, # Cron system prompts would corrupt user representations
platform="cron",
session_id=_cron_session_id,
@ -1028,6 +1109,14 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
return False, output, "", error_msg
finally:
# Restore TERMINAL_CWD to whatever it was before this job ran. We
# only ever mutate it when the job has a workdir; see the setup block
# at the top of run_job for the serialization guarantee.
if _job_workdir:
if _prior_terminal_cwd == "_UNSET_":
os.environ.pop("TERMINAL_CWD", None)
else:
os.environ["TERMINAL_CWD"] = _prior_terminal_cwd
# Clean up ContextVar session/delivery state for this job.
clear_session_vars(_ctx_tokens)
if _session_db:
@ -1155,14 +1244,28 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
mark_job_run(job["id"], False, str(e))
return False
# Run all due jobs concurrently, each in its own ContextVar copy
# so session/delivery state stays isolated per-thread.
with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
_futures = []
for job in due_jobs:
_ctx = contextvars.copy_context()
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results = [f.result() for f in _futures]
# Partition due jobs: those with a per-job workdir mutate
# os.environ["TERMINAL_CWD"] inside run_job, which is process-global —
# so they MUST run sequentially to avoid corrupting each other. Jobs
# without a workdir leave env untouched and stay parallel-safe.
workdir_jobs = [j for j in due_jobs if (j.get("workdir") or "").strip()]
parallel_jobs = [j for j in due_jobs if not (j.get("workdir") or "").strip()]
_results: list = []
# Sequential pass for workdir jobs.
for job in workdir_jobs:
_ctx = contextvars.copy_context()
_results.append(_ctx.run(_process_job, job))
# Parallel pass for the rest — same behaviour as before.
if parallel_jobs:
with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
_futures = []
for job in parallel_jobs:
_ctx = contextvars.copy_context()
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results.extend(f.result() for f in _futures)
return sum(_results)
finally:
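The save/mutate/restore dance around `TERMINAL_CWD`, combined with the sequential pass for workdir jobs, is the standard pattern for safely scoping a process-global environment variable. A generic context-manager sketch of the same idea (name and shape illustrative):

```python
import contextlib
import os

@contextlib.contextmanager
def scoped_env(name, value):
    """Temporarily set (or leave untouched) a process-global env var.

    Mirrors the run_job logic: when `value` is None the variable is not
    touched; otherwise the prior state, including "was unset", is
    restored on exit. Callers must serialize with each other, since
    os.environ is shared across all threads in the process.
    """
    if value is None:
        yield
        return
    sentinel = object()
    prior = os.environ.get(name, sentinel)
    os.environ[name] = value
    try:
        yield
    finally:
        if prior is sentinel:
            os.environ.pop(name, None)
        else:
            os.environ[name] = prior
```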

docker-compose.yml Normal file
View file

@ -0,0 +1,52 @@
#
# docker-compose.yml for Hermes Agent
#
# Usage:
# HERMES_UID=$(id -u) HERMES_GID=$(id -g) docker compose up -d
#
# Set HERMES_UID / HERMES_GID to the host user that owns ~/.hermes so
# files created inside the container stay readable/writable on the host.
# The entrypoint remaps the internal `hermes` user to these values via
# usermod/groupmod + gosu.
#
# Security notes:
# - The dashboard service binds to 127.0.0.1 by default. It stores API
# keys; exposing it on LAN without auth is unsafe. If you want remote
# access, use an SSH tunnel or put it behind a reverse proxy that
# adds authentication — do NOT pass --insecure --host 0.0.0.0.
# - The gateway's API server is off unless you uncomment API_SERVER_KEY
# and API_SERVER_HOST. See docs/user-guide/api-server.md before doing
# this on an internet-facing host.
#
services:
gateway:
build: .
image: hermes-agent
container_name: hermes
restart: unless-stopped
network_mode: host
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_UID=${HERMES_UID:-10000}
- HERMES_GID=${HERMES_GID:-10000}
# To expose the OpenAI-compatible API server beyond localhost,
# uncomment BOTH lines (API_SERVER_KEY is mandatory for auth):
# - API_SERVER_HOST=0.0.0.0
# - API_SERVER_KEY=${API_SERVER_KEY}
command: ["gateway", "run"]
dashboard:
image: hermes-agent
container_name: hermes-dashboard
restart: unless-stopped
network_mode: host
depends_on:
- gateway
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_UID=${HERMES_UID:-10000}
- HERMES_GID=${HERMES_GID:-10000}
# Localhost-only. For remote access, tunnel via `ssh -L 9119:localhost:9119`.
command: ["dashboard", "--host", "127.0.0.1", "--no-open"]

View file

@ -22,9 +22,18 @@ if [ "$(id -u)" = "0" ]; then
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi
# Fix ownership of the data volume. When HERMES_UID remaps the hermes user,
# files created by previous runs (under the old UID) become inaccessible.
# Always chown -R when UID was remapped; otherwise only if top-level is wrong.
actual_hermes_uid=$(id -u hermes)
if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
needs_chown=false
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "10000" ]; then
needs_chown=true
elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
needs_chown=true
fi
if [ "$needs_chown" = true ]; then
echo "Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.

View file

@ -135,7 +135,7 @@ class SessionResetPolicy:
mode=mode if mode is not None else "both",
at_hour=at_hour if at_hour is not None else 4,
idle_minutes=idle_minutes if idle_minutes is not None else 1440,
notify=notify if notify is not None else True,
notify=_coerce_bool(notify, True),
notify_exclude_platforms=tuple(exclude) if exclude is not None else ("api_server", "webhook"),
)
@ -178,7 +178,7 @@ class PlatformConfig:
home_channel = HomeChannel.from_dict(data["home_channel"])
return cls(
enabled=data.get("enabled", False),
enabled=_coerce_bool(data.get("enabled"), False),
token=data.get("token"),
api_key=data.get("api_key"),
home_channel=home_channel,
@ -435,7 +435,7 @@ class GatewayConfig:
reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
quick_commands=quick_commands,
sessions_dir=sessions_dir,
always_log_local=data.get("always_log_local", True),
always_log_local=_coerce_bool(data.get("always_log_local"), True),
stt_enabled=_coerce_bool(stt_enabled, True),
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
@ -687,6 +687,11 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
if "group_allowed_chats" in telegram_cfg and not os.getenv("TELEGRAM_GROUP_ALLOWED_USERS"):
gac = telegram_cfg["group_allowed_chats"]
if isinstance(gac, list):
gac = ",".join(str(v) for v in gac)
os.environ["TELEGRAM_GROUP_ALLOWED_USERS"] = str(gac)
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):

View file

@ -1204,10 +1204,12 @@ class APIServerAdapter(BasePlatformAdapter):
If the client disconnects mid-stream, ``agent.interrupt()`` is
called so the agent stops issuing upstream LLM calls, then the
asyncio task is cancelled. When ``store=True`` the full response
is persisted to the ResponseStore in a ``finally`` block so GET
/v1/responses/{id} and ``previous_response_id`` chaining work the
same as the batch path.
asyncio task is cancelled. When ``store=True`` an initial
``in_progress`` snapshot is persisted immediately after
``response.created`` and disconnects update it to an
``incomplete`` snapshot so GET /v1/responses/{id} and
``previous_response_id`` chaining still have something to
recover from.
"""
import queue as _q
@@ -1269,6 +1271,60 @@ class APIServerAdapter(BasePlatformAdapter):
final_response_text = ""
agent_error: Optional[str] = None
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
terminal_snapshot_persisted = False
def _persist_response_snapshot(
response_env: Dict[str, Any],
*,
conversation_history_snapshot: Optional[List[Dict[str, Any]]] = None,
) -> None:
if not store:
return
if conversation_history_snapshot is None:
conversation_history_snapshot = list(conversation_history)
conversation_history_snapshot.append({"role": "user", "content": user_message})
self._response_store.put(response_id, {
"response": response_env,
"conversation_history": conversation_history_snapshot,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
def _persist_incomplete_if_needed() -> None:
"""Persist an ``incomplete`` snapshot if no terminal one was written.
Called from both the client-disconnect (``ConnectionResetError``)
and server-cancellation (``asyncio.CancelledError``) paths so
GET /v1/responses/{id} and ``previous_response_id`` chaining keep
working after abrupt stream termination.
"""
if not store or terminal_snapshot_persisted:
return
incomplete_text = "".join(final_text_parts) or final_response_text
incomplete_items: List[Dict[str, Any]] = list(emitted_items)
if incomplete_text:
incomplete_items.append({
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": incomplete_text}],
})
incomplete_env = _envelope("incomplete")
incomplete_env["output"] = incomplete_items
incomplete_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
incomplete_history = list(conversation_history)
incomplete_history.append({"role": "user", "content": user_message})
if incomplete_text:
incomplete_history.append({"role": "assistant", "content": incomplete_text})
_persist_response_snapshot(
incomplete_env,
conversation_history_snapshot=incomplete_history,
)
try:
# response.created — initial envelope, status=in_progress
@@ -1278,6 +1334,7 @@ class APIServerAdapter(BasePlatformAdapter):
"type": "response.created",
"response": created_env,
})
_persist_response_snapshot(created_env)
last_activity = time.monotonic()
async def _open_message_item() -> None:
@@ -1534,6 +1591,18 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
_failed_history = list(conversation_history)
_failed_history.append({"role": "user", "content": user_message})
if final_response_text or agent_error:
_failed_history.append({
"role": "assistant",
"content": final_response_text or agent_error,
})
_persist_response_snapshot(
failed_env,
conversation_history_snapshot=_failed_history,
)
terminal_snapshot_persisted = True
await _write_event("response.failed", {
"type": "response.failed",
"response": failed_env,
@@ -1546,30 +1615,24 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
_persist_response_snapshot(
completed_env,
conversation_history_snapshot=full_history,
)
terminal_snapshot_persisted = True
await _write_event("response.completed", {
"type": "response.completed",
"response": completed_env,
})
# Persist for future chaining / GET retrieval, mirroring
# the batch path behavior.
if store:
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
self._response_store.put(response_id, {
"response": completed_env,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
_persist_incomplete_if_needed()
# Client disconnected — interrupt the agent so it stops
# making upstream LLM calls, then cancel the task.
agent = agent_ref[0] if agent_ref else None
@@ -1585,6 +1648,22 @@ class APIServerAdapter(BasePlatformAdapter):
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
except asyncio.CancelledError:
# Server-side cancellation (e.g. shutdown, request timeout) —
# persist an incomplete snapshot so GET /v1/responses/{id} and
# previous_response_id chaining still work, then re-raise so the
# runtime's cancellation semantics are respected.
_persist_incomplete_if_needed()
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE task cancelled")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
logger.info("SSE task cancelled; persisted incomplete snapshot for %s", response_id)
raise
return response
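The disconnect/cancellation handling above reduces to a write-once snapshot pattern: terminal paths (completed/failed) persist and set a flag, and both the client-disconnect and server-cancellation paths persist an `incomplete` snapshot only when no terminal write happened first. A standalone sketch of just that pattern (all names hypothetical, no relation to the actual store API):

```python
store = {}  # stand-in for the ResponseStore keyed by response_id

def make_persister(response_id):
    state = {"terminal": False}

    def persist_terminal(envelope):
        # completed/failed snapshot always wins and blocks later writes
        store[response_id] = envelope
        state["terminal"] = True

    def persist_incomplete_if_needed(partial_text):
        # skip if a terminal snapshot was already written
        if state["terminal"]:
            return
        store[response_id] = {"status": "incomplete", "text": partial_text}

    return persist_terminal, persist_incomplete_if_needed

persist_terminal, persist_incomplete = make_persister("resp_1")
persist_incomplete("partial text")            # disconnect mid-stream
persist_terminal({"status": "completed"})     # stream actually finished
persist_incomplete("late partial")            # no-op: terminal already stored
```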


@@ -148,7 +148,102 @@ def _detect_macos_system_proxy() -> str | None:
return None
def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
def _split_host_port(value: str) -> tuple[str, int | None]:
raw = str(value or "").strip()
if not raw:
return "", None
if "://" in raw:
parsed = urlsplit(raw)
return (parsed.hostname or "").lower().rstrip("."), parsed.port
if raw.startswith("[") and "]" in raw:
host, _, rest = raw[1:].partition("]")
port = None
if rest.startswith(":") and rest[1:].isdigit():
port = int(rest[1:])
return host.lower().rstrip("."), port
if raw.count(":") == 1:
host, _, maybe_port = raw.rpartition(":")
if maybe_port.isdigit():
return host.lower().rstrip("."), int(maybe_port)
return raw.lower().strip("[]").rstrip("."), None
def _no_proxy_entries() -> list[str]:
entries: list[str] = []
for key in ("NO_PROXY", "no_proxy"):
raw = os.environ.get(key, "")
entries.extend(part.strip() for part in raw.split(",") if part.strip())
return entries
def _no_proxy_entry_matches(entry: str, host: str, port: int | None = None) -> bool:
token = str(entry or "").strip().lower()
if not token:
return False
if token == "*":
return True
token_host, token_port = _split_host_port(token)
if token_port is not None and port is not None and token_port != port:
return False
if token_port is not None and port is None:
return False
if not token_host:
return False
try:
network = ipaddress.ip_network(token_host, strict=False)
try:
return ipaddress.ip_address(host) in network
except ValueError:
return False
except ValueError:
pass
try:
token_ip = ipaddress.ip_address(token_host)
try:
return ipaddress.ip_address(host) == token_ip
except ValueError:
return False
except ValueError:
pass
if token_host.startswith("*."):
suffix = token_host[1:]
return host.endswith(suffix)
if token_host.startswith("."):
return host == token_host[1:] or host.endswith(token_host)
return host == token_host or host.endswith(f".{token_host}")
def should_bypass_proxy(target_hosts: str | list[str] | tuple[str, ...] | set[str] | None) -> bool:
"""Return True when NO_PROXY/no_proxy matches at least one target host.
Supports exact hosts, domain suffixes, wildcard suffixes, IP literals,
CIDR ranges, optional host:port entries, and ``*``.
"""
entries = _no_proxy_entries()
if not entries or not target_hosts:
return False
if isinstance(target_hosts, str):
candidates = [target_hosts]
else:
candidates = list(target_hosts)
for candidate in candidates:
host, port = _split_host_port(str(candidate))
if not host:
continue
if any(_no_proxy_entry_matches(entry, host, port) for entry in entries):
return True
return False
def resolve_proxy_url(
platform_env_var: str | None = None,
*,
target_hosts: str | list[str] | tuple[str, ...] | set[str] | None = None,
) -> str | None:
"""Return a proxy URL from env vars, or macOS system proxy.
Check order:
@@ -156,18 +251,26 @@ def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
1. HTTPS_PROXY / HTTP_PROXY / ALL_PROXY (and lowercase variants)
2. macOS system proxy via ``scutil --proxy`` (auto-detect)
Returns *None* if no proxy is found.
Returns *None* if no proxy is found, or if NO_PROXY/no_proxy matches one
of ``target_hosts``.
"""
if platform_env_var:
value = (os.environ.get(platform_env_var) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = (os.environ.get(key) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
return normalize_proxy_url(_detect_macos_system_proxy())
detected = normalize_proxy_url(_detect_macos_system_proxy())
if detected and should_bypass_proxy(target_hosts):
return None
return detected
def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
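The matching rules the `should_bypass_proxy` docstring lists (exact hosts, domain suffixes, CIDR ranges, `*`) can be exercised in isolation. This is a simplified sketch covering only the suffix and CIDR cases, not the full host:port and wildcard handling above:

```python
import ipaddress

def no_proxy_matches(entry, host):
    """Simplified NO_PROXY entry check: '*', CIDR ranges, and domain suffixes."""
    entry, host = entry.strip().lower(), host.lower()
    if entry == "*":
        return True
    # CIDR / IP-literal entries match IP-address hosts only
    try:
        network = ipaddress.ip_network(entry, strict=False)
        try:
            return ipaddress.ip_address(host) in network
        except ValueError:
            return False
    except ValueError:
        pass
    # ".example.com" and "example.com" both match the domain and subdomains
    if entry.startswith("."):
        return host == entry[1:] or host.endswith(entry)
    return host == entry or host.endswith("." + entry)

print(no_proxy_matches("10.0.0.0/8", "10.1.2.3"))          # CIDR hit
print(no_proxy_matches(".example.com", "api.example.com"))  # suffix hit
print(no_proxy_matches("example.com", "notexample.com"))    # no partial-label match
```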


@@ -99,6 +99,7 @@ def _normalize_server_url(raw: str) -> str:
class BlueBubblesAdapter(BasePlatformAdapter):
platform = Platform.BLUEBUBBLES
SUPPORTS_MESSAGE_EDITING = False
MAX_MESSAGE_LENGTH = MAX_TEXT_LENGTH
def __init__(self, config: PlatformConfig):
@@ -391,6 +392,13 @@ class BlueBubblesAdapter(BasePlatformAdapter):
# Text sending
# ------------------------------------------------------------------
@staticmethod
def truncate_message(content: str, max_length: int = MAX_TEXT_LENGTH) -> List[str]:
# Use the base splitter but skip pagination indicators — iMessage
# bubbles flow naturally without "(1/3)" suffixes.
chunks = BasePlatformAdapter.truncate_message(content, max_length)
return [re.sub(r"\s*\(\d+/\d+\)$", "", c) for c in chunks]
async def send(
self,
chat_id: str,
@@ -398,10 +406,19 @@ class BlueBubblesAdapter(BasePlatformAdapter):
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
text = strip_markdown(content or "")
text = self.format_message(content)
if not text:
return SendResult(success=False, error="BlueBubbles send requires text")
chunks = self.truncate_message(text, max_length=self.MAX_MESSAGE_LENGTH)
# Split on paragraph breaks first (double newlines) so each thought
# becomes its own iMessage bubble, then truncate any that are still
# too long.
paragraphs = [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()]
chunks: List[str] = []
for para in (paragraphs or [text]):
if len(para) <= self.MAX_MESSAGE_LENGTH:
chunks.append(para)
else:
chunks.extend(self.truncate_message(para, max_length=self.MAX_MESSAGE_LENGTH))
last = SendResult(success=True)
for chunk in chunks:
guid = await self._resolve_chat_guid(chat_id)
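The paragraph-first splitting above can be demonstrated standalone. `BasePlatformAdapter.truncate_message` is not shown in this diff, so the sketch substitutes a hypothetical hard-wrap splitter for it:

```python
import re

MAX_LEN = 40  # illustrative; the adapter uses MAX_TEXT_LENGTH

def simple_truncate(text, max_length):
    """Stand-in for the base splitter: hard-wrap into max_length chunks."""
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]

def split_bubbles(text, max_length=MAX_LEN):
    # Blank lines become bubble boundaries first...
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    for para in (paragraphs or [text]):
        if len(para) <= max_length:
            chunks.append(para)
        else:
            # ...then any paragraph still too long is truncated.
            chunks.extend(simple_truncate(para, max_length))
    return chunks

print(split_bubbles("First thought.\n\nSecond thought."))
```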


@@ -2246,10 +2246,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_usage(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/usage")
@tree.command(name="provider", description="Show available providers")
async def slash_provider(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/provider")
@tree.command(name="help", description="Show available commands")
async def slash_help(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/help")
@@ -2719,7 +2715,12 @@ class DiscordAdapter(BasePlatformAdapter):
return os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no", "off")
def _discord_free_response_channels(self) -> set:
"""Return Discord channel IDs where no bot mention is required."""
"""Return Discord channel IDs where no bot mention is required.
A single ``"*"`` entry (either from a list or a comma-separated
string) is preserved in the returned set so callers can short-circuit
on wildcard membership, consistent with ``allowed_channels``.
"""
raw = self.config.extra.get("free_response_channels")
if raw is None:
raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
@@ -3212,14 +3213,14 @@ class DiscordAdapter(BasePlatformAdapter):
allowed_channels_raw = os.getenv("DISCORD_ALLOWED_CHANNELS", "")
if allowed_channels_raw:
allowed_channels = {ch.strip() for ch in allowed_channels_raw.split(",") if ch.strip()}
if not (channel_ids & allowed_channels):
if "*" not in allowed_channels and not (channel_ids & allowed_channels):
logger.debug("[%s] Ignoring message in non-allowed channel: %s", self.name, channel_ids)
return
# Check ignored channels - never respond even when mentioned
ignored_channels_raw = os.getenv("DISCORD_IGNORED_CHANNELS", "")
ignored_channels = {ch.strip() for ch in ignored_channels_raw.split(",") if ch.strip()}
if channel_ids & ignored_channels:
if "*" in ignored_channels or (channel_ids & ignored_channels):
logger.debug("[%s] Ignoring message in ignored channel: %s", self.name, channel_ids)
return
@@ -3233,7 +3234,11 @@ class DiscordAdapter(BasePlatformAdapter):
voice_linked_ids = {str(ch_id) for ch_id in self._voice_text_channels.values()}
current_channel_id = str(message.channel.id)
is_voice_linked_channel = current_channel_id in voice_linked_ids
is_free_channel = bool(channel_ids & free_channels) or is_voice_linked_channel
is_free_channel = (
"*" in free_channels
or bool(channel_ids & free_channels)
or is_voice_linked_channel
)
# Skip the mention check if the message is in a thread where
# the bot has previously participated (auto-created or replied in).
@@ -3866,6 +3871,15 @@ if DISCORD_AVAILABLE:
self.resolved = True
model_id = interaction.data["values"][0]
self.clear_items()
await interaction.response.edit_message(
embed=discord.Embed(
title="⚙ Switching Model",
description=f"Switching to `{model_id}`...",
color=discord.Color.blue(),
),
view=None,
)
try:
result_text = await self.on_model_selected(
@@ -3876,14 +3890,13 @@ if DISCORD_AVAILABLE:
except Exception as exc:
result_text = f"Error switching model: {exc}"
self.clear_items()
await interaction.response.edit_message(
await interaction.edit_original_response(
embed=discord.Embed(
title="⚙ Model Switched",
description=result_text,
color=discord.Color.green(),
),
view=self,
view=None,
)
async def _on_back(self, interaction: discord.Interaction):


@@ -703,7 +703,6 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
fallback_ips = self._fallback_ips()
if not fallback_ips:
@@ -714,6 +713,8 @@ class TelegramAdapter(BasePlatformAdapter):
", ".join(fallback_ips),
)
proxy_targets = ["api.telegram.org", *fallback_ips]
proxy_url = resolve_proxy_url("TELEGRAM_PROXY", target_hosts=proxy_targets)
if fallback_ips and not proxy_url and not disable_fallback:
logger.info(
"[%s] Telegram fallback IPs active: %s",


@@ -43,10 +43,10 @@ _DOH_PROVIDERS: list[dict] = [
_SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url() -> str | None:
def _resolve_proxy_url(target_hosts=None) -> str | None:
# Delegate to shared implementation (env vars + macOS system proxy detection)
from gateway.platforms.base import resolve_proxy_url
return resolve_proxy_url("TELEGRAM_PROXY")
return resolve_proxy_url("TELEGRAM_PROXY", target_hosts=target_hosts)
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
@@ -60,7 +60,7 @@ class TelegramFallbackTransport(httpx.AsyncBaseTransport):
def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
proxy_url = _resolve_proxy_url()
proxy_url = _resolve_proxy_url(target_hosts=[_TELEGRAM_API_HOST, *self._fallback_ips])
if proxy_url and "proxy" not in transport_kwargs:
transport_kwargs["proxy"] = proxy_url
self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)


@@ -14,6 +14,7 @@ Usage:
"""
import asyncio
import dataclasses
import json
import logging
import os
@@ -297,50 +298,16 @@ from gateway.restart import (
)
def _normalize_whatsapp_identifier(value: str) -> str:
"""Strip WhatsApp JID/LID syntax down to its stable numeric identifier."""
return (
str(value or "")
.strip()
.replace("+", "", 1)
.split(":", 1)[0]
.split("@", 1)[0]
)
from gateway.whatsapp_identity import (
canonical_whatsapp_identifier as _canonical_whatsapp_identifier, # noqa: F401
expand_whatsapp_aliases as _expand_whatsapp_auth_aliases,
normalize_whatsapp_identifier as _normalize_whatsapp_identifier,
)
def _expand_whatsapp_auth_aliases(identifier: str) -> set:
"""Resolve WhatsApp phone/LID aliases using bridge session mapping files."""
normalized = _normalize_whatsapp_identifier(identifier)
if not normalized:
return set()
session_dir = _hermes_home / "whatsapp" / "session"
resolved = set()
queue = [normalized]
while queue:
current = queue.pop(0)
if not current or current in resolved:
continue
resolved.add(current)
for suffix in ("", "_reverse"):
mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
if not mapping_path.exists():
continue
try:
mapped = _normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except Exception:
continue
if mapped and mapped not in resolved:
queue.append(mapped)
return resolved
logger = logging.getLogger(__name__)
# Sentinel placed into _running_agents immediately when a session starts
# processing, *before* any await. Prevents a second message for the same
# session from bypassing the "already running" guard during the async gap
@@ -349,16 +316,30 @@ _AGENT_PENDING_SENTINEL = object()
def _resolve_runtime_agent_kwargs() -> dict:
"""Resolve provider credentials for gateway-created AIAgent instances."""
"""Resolve provider credentials for gateway-created AIAgent instances.
If the primary provider fails with an authentication error, attempt to
resolve credentials using the fallback provider chain from config.yaml
before giving up.
"""
from hermes_cli.runtime_provider import (
resolve_runtime_provider,
format_runtime_provider_error,
)
from hermes_cli.auth import AuthError
try:
runtime = resolve_runtime_provider(
requested=os.getenv("HERMES_INFERENCE_PROVIDER"),
)
except AuthError as auth_exc:
# Primary provider auth failed (expired token, revoked key, etc.).
# Try the fallback provider chain before raising.
logger.warning("Primary provider auth failed: %s — trying fallback", auth_exc)
fb_config = _try_resolve_fallback_provider()
if fb_config is not None:
return fb_config
raise RuntimeError(format_runtime_provider_error(auth_exc)) from auth_exc
except Exception as exc:
raise RuntimeError(format_runtime_provider_error(exc)) from exc
@@ -373,6 +354,48 @@ def _resolve_runtime_agent_kwargs() -> dict:
}
def _try_resolve_fallback_provider() -> dict | None:
"""Attempt to resolve credentials from the fallback_model/fallback_providers config."""
from hermes_cli.runtime_provider import resolve_runtime_provider
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if not cfg_path.exists():
return None
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
fb = cfg.get("fallback_providers") or cfg.get("fallback_model")
if not fb:
return None
# Normalize to list
fb_list = fb if isinstance(fb, list) else [fb]
for entry in fb_list:
if not isinstance(entry, dict):
continue
try:
runtime = resolve_runtime_provider(
requested=entry.get("provider"),
explicit_base_url=entry.get("base_url"),
explicit_api_key=entry.get("api_key"),
)
logger.info("Fallback provider resolved: %s", runtime.get("provider"))
return {
"api_key": runtime.get("api_key"),
"base_url": runtime.get("base_url"),
"provider": runtime.get("provider"),
"api_mode": runtime.get("api_mode"),
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
"credential_pool": runtime.get("credential_pool"),
}
except Exception as fb_exc:
logger.debug("Fallback entry %s failed: %s", entry.get("provider"), fb_exc)
continue
except Exception:
pass
return None
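The lookup above implies a config.yaml shape along these lines (field names taken from the code paths above; provider names and values are hypothetical):

```yaml
# config.yaml under the Hermes home directory
fallback_providers:
  - provider: openrouter
    api_key: sk-or-example
  - provider: custom
    base_url: https://llm.internal.example/v1
    api_key: internal-example-key
```

Entries are tried in order; per the loop above, the first entry whose credentials resolve via `resolve_runtime_provider` wins, and a scalar `fallback_model` mapping is accepted as a single-entry list.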
def _build_media_placeholder(event) -> str:
"""Build a text placeholder for media-only events so they aren't dropped.
@@ -2309,6 +2332,17 @@ class GatewayRunner:
for key, entry in _expired_entries:
try:
await self._async_flush_memories(entry.session_id, key)
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_parts = key.split(":")
_platform = _parts[2] if len(_parts) > 2 else ""
_invoke_hook(
"on_session_finalize",
session_id=entry.session_id,
platform=_platform,
)
except Exception:
pass
# Shut down memory provider and close tool resources
# on the cached agent. Idle agents live in
# _agent_cache (not _running_agents), so look there.
@@ -2969,6 +3003,7 @@ class GatewayRunner:
Platform.QQBOT: "QQ_ALLOWED_USERS",
}
platform_group_env_map = {
Platform.TELEGRAM: "TELEGRAM_GROUP_ALLOWED_USERS",
Platform.QQBOT: "QQ_GROUP_ALLOWED_USERS",
}
platform_allow_all_map = {
@@ -3025,7 +3060,7 @@ class GatewayRunner:
# Check platform-specific and global allowlists
platform_allowlist = os.getenv(platform_env_map.get(source.platform, ""), "").strip()
group_allowlist = ""
if source.chat_type == "group":
if source.chat_type in {"group", "forum"}:
group_allowlist = os.getenv(platform_group_env_map.get(source.platform, ""), "").strip()
global_allowlist = os.getenv("GATEWAY_ALLOWED_USERS", "").strip()
@@ -3034,7 +3069,7 @@ class GatewayRunner:
return os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes")
# Some platforms authorize group traffic by chat ID rather than sender ID.
if group_allowlist and source.chat_type == "group" and source.chat_id:
if group_allowlist and source.chat_type in {"group", "forum"} and source.chat_id:
allowed_group_ids = {
chat_id.strip() for chat_id in group_allowlist.split(",") if chat_id.strip()
}
@@ -3145,7 +3180,50 @@ class GatewayRunner:
# Internal events (e.g. background-process completion notifications)
# are system-generated and must skip user authorization.
if getattr(event, "internal", False):
is_internal = bool(getattr(event, "internal", False))
# Fire pre_gateway_dispatch plugin hook for user-originated messages.
# Plugins receive the MessageEvent and may return a dict influencing flow:
# {"action": "skip", "reason": ...} -> drop (no reply, plugin handled)
# {"action": "rewrite", "text": ...} -> replace event.text, continue
# {"action": "allow"} / None -> normal dispatch
# Hook runs BEFORE auth so plugins can handle unauthorized senders
# (e.g. customer handover ingest) without triggering the pairing flow.
if not is_internal:
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_hook_results = _invoke_hook(
"pre_gateway_dispatch",
event=event,
gateway=self,
session_store=self.session_store,
)
except Exception as _hook_exc:
logger.warning("pre_gateway_dispatch invocation failed: %s", _hook_exc)
_hook_results = []
for _result in _hook_results:
if not isinstance(_result, dict):
continue
_action = _result.get("action")
if _action == "skip":
logger.info(
"pre_gateway_dispatch skip: reason=%s platform=%s chat=%s",
_result.get("reason"),
source.platform.value if source.platform else "unknown",
source.chat_id or "unknown",
)
return None
if _action == "rewrite":
_new_text = _result.get("text")
if isinstance(_new_text, str):
event = dataclasses.replace(event, text=_new_text)
source = event.source
break
if _action == "allow":
break
if is_internal:
pass
elif source.user_id is None:
# Messages with no user identity (Telegram service messages,
@@ -3442,7 +3520,7 @@ class GatewayRunner:
# running-agent guard. Reject gracefully rather than falling
# through to interrupt + discard. Without this, commands
# like /model, /reasoning, /voice, /insights, /title,
# /resume, /retry, /undo, /compress, /usage, /provider,
# /resume, /retry, /undo, /compress, /usage,
# /reload-mcp, /sethome, /reset (all registered as Discord
# slash commands) would interrupt the agent AND get
# silently discarded by the slash-command safety net,
@@ -3513,6 +3591,10 @@ class GatewayRunner:
if self._queue_during_drain_enabled()
else f"⏳ Gateway is {self._status_action_gerund()} and is not accepting another turn right now."
)
if self._busy_input_mode == "queue":
logger.debug("PRIORITY queue follow-up for session %s", _quick_key[:20])
self._queue_or_replace_pending_event(_quick_key, event)
return None
logger.debug("PRIORITY interrupt for session %s", _quick_key[:20])
running_agent.interrupt(event.text)
if _quick_key in self._pending_messages:
@@ -3629,34 +3711,9 @@ class GatewayRunner:
if canonical == "model":
return await self._handle_model_command(event)
if canonical == "provider":
return await self._handle_provider_command(event)
if canonical == "personality":
return await self._handle_personality_command(event)
if canonical == "plan":
try:
from agent.skill_commands import build_plan_path, build_skill_invocation_message
user_instruction = event.get_command_args().strip()
plan_path = build_plan_path(user_instruction)
event.text = build_skill_invocation_message(
"/plan",
user_instruction,
task_id=_quick_key,
runtime_note=(
"Save the markdown plan with write_file to this exact relative path "
f"inside the active workspace/backend cwd: {plan_path}"
),
)
if not event.text:
return "Failed to load the bundled /plan skill."
canonical = None
except Exception as e:
logger.exception("Failed to prepare /plan command")
return f"Failed to enter plan mode: {e}"
if canonical == "retry":
return await self._handle_retry_command(event)
@@ -5602,9 +5659,17 @@ class GatewayRunner:
lines = [f"Model switched to `{result.new_model}`"]
lines.append(f"Provider: {plabel}")
mi = result.model_info
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or current_base_url or "",
api_key=result.api_key or current_api_key or "",
model_info=mi,
)
if ctx:
lines.append(f"Context: {ctx:,} tokens")
if mi:
if mi.context_window:
lines.append(f"Context: {mi.context_window:,} tokens")
if mi.max_output:
lines.append(f"Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
@@ -5738,28 +5803,25 @@ class GatewayRunner:
lines = [f"Model switched to `{result.new_model}`"]
lines.append(f"Provider: {provider_label}")
# Rich metadata from models.dev
# Context: always resolve via the provider-aware chain so Codex OAuth,
# Copilot, and Nous-enforced caps win over the raw models.dev entry.
mi = result.model_info
from hermes_cli.model_switch import resolve_display_context_length
ctx = resolve_display_context_length(
result.new_model,
result.target_provider,
base_url=result.base_url or current_base_url or "",
api_key=result.api_key or current_api_key or "",
model_info=mi,
)
if ctx:
lines.append(f"Context: {ctx:,} tokens")
if mi:
if mi.context_window:
lines.append(f"Context: {mi.context_window:,} tokens")
if mi.max_output:
lines.append(f"Max output: {mi.max_output:,} tokens")
if mi.has_cost_data():
lines.append(f"Cost: {mi.format_cost()}")
lines.append(f"Capabilities: {mi.format_capabilities()}")
else:
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
result.new_model,
base_url=result.base_url or current_base_url,
api_key=result.api_key or current_api_key,
provider=result.target_provider,
)
lines.append(f"Context: {ctx:,} tokens")
except Exception:
pass
# Cache notice
cache_enabled = (
@@ -5779,63 +5841,6 @@ class GatewayRunner:
return "\n".join(lines)
async def _handle_provider_command(self, event: MessageEvent) -> str:
"""Handle /provider command - show available providers."""
import yaml
from hermes_cli.models import (
list_available_providers,
normalize_provider,
_PROVIDER_LABELS,
)
# Resolve current provider from config
current_provider = "openrouter"
model_cfg = {}
config_path = _hermes_home / 'config.yaml'
try:
if config_path.exists():
with open(config_path, encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
current_provider = model_cfg.get("provider", current_provider)
except Exception:
pass
current_provider = normalize_provider(current_provider)
if current_provider == "auto":
try:
from hermes_cli.auth import resolve_provider as _resolve_provider
current_provider = _resolve_provider(current_provider)
except Exception:
current_provider = "openrouter"
# Detect custom endpoint from config base_url
if current_provider == "openrouter":
_cfg_base = model_cfg.get("base_url", "") if isinstance(model_cfg, dict) else ""
if _cfg_base and "openrouter.ai" not in _cfg_base:
current_provider = "custom"
current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
lines = [
f"🔌 **Current provider:** {current_label} (`{current_provider}`)",
"",
"**Available providers:**",
]
providers = list_available_providers()
for p in providers:
marker = " ← active" if p["id"] == current_provider else ""
auth = "" if p["authenticated"] else ""
aliases = f" _(also: {', '.join(p['aliases'])})_" if p["aliases"] else ""
lines.append(f"{auth} `{p['id']}` — {p['label']}{aliases}{marker}")
lines.append("")
lines.append("Switch: `/model provider:model-name`")
lines.append("Setup: `hermes setup`")
return "\n".join(lines)
async def _handle_personality_command(self, event: MessageEvent) -> str:
"""Handle /personality command - list or set a personality."""
import yaml
@@ -7102,10 +7107,7 @@ class GatewayRunner:
tmp_agent._print_fn = lambda *a, **kw: None
compressor = tmp_agent.context_compressor
compress_start = compressor.protect_first_n
compress_start = compressor._align_boundary_forward(msgs, compress_start)
compress_end = compressor._find_tail_cut_by_tokens(msgs, compress_start)
if compress_start >= compress_end:
if not compressor.has_content_to_compress(msgs):
return "Nothing to compress yet (the transcript is still all protected context)."
loop = asyncio.get_running_loop()
@@ -7231,13 +7233,19 @@ class GatewayRunner:
logger.debug("Failed to list titled sessions: %s", e)
return f"Could not list sessions: {e}"
# Resolve the name to a session ID
# Resolve the name to a session ID.
target_id = self._session_db.resolve_session_by_title(name)
if not target_id:
return (
f"No session found matching '**{name}**'.\n"
"Use `/resume` with no arguments to see available sessions."
)
# Compression creates child continuations that hold the live transcript.
# Follow that chain so gateway /resume matches CLI behavior (#15000).
try:
target_id = self._session_db.resolve_resume_session_id(target_id)
except Exception as e:
logger.debug("Failed to resolve resume continuation for %s: %s", target_id, e)
# Check if already on that session
current_entry = self.session_store.get_or_create_session(source)


@@ -60,6 +60,10 @@ from .config import (
SessionResetPolicy, # noqa: F401 — re-exported via gateway/__init__.py
HomeChannel,
)
from .whatsapp_identity import (
canonical_whatsapp_identifier,
normalize_whatsapp_identifier,
)
@dataclass
@@ -281,6 +285,18 @@ def build_session_context_prompt(
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.BLUEBUBBLES:
lines.append("")
lines.append(
"**Platform notes:** You are responding via iMessage. "
"Keep responses short and conversational — think texts, not essays. "
"Structure longer replies as separate short thoughts, each separated "
"by a blank line (double newline). Each block between blank lines "
"will be delivered as its own iMessage bubble, so write accordingly: "
one idea per bubble, 1-3 sentences each. "
"If the user needs a detailed answer, give the short version first "
"and offer to elaborate."
)
# Connected platforms
platforms_list = ["local (files on this machine)"]
@@ -518,15 +534,24 @@ def build_session_key(
"""
platform = source.platform.value
if source.chat_type == "dm":
if source.chat_id:
dm_chat_id = source.chat_id
if source.platform == Platform.WHATSAPP:
dm_chat_id = canonical_whatsapp_identifier(source.chat_id)
if dm_chat_id:
if source.thread_id:
return f"agent:main:{platform}:dm:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{source.chat_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}"
if source.thread_id:
return f"agent:main:{platform}:dm:{source.thread_id}"
return f"agent:main:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
if participant_id and source.platform == Platform.WHATSAPP:
# Same JID/LID-flip bug as the DM case: without canonicalisation, a
# single group member gets two isolated per-user sessions when the
# bridge reshuffles alias forms.
participant_id = canonical_whatsapp_identifier(str(participant_id)) or participant_id
key_parts = ["agent:main", platform, source.chat_type]
if source.chat_id:

View file

@@ -0,0 +1,135 @@
"""Shared helpers for canonicalising WhatsApp sender identity.
WhatsApp's bridge can surface the same human under two different JID shapes
within a single conversation:
- LID form: ``999999999999999@lid``
- Phone form: ``15551234567@s.whatsapp.net``
Both the authorisation path (:mod:`gateway.run`) and the session-key path
(:mod:`gateway.session`) need to collapse these aliases to a single stable
identity. This module is the single source of truth for that resolution so
the two paths can never drift apart.
Public helpers:
- :func:`normalize_whatsapp_identifier`: strips JID/LID/device/plus syntax
down to the bare numeric identifier.
- :func:`canonical_whatsapp_identifier`: walks the bridge's
``lid-mapping-*.json`` files and returns a stable canonical identity
across phone/LID variants.
- :func:`expand_whatsapp_aliases`: returns the full alias set for an
identifier. Used by authorisation code that needs to match any known
form of a sender against an allow-list.
Plugins that need per-sender behaviour on WhatsApp (role-based routing,
per-contact authorisation, policy gating in a gateway hook) should use
``canonical_whatsapp_identifier`` so their bookkeeping lines up with
Hermes' own session keys.
"""
from __future__ import annotations
import json
from typing import Set
from hermes_constants import get_hermes_home
def normalize_whatsapp_identifier(value: str) -> str:
"""Strip WhatsApp JID/LID syntax down to its stable numeric identifier.
Accepts any of the identifier shapes the WhatsApp bridge may emit:
``"60123456789@s.whatsapp.net"``, ``"60123456789:47@s.whatsapp.net"``,
``"60123456789@lid"``, or a bare ``"+601****6789"`` / ``"60123456789"``.
Returns just the numeric identifier (``"60123456789"``) suitable for
equality comparisons.
Useful for plugins that want to match sender IDs against
user-supplied config (phone numbers in ``config.yaml``) without
worrying about which variant the bridge happens to deliver.
"""
return (
str(value or "")
.strip()
.replace("+", "", 1)
.split(":", 1)[0]
.split("@", 1)[0]
)
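
The chained normalisation can be exercised on its own; a quick sketch (identifier values are illustrative, not real numbers):

```python
def normalize_whatsapp_identifier(value: str) -> str:
    # Strip leading "+", device suffix (":47"), and domain ("@s.whatsapp.net" / "@lid").
    return (
        str(value or "")
        .strip()
        .replace("+", "", 1)
        .split(":", 1)[0]
        .split("@", 1)[0]
    )

# Every bridge-emitted shape collapses to the same bare numeric identifier:
for raw in ("60123456789@s.whatsapp.net",
            "60123456789:47@s.whatsapp.net",
            "60123456789@lid",
            "+60123456789"):
    assert normalize_whatsapp_identifier(raw) == "60123456789"
```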
def expand_whatsapp_aliases(identifier: str) -> Set[str]:
"""Resolve WhatsApp phone/LID aliases via bridge session mapping files.
Returns the set of all identifiers transitively reachable through the
bridge's ``$HERMES_HOME/whatsapp/session/lid-mapping-*.json`` files,
starting from ``identifier``. The result always includes the
normalized input itself, so callers can safely ``in`` check against
the return value without a separate fallback branch.
Returns an empty set if ``identifier`` normalizes to empty.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return set()
session_dir = get_hermes_home() / "whatsapp" / "session"
resolved: Set[str] = set()
queue = [normalized]
while queue:
current = queue.pop(0)
if not current or current in resolved:
continue
resolved.add(current)
for suffix in ("", "_reverse"):
mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
if not mapping_path.exists():
continue
try:
mapped = normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except Exception:
continue
if mapped and mapped not in resolved:
queue.append(mapped)
return resolved
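
The transitive walk is a plain BFS to a fixpoint; a file-free sketch with the mapping modelled as a dict standing in for the ``lid-mapping-<id><suffix>.json`` files (identifiers hypothetical):

```python
# (id, suffix) -> mapped id, standing in for lid-mapping-<id><suffix>.json
mapping = {
    ("999999999999999", ""): "15551234567",
    ("15551234567", "_reverse"): "999999999999999",
}

def expand(identifier: str) -> set:
    resolved, queue = set(), [identifier]
    while queue:
        current = queue.pop(0)
        if not current or current in resolved:
            continue
        resolved.add(current)
        for suffix in ("", "_reverse"):
            mapped = mapping.get((current, suffix))
            if mapped and mapped not in resolved:
                queue.append(mapped)
    return resolved

# Starting from either alias reaches the same two-element set:
assert expand("999999999999999") == {"999999999999999", "15551234567"}
assert expand("15551234567") == {"999999999999999", "15551234567"}
```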
def canonical_whatsapp_identifier(identifier: str) -> str:
"""Return a stable WhatsApp sender identity across phone-JID/LID variants.
WhatsApp may surface the same person under either a phone-format JID
(``60123456789@s.whatsapp.net``) or a LID (``1234567890@lid``). This
applies to a DM ``chat_id`` *and* to the ``participant_id`` of a
member inside a group chat: both represent a user identity, and the
bridge may flip between the two for the same human.
This helper reads the bridge's ``whatsapp/session/lid-mapping-*.json``
files, walks the mapping transitively, and picks the shortest
(numeric-preferred) alias as the canonical identity.
:func:`gateway.session.build_session_key` uses this for both WhatsApp
DM chat_ids and WhatsApp group participant_ids, so callers get the
same session-key identity Hermes itself uses.
Plugins that need per-sender behaviour (role-based routing,
authorisation, per-contact policy) should use this so their
bookkeeping lines up with Hermes' session bookkeeping even when
the bridge reshuffles aliases.
Returns an empty string if ``identifier`` normalizes to empty. If no
mapping files exist yet (fresh bridge install), returns the
normalized input unchanged.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return ""
# expand_whatsapp_aliases always includes `normalized` itself in the
# returned set, so the min() below degrades gracefully to `normalized`
# when no lid-mapping files are present.
aliases = expand_whatsapp_aliases(normalized)
return min(aliases, key=lambda candidate: (len(candidate), candidate))
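
The final ``min()`` keyed on ``(len, value)`` deterministically prefers the shortest alias, so an 11-digit phone number beats a 15-digit LID, with lexicographic order breaking length ties (values illustrative):

```python
aliases = {"999999999999999", "15551234567"}  # LID form vs phone form
canonical = min(aliases, key=lambda candidate: (len(candidate), candidate))
assert canonical == "15551234567"  # shorter phone-number alias wins
```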

File diff suppressed because it is too large.

View file

@@ -110,18 +110,40 @@ def _display_source(source: str) -> str:
return source.split(":", 1)[1] if source.startswith("manual:") else source
def _classify_exhausted_status(entry) -> tuple[str, bool]:
code = getattr(entry, "last_error_code", None)
reason = str(getattr(entry, "last_error_reason", "") or "").strip().lower()
message = str(getattr(entry, "last_error_message", "") or "").strip().lower()
if code == 429 or any(token in reason for token in ("rate_limit", "usage_limit", "quota", "exhausted")) or any(
token in message for token in ("rate limit", "usage limit", "quota", "too many requests")
):
return "rate-limited", True
if code in {401, 403} or any(token in reason for token in ("invalid_token", "invalid_grant", "unauthorized", "forbidden", "auth")) or any(
token in message for token in ("unauthorized", "forbidden", "expired", "revoked", "invalid token", "authentication")
):
return "auth failed", False
return "exhausted", True
def _format_exhausted_status(entry) -> str:
if entry.last_status != STATUS_EXHAUSTED:
return ""
label, show_retry_window = _classify_exhausted_status(entry)
reason = getattr(entry, "last_error_reason", None)
reason_text = f" {reason}" if isinstance(reason, str) and reason.strip() else ""
code = f" ({entry.last_error_code})" if entry.last_error_code else ""
if not show_retry_window:
return f" {label}{reason_text}{code} (re-auth may be required)"
exhausted_until = _exhausted_until(entry)
if exhausted_until is None:
return f" exhausted{reason_text}{code}"
return f" {label}{reason_text}{code}"
remaining = max(0, int(math.ceil(exhausted_until - time.time())))
if remaining <= 0:
return f" exhausted{reason_text}{code} (ready to retry)"
return f" {label}{reason_text}{code} (ready to retry)"
minutes, seconds = divmod(remaining, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)
@@ -133,7 +155,7 @@ def _format_exhausted_status(entry) -> str:
wait = f"{minutes}m {seconds}s"
else:
wait = f"{seconds}s"
return f" exhausted{reason_text}{code} ({wait} left)"
return f" {label}{reason_text}{code} ({wait} left)"
def auth_add_command(args) -> None:
@@ -386,6 +408,44 @@ def auth_reset_command(args) -> None:
print(f"Reset status on {count} {provider} credentials")
def auth_status_command(args) -> None:
provider = _normalize_provider(getattr(args, "provider", "") or "")
if not provider:
raise SystemExit("Provider is required. Example: `hermes auth status spotify`.")
status = auth_mod.get_auth_status(provider)
if not status.get("logged_in"):
reason = status.get("error")
if reason:
print(f"{provider}: logged out ({reason})")
else:
print(f"{provider}: logged out")
return
print(f"{provider}: logged in")
for key in ("auth_type", "client_id", "redirect_uri", "scope", "expires_at", "api_base_url"):
value = status.get(key)
if value:
print(f" {key}: {value}")
def auth_logout_command(args) -> None:
auth_mod.logout_command(SimpleNamespace(provider=getattr(args, "provider", None)))
def auth_spotify_command(args) -> None:
action = str(getattr(args, "spotify_action", "") or "login").strip().lower()
if action in {"", "login"}:
auth_mod.login_spotify_command(args)
return
if action == "status":
auth_status_command(SimpleNamespace(provider="spotify"))
return
if action == "logout":
auth_logout_command(SimpleNamespace(provider="spotify"))
return
raise SystemExit(f"Unknown Spotify auth action: {action}")
def _interactive_auth() -> None:
"""Interactive credential pool management when `hermes auth` is called bare."""
# Show current pool status first
@@ -583,5 +643,14 @@ def auth_command(args) -> None:
if action == "reset":
auth_reset_command(args)
return
if action == "status":
auth_status_command(args)
return
if action == "logout":
auth_logout_command(args)
return
if action == "spotify":
auth_spotify_command(args)
return
# No subcommand — launch interactive mode
_interactive_auth()

View file

@@ -238,6 +238,52 @@ def get_git_banner_state(repo_dir: Optional[Path] = None) -> Optional[dict]:
return {"upstream": upstream, "local": local, "ahead": max(ahead, 0)}
_RELEASE_URL_BASE = "https://github.com/NousResearch/hermes-agent/releases/tag"
_latest_release_cache: Optional[tuple] = None # (tag, url) once resolved
def get_latest_release_tag(repo_dir: Optional[Path] = None) -> Optional[tuple]:
"""Return ``(tag, release_url)`` for the latest git tag, or None.
Local-only: runs ``git describe --tags --abbrev=0`` against the
Hermes checkout. Cached per-process. Release URL always points at the
canonical NousResearch/hermes-agent repo (forks don't get a link).
"""
global _latest_release_cache
if _latest_release_cache is not None:
return _latest_release_cache or None
repo_dir = repo_dir or _resolve_repo_dir()
if repo_dir is None:
_latest_release_cache = () # falsy sentinel — skip future lookups
return None
try:
result = subprocess.run(
["git", "describe", "--tags", "--abbrev=0"],
capture_output=True,
text=True,
timeout=3,
cwd=str(repo_dir),
)
except Exception:
_latest_release_cache = ()
return None
if result.returncode != 0:
_latest_release_cache = ()
return None
tag = (result.stdout or "").strip()
if not tag:
_latest_release_cache = ()
return None
url = f"{_RELEASE_URL_BASE}/{tag}"
_latest_release_cache = (tag, url)
return _latest_release_cache
def format_banner_version_label() -> str:
"""Return the version label shown in the startup banner title."""
base = f"Hermes Agent v{VERSION} ({RELEASE_DATE})"
@@ -519,9 +565,16 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
agent_name = _skin_branding("agent_name", "Hermes Agent")
title_color = _skin_color("banner_title", "#FFD700")
border_color = _skin_color("banner_border", "#CD7F32")
version_label = format_banner_version_label()
release_info = get_latest_release_tag()
if release_info:
_tag, _url = release_info
title_markup = f"[bold {title_color}][link={_url}]{version_label}[/link][/]"
else:
title_markup = f"[bold {title_color}]{version_label}[/]"
outer_panel = Panel(
layout_table,
title=f"[bold {title_color}]{format_banner_version_label()}[/]",
title=title_markup,
border_style=border_color,
padding=(0, 2),
)

View file

@@ -77,7 +77,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
args_hint="[number]"),
CommandDef("snapshot", "Create or restore state snapshots of Hermes config/state", "Session",
aliases=("snap",), args_hint="[create|restore <id>|prune]"),
cli_only=True, aliases=("snap",), args_hint="[create|restore <id>|prune]"),
CommandDef("stop", "Kill all running background processes", "Session"),
CommandDef("approve", "Approve a pending dangerous command", "Session",
gateway_only=True, args_hint="[session|always]"),
@@ -104,9 +104,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--provider name] [--global]"),
CommandDef("provider", "Show available providers and current provider",
"Configuration"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
cli_only=True),
CommandDef("personality", "Set a predefined personality", "Configuration",
args_hint="[name]"),
@@ -124,9 +123,12 @@ COMMAND_REGISTRY: list[CommandDef] = [
args_hint="[normal|fast|status]",
subcommands=("normal", "fast", "status", "on", "off")),
CommandDef("skin", "Show or change the display skin/theme", "Configuration",
args_hint="[name]"),
cli_only=True, args_hint="[name]"),
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
cli_only=True, args_hint="[queue|interrupt|status]",
subcommands=("queue", "interrupt", "status")),
# Tools & Skills
CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -139,7 +141,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills"),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
cli_only=True),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
aliases=("reload_mcp",)),
CommandDef("browser", "Connect browser tools to your live Chrome via CDP", "Tools & Skills",
@@ -317,7 +320,7 @@ def should_bypass_active_session(command_name: str | None) -> bool:
safety net in gateway.run discards any command text that reaches
the pending queue which meant a mid-run /model (or /reasoning,
/voice, /insights, /title, /resume, /retry, /undo, /compress,
/usage, /provider, /reload-mcp, /sethome, /reset) would silently
/usage, /reload-mcp, /sethome, /reset) would silently
interrupt the agent AND get discarded, producing a zero-char
response. See issue #5057 / PRs #6252, #10370, #4665.

View file

@@ -466,6 +466,12 @@ DEFAULT_CONFIG = {
"record_sessions": False, # Auto-record browser sessions as WebM videos
"allow_private_urls": False, # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
"cdp_url": "", # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
# CDP supervisor — dialog + frame detection via a persistent WebSocket.
# Active only when a CDP-capable backend is attached (Browserbase or
# local Chrome via /browser connect). See
# website/docs/developer-guide/browser-supervisor.md.
"dialog_policy": "must_respond", # must_respond | auto_dismiss | auto_accept
"dialog_timeout_s": 300, # Safety auto-dismiss after N seconds under must_respond
"camofox": {
# When true, Hermes sends a stable profile-scoped userId to Camofox
# so the server maps it to a persistent Firefox profile automatically.
@@ -486,7 +492,27 @@ DEFAULT_CONFIG = {
# exceed this are rejected with guidance to use offset+limit.
# 100K chars ≈ 25-35K tokens across typical tokenisers.
"file_read_max_chars": 100_000,
# Tool-output truncation thresholds. When terminal output or a
# single read_file page exceeds these limits, Hermes truncates the
# payload sent to the model (keeping head + tail for terminal,
# enforcing pagination for read_file). Tuning these trades context
# footprint against how much raw output the model can see in one
# shot. Ported from anomalyco/opencode PR #23770.
#
# - max_bytes: terminal_tool output cap, in chars
# (default 50_000 ≈ 12-15K tokens).
# - max_lines: read_file pagination cap — the maximum `limit`
# a single read_file call can request before
# being clamped (default 2000).
# - max_line_length: per-line cap applied when read_file emits a
# line-numbered view (default 2000 chars).
"tool_output": {
"max_bytes": 50_000,
"max_lines": 2000,
"max_line_length": 2000,
},
"compression": {
"enabled": True,
"threshold": 0.50, # compress when context usage exceeds this ratio
@@ -495,6 +521,12 @@ DEFAULT_CONFIG = {
},
# Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
# cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
"prompt_caching": {
"cache_ttl": "5m",
},
# AWS Bedrock provider configuration.
# Only used when model.provider is "bedrock".
"bedrock": {
@@ -739,6 +771,10 @@ DEFAULT_CONFIG = {
"inherit_mcp_toolsets": True,
"max_iterations": 50, # per-subagent iteration cap (each subagent gets its own budget,
# independent of the parent's max_iterations)
"child_timeout_seconds": 600, # wall-clock timeout for each child agent (floor 30s,
# no ceiling). High-reasoning models on large tasks
# (e.g. gpt-5.5 xhigh, opus-4.6) need generous budgets;
# raise if children time out before producing output.
"reasoning_effort": "", # reasoning effort for subagents: "xhigh", "high", "medium",
# "low", "minimal", "none" (empty = inherit parent's level)
"max_concurrent_children": 3, # max parallel children per batch; floor of 1 enforced, no ceiling

View file

@@ -275,6 +275,99 @@ def copilot_device_code_login(
return None
# ─── Copilot Token Exchange ────────────────────────────────────────────────
# Module-level cache for exchanged Copilot API tokens.
# Maps raw_token_fingerprint -> (api_token, expires_at_epoch).
_jwt_cache: dict[str, tuple[str, float]] = {}
_JWT_REFRESH_MARGIN_SECONDS = 120 # refresh 2 min before expiry
# Token exchange endpoint and headers (matching VS Code / Copilot CLI)
_TOKEN_EXCHANGE_URL = "https://api.github.com/copilot_internal/v2/token"
_EDITOR_VERSION = "vscode/1.104.1"
_EXCHANGE_USER_AGENT = "GitHubCopilotChat/0.26.7"
def _token_fingerprint(raw_token: str) -> str:
"""Short fingerprint of a raw token for cache keying (avoids storing full token)."""
import hashlib
return hashlib.sha256(raw_token.encode()).hexdigest()[:16]
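
The fingerprint is just the first 16 hex characters of a SHA-256 digest, enough to key the in-process cache without retaining the credential itself (the token value below is made up):

```python
import hashlib

raw_token = "gho_exampletoken"  # hypothetical; never a real credential
fp = hashlib.sha256(raw_token.encode()).hexdigest()[:16]
assert len(fp) == 16 and all(c in "0123456789abcdef" for c in fp)
```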
def exchange_copilot_token(raw_token: str, *, timeout: float = 10.0) -> tuple[str, float]:
"""Exchange a raw GitHub token for a short-lived Copilot API token.
Calls ``GET https://api.github.com/copilot_internal/v2/token`` with
the raw GitHub token and returns ``(api_token, expires_at)``.
The returned token is a semicolon-separated string (not a standard JWT)
used as ``Authorization: Bearer <token>`` for Copilot API requests.
Results are cached in-process and reused until close to expiry.
Raises ``ValueError`` on failure.
"""
import urllib.request
fp = _token_fingerprint(raw_token)
# Check cache first
cached = _jwt_cache.get(fp)
if cached:
api_token, expires_at = cached
if time.time() < expires_at - _JWT_REFRESH_MARGIN_SECONDS:
return api_token, expires_at
req = urllib.request.Request(
_TOKEN_EXCHANGE_URL,
method="GET",
headers={
"Authorization": f"token {raw_token}",
"User-Agent": _EXCHANGE_USER_AGENT,
"Accept": "application/json",
"Editor-Version": _EDITOR_VERSION,
},
)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
except Exception as exc:
raise ValueError(f"Copilot token exchange failed: {exc}") from exc
api_token = data.get("token", "")
expires_at = data.get("expires_at", 0)
if not api_token:
raise ValueError("Copilot token exchange returned empty token")
# Convert expires_at to float if needed
expires_at = float(expires_at) if expires_at else time.time() + 1800
_jwt_cache[fp] = (api_token, expires_at)
logger.debug(
"Copilot token exchanged, expires_at=%s",
expires_at,
)
return api_token, expires_at
def get_copilot_api_token(raw_token: str) -> str:
"""Exchange a raw GitHub token for a Copilot API token, with fallback.
Convenience wrapper: returns the exchanged token on success, or the
raw token unchanged if the exchange fails (e.g. network error, unsupported
account type). This preserves existing behaviour for accounts that don't
need exchange while enabling access to internal-only models for those that do.
"""
if not raw_token:
return raw_token
try:
api_token, _ = exchange_copilot_token(raw_token)
return api_token
except Exception as exc:
logger.debug("Copilot token exchange failed, using raw token: %s", exc)
return raw_token
# ─── Copilot API Headers ───────────────────────────────────────────────────
def copilot_request_headers(

View file

@@ -93,6 +93,9 @@ def cron_list(show_all: bool = False):
script = job.get("script")
if script:
print(f" Script: {script}")
workdir = job.get("workdir")
if workdir:
print(f" Workdir: {workdir}")
# Execution history
last_status = job.get("last_status")
@@ -168,6 +171,7 @@ def cron_create(args):
skill=getattr(args, "skill", None),
skills=_normalize_skills(getattr(args, "skill", None), getattr(args, "skills", None)),
script=getattr(args, "script", None),
workdir=getattr(args, "workdir", None),
)
if not result.get("success"):
print(color(f"Failed to create job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -180,6 +184,8 @@ def cron_create(args):
job_data = result.get("job", {})
if job_data.get("script"):
print(f" Script: {job_data['script']}")
if job_data.get("workdir"):
print(f" Workdir: {job_data['workdir']}")
print(f" Next run: {result['next_run_at']}")
return 0
@@ -218,6 +224,7 @@ def cron_edit(args):
repeat=getattr(args, "repeat", None),
skills=final_skills,
script=getattr(args, "script", None),
workdir=getattr(args, "workdir", None),
)
if not result.get("success"):
print(color(f"Failed to update job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -233,6 +240,8 @@ def cron_edit(args):
print(" Skills: none")
if updated.get("script"):
print(f" Script: {updated['script']}")
if updated.get("workdir"):
print(f" Workdir: {updated['workdir']}")
return 0

View file

@@ -29,6 +29,7 @@ if _env_path.exists():
load_dotenv(PROJECT_ROOT / ".env", override=False, encoding="utf-8")
from hermes_cli.colors import Colors, color
from hermes_cli.models import _HERMES_USER_AGENT
from hermes_constants import OPENROUTER_MODELS_URL
from utils import base_url_host_matches
@@ -295,16 +296,33 @@ def run_doctor(args):
except Exception:
pass
try:
from hermes_cli.auth import resolve_provider as _resolve_provider
from hermes_cli.config import get_compatible_custom_providers as _compatible_custom_providers
from hermes_cli.providers import resolve_provider_full as _resolve_provider_full
except Exception:
_resolve_provider = None
_compatible_custom_providers = None
_resolve_provider_full = None
custom_providers = []
if _compatible_custom_providers is not None:
try:
custom_providers = _compatible_custom_providers(cfg)
except Exception:
custom_providers = []
user_providers = cfg.get("providers")
if isinstance(user_providers, dict):
known_providers.update(str(name).strip().lower() for name in user_providers if str(name).strip())
for entry in custom_providers:
if not isinstance(entry, dict):
continue
name = str(entry.get("name") or "").strip()
if name:
known_providers.add("custom:" + name.lower().replace(" ", "-"))
canonical_provider = provider
if provider and _resolve_provider is not None and provider != "auto":
try:
canonical_provider = _resolve_provider(provider)
except Exception:
canonical_provider = None
if provider and _resolve_provider_full is not None and provider != "auto":
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
canonical_provider = provider_def.id if provider_def is not None else None
if provider and provider != "auto":
if canonical_provider is None or (known_providers and canonical_provider not in known_providers):
@@ -957,7 +975,10 @@ def run_doctor(args):
if base_url_host_matches(_base, "api.kimi.com") and _base.rstrip("/").endswith("/coding"):
_base = _base.rstrip("/") + "/v1"
_url = (_base.rstrip("/") + "/models") if _base else _default_url
_headers = {"Authorization": f"Bearer {_key}"}
_headers = {
"Authorization": f"Bearer {_key}",
"User-Agent": _HERMES_USER_AGENT,
}
if base_url_host_matches(_base, "api.kimi.com"):
_headers["User-Agent"] = "claude-code/0.1.0"
_resp = httpx.get(

View file

@@ -267,6 +267,8 @@ def run_dump(args):
("ANTHROPIC_API_KEY", "anthropic"),
("ANTHROPIC_TOKEN", "anthropic_token"),
("NOUS_API_KEY", "nous"),
("GOOGLE_API_KEY", "google/gemini"),
("GEMINI_API_KEY", "gemini"),
("GLM_API_KEY", "glm/zai"),
("ZAI_API_KEY", "zai"),
("KIMI_API_KEY", "kimi"),

View file

@@ -166,6 +166,27 @@ from hermes_cli.env_loader import load_hermes_dotenv
load_hermes_dotenv(project_env=PROJECT_ROOT / ".env")
# Bridge security.redact_secrets from config.yaml → HERMES_REDACT_SECRETS env
# var BEFORE hermes_logging imports agent.redact (which snapshots the flag at
# module-import time). Without this, config.yaml's toggle is ignored because
# the setup_logging() call below imports agent.redact, which reads the env var
# exactly once. Env var in .env still wins — this is config.yaml fallback only.
try:
if "HERMES_REDACT_SECRETS" not in os.environ:
import yaml as _yaml_early
_cfg_path = get_hermes_home() / "config.yaml"
if _cfg_path.exists():
with open(_cfg_path, encoding="utf-8") as _f:
_early_sec_cfg = (_yaml_early.safe_load(_f) or {}).get("security", {})
if isinstance(_early_sec_cfg, dict):
_early_redact = _early_sec_cfg.get("redact_secrets")
if _early_redact is not None:
os.environ["HERMES_REDACT_SECRETS"] = str(_early_redact).lower()
del _early_sec_cfg
del _cfg_path
except Exception:
pass # best-effort — redaction stays at default (enabled) on config errors
# Initialize centralized file logging early — all `hermes` subcommands
# (chat, setup, gateway, config, etc.) write to agent.log + errors.log.
try:
@@ -1429,6 +1450,7 @@ def select_provider_and_model(args=None):
load_config,
get_env_value,
)
from hermes_cli.providers import resolve_provider_full
config = load_config()
current_model = config.get("model")
@@ -1446,14 +1468,30 @@ def select_provider_and_model(args=None):
effective_provider = (
config_provider or os.getenv("HERMES_INFERENCE_PROVIDER") or "auto"
)
try:
active = resolve_provider(effective_provider)
except AuthError as exc:
warning = format_auth_error(exc)
print(f"Warning: {warning} Falling back to auto provider detection.")
compatible_custom_providers = get_compatible_custom_providers(config)
active = None
if effective_provider != "auto":
active_def = resolve_provider_full(
effective_provider,
config.get("providers"),
compatible_custom_providers,
)
if active_def is not None:
active = active_def.id
else:
warning = (
f"Unknown provider '{effective_provider}'. Check 'hermes model' for "
"available providers, or run 'hermes doctor' to diagnose config "
"issues."
)
print(f"Warning: {warning} Falling back to auto provider detection.")
if active is None:
try:
active = resolve_provider("auto")
except AuthError:
except AuthError as exc:
if effective_provider == "auto":
warning = format_auth_error(exc)
print(f"Warning: {warning} Falling back to auto provider detection.")
active = None # no provider yet; default to first in list
# Detect custom endpoint
@@ -2311,7 +2349,41 @@ def _model_flow_openai_codex(config, current_model=""):
from hermes_cli.codex_models import get_codex_model_ids
status = get_codex_auth_status()
if not status.get("logged_in"):
if status.get("logged_in"):
print(" OpenAI Codex credentials: ✓")
print()
print(" 1. Use existing credentials")
print(" 2. Reauthenticate (new OAuth login)")
print(" 3. Cancel")
print()
try:
choice = input(" Choice [1/2/3]: ").strip()
except (KeyboardInterrupt, EOFError):
choice = "1"
if choice == "2":
print("Starting a fresh OpenAI Codex login...")
print()
try:
mock_args = argparse.Namespace()
_login_openai_codex(
mock_args,
PROVIDER_REGISTRY["openai-codex"],
force_new_login=True,
)
except SystemExit:
print("Login cancelled or failed.")
return
except Exception as exc:
print(f"Login failed: {exc}")
return
status = get_codex_auth_status()
if not status.get("logged_in"):
print("Login failed.")
return
elif choice == "3":
return
else:
print("Not logged into OpenAI Codex. Starting login...")
print()
try:
@@ -2828,11 +2900,16 @@ def _model_flow_named_custom(config, provider_info):
name = provider_info["name"]
base_url = provider_info["base_url"]
api_mode = provider_info.get("api_mode", "")
api_key = provider_info.get("api_key", "")
key_env = provider_info.get("key_env", "")
saved_model = provider_info.get("model", "")
provider_key = (provider_info.get("provider_key") or "").strip()
# Resolve key from env var if api_key not set directly
if not api_key and key_env:
api_key = os.environ.get(key_env, "")
print(f" Provider: {name}")
print(f" URL: {base_url}")
if saved_model:
@@ -2840,7 +2917,10 @@ def _model_flow_named_custom(config, provider_info):
print()
print("Fetching available models...")
models = fetch_api_models(api_key, base_url, timeout=8.0)
models = fetch_api_models(
api_key, base_url, timeout=8.0,
api_mode=api_mode or None,
)
if models:
default_idx = 0
@@ -3930,12 +4010,71 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
print("Cancelled.")
return
save_env_value(key_env, new_key)
existing_key = new_key
print("API key saved.")
print()
else:
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓")
print()
# Gemini free-tier gate: free-tier daily quotas (<= 250 RPD for Flash)
# are exhausted in a handful of agent turns, so refuse to wire up the
# provider with a free-tier key. Probe is best-effort; network or auth
# errors fall through without blocking.
if provider_id == "gemini" and existing_key:
try:
from agent.gemini_native_adapter import probe_gemini_tier
except Exception:
probe_gemini_tier = None
if probe_gemini_tier is not None:
print(" Checking Gemini API tier...")
probe_base = (
(get_env_value(base_url_env) if base_url_env else "")
or os.getenv(base_url_env or "", "")
or pconfig.inference_base_url
)
tier = probe_gemini_tier(existing_key, probe_base)
if tier == "free":
print()
print(
"❌ This Google API key is on the free tier "
"(<= 250 requests/day for gemini-2.5-flash)."
)
print(
" Hermes typically makes 3-10 API calls per user turn "
"(tool iterations + auxiliary tasks),"
)
print(
" so the free tier is exhausted after a handful of "
"messages and cannot sustain"
)
print(" an agent session.")
print()
print(
" To use Gemini with Hermes, enable billing on your "
"Google Cloud project and regenerate"
)
print(
" the key in a billing-enabled project: "
"https://aistudio.google.com/apikey"
)
print()
print(
" Alternatives with workable free usage: DeepSeek, "
"OpenRouter (free models), Groq, Nous."
)
print()
print("Not saving Gemini as the default provider.")
return
if tier == "paid":
print(" Tier check: paid ✓")
else:
# "unknown" -- network issue, auth problem, unexpected response.
# Don't block; the runtime 429 handler will surface free-tier
# guidance if the key turns out to be free tier.
print(" Tier check: could not verify (proceeding anyway).")
print()
# Optional base URL override
current_base = ""
if base_url_env:
@@ -4177,6 +4316,8 @@ def _model_flow_anthropic(config, current_model=""):
from agent.anthropic_adapter import (
read_claude_code_credentials,
is_claude_code_token_valid,
_is_oauth_token,
_resolve_claude_code_token_from_credentials,
)
cc_creds = read_claude_code_credentials()
@ -4185,7 +4326,14 @@ def _model_flow_anthropic(config, current_model=""):
except Exception:
pass
has_creds = bool(existing_key) or cc_available
# Stale-OAuth guard: if the only existing cred is an expired OAuth token
# (no valid cc_creds to fall back on), treat it as missing so the re-auth
# path is offered instead of silently accepting a broken token.
existing_is_stale_oauth = False
if existing_key and _is_oauth_token(existing_key) and not cc_available:
existing_is_stale_oauth = True
has_creds = (bool(existing_key) and not existing_is_stale_oauth) or cc_available
needs_auth = not has_creds
if has_creds:
@ -6567,9 +6715,15 @@ def cmd_dashboard(args):
try:
import fastapi # noqa: F401
import uvicorn # noqa: F401
except ImportError:
print("Web UI dependencies not installed.")
print(f"Install them with: {sys.executable} -m pip install 'fastapi' 'uvicorn[standard]'")
except ImportError as e:
print("Web UI dependencies not installed (need fastapi + uvicorn).")
print(
f"Re-install the package into this interpreter so metadata updates apply:\n"
f" cd {PROJECT_ROOT}\n"
f" {sys.executable} -m pip install -e .\n"
"If `pip` is missing in this venv, use: uv pip install -e ."
)
print(f"Import error: {e}")
sys.exit(1)
if "HERMES_WEB_DIST" not in os.environ:
@ -6578,11 +6732,13 @@ def cmd_dashboard(args):
from hermes_cli.web_server import start_server
embedded_chat = args.tui or os.environ.get("HERMES_DASHBOARD_TUI") == "1"
start_server(
host=args.host,
port=args.port,
open_browser=not args.no_open,
allow_public=getattr(args, "insecure", False),
embedded_chat=embedded_chat,
)
@ -7185,7 +7341,7 @@ For more help on a command:
)
logout_parser.add_argument(
"--provider",
choices=["nous", "openai-codex"],
choices=["nous", "openai-codex", "spotify"],
default=None,
help="Provider to log out from (default: active provider)",
)
@ -7242,6 +7398,17 @@ For more help on a command:
"reset", help="Clear exhaustion status for all credentials for a provider"
)
auth_reset.add_argument("provider", help="Provider id")
auth_status = auth_subparsers.add_parser("status", help="Show auth status for a provider")
auth_status.add_argument("provider", help="Provider id")
auth_logout = auth_subparsers.add_parser("logout", help="Log out a provider and clear stored auth state")
auth_logout.add_argument("provider", help="Provider id")
auth_spotify = auth_subparsers.add_parser("spotify", help="Authenticate Hermes with Spotify via PKCE")
auth_spotify.add_argument("spotify_action", nargs="?", choices=["login", "status", "logout"], default="login")
auth_spotify.add_argument("--client-id", help="Spotify app client_id (or set HERMES_SPOTIFY_CLIENT_ID)")
auth_spotify.add_argument("--redirect-uri", help="Allow-listed localhost redirect URI for your Spotify app")
auth_spotify.add_argument("--scope", help="Override requested Spotify scopes")
auth_spotify.add_argument("--no-browser", action="store_true", help="Do not attempt to open the browser automatically")
auth_spotify.add_argument("--timeout", type=float, help="Callback/token exchange timeout in seconds")
auth_parser.set_defaults(func=cmd_auth)
# =========================================================================
@ -7298,6 +7465,10 @@ For more help on a command:
"--script",
help="Path to a Python script whose stdout is injected into the prompt each run",
)
cron_create.add_argument(
"--workdir",
help="Absolute path for the job to run from. Injects AGENTS.md / CLAUDE.md / .cursorrules from that directory and uses it as the cwd for terminal/file/code_exec tools. Omit to preserve old behaviour (no project context files).",
)
# cron edit
cron_edit = cron_subparsers.add_parser(
@ -7336,6 +7507,10 @@ For more help on a command:
"--script",
help="Path to a Python script whose stdout is injected into the prompt each run. Pass empty string to clear.",
)
cron_edit.add_argument(
"--workdir",
help="Absolute path for the job to run from (injects AGENTS.md etc. and sets terminal cwd). Pass empty string to clear.",
)
# lifecycle actions
cron_pause = cron_subparsers.add_parser("pause", help="Pause a scheduled job")
@ -8749,6 +8924,14 @@ Examples:
action="store_true",
help="Allow binding to non-localhost (DANGEROUS: exposes API keys on the network)",
)
dashboard_parser.add_argument(
"--tui",
action="store_true",
help=(
"Expose the in-browser Chat tab (embedded `hermes --tui` via PTY/WebSocket). "
"Alternatively set HERMES_DASHBOARD_TUI=1."
),
)
dashboard_parser.set_defaults(func=cmd_dashboard)
# =========================================================================

View file

@ -12,8 +12,12 @@ Different LLM providers expect model identifiers in different formats:
model IDs, but Claude still uses hyphenated native names like
``claude-sonnet-4-6``.
- **OpenCode Go** preserves dots in model names: ``minimax-m2.7``.
- **DeepSeek** only accepts two model identifiers:
``deepseek-chat`` and ``deepseek-reasoner``.
- **DeepSeek** accepts ``deepseek-chat`` (V3), ``deepseek-reasoner``
(R1-family), and the first-class V-series IDs (``deepseek-v4-pro``,
``deepseek-v4-flash``, and any future ``deepseek-v<N>-*``). Older
Hermes revisions folded every non-reasoner input into
``deepseek-chat``, which on aggregators routes to V3 so a user
picking V4 Pro was silently downgraded.
- **Custom** and remaining providers pass the name through as-is.
This module centralises that translation so callers can simply write::
@ -25,6 +29,7 @@ Inspired by Clawdbot's ``normalizeAnthropicModelId`` pattern.
from __future__ import annotations
import re
from typing import Optional
# ---------------------------------------------------------------------------
@ -100,6 +105,15 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
"custom",
})
# Providers whose APIs require lowercase model IDs. Xiaomi's
# ``api.xiaomimimo.com`` rejects mixed-case names like ``MiMo-V2.5-Pro``
# that users might copy from marketing docs — it only accepts
# ``mimo-v2.5-pro``. After stripping a matching provider prefix, these
# providers also get ``.lower()`` applied.
_LOWERCASE_MODEL_PROVIDERS: frozenset[str] = frozenset({
"xiaomi",
})
# ---------------------------------------------------------------------------
# DeepSeek special handling
# ---------------------------------------------------------------------------
@ -115,17 +129,30 @@ _DEEPSEEK_REASONER_KEYWORDS: frozenset[str] = frozenset({
})
_DEEPSEEK_CANONICAL_MODELS: frozenset[str] = frozenset({
"deepseek-chat",
"deepseek-reasoner",
"deepseek-chat", # V3 on DeepSeek direct and most aggregators
"deepseek-reasoner", # R1-family reasoning model
"deepseek-v4-pro", # V4 Pro — first-class model ID
"deepseek-v4-flash", # V4 Flash — first-class model ID
})
# First-class V-series IDs (``deepseek-v4-pro``, ``deepseek-v4-flash``,
# future ``deepseek-v5-*``, dated variants like ``deepseek-v4-flash-20260423``).
# Verified empirically 2026-04-24: DeepSeek's Chat Completions API returns
# ``provider: DeepSeek`` / ``model: deepseek-v4-flash-20260423`` when called
# with ``model=deepseek/deepseek-v4-flash``, so these names are not aliases
# of ``deepseek-chat`` and must not be folded into it.
_DEEPSEEK_V_SERIES_RE = re.compile(r"^deepseek-v\d+([-.].+)?$")
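The pass-through rule for V-series IDs can be checked in isolation. A minimal sketch that re-derives the same pattern (the variable name here is illustrative, not the module's export):

```python
import re

# Same pattern as _DEEPSEEK_V_SERIES_RE above: "deepseek-v<digits>" with an
# optional "-..." or "..." suffix for variants and dated snapshots.
v_series = re.compile(r"^deepseek-v\d+([-.].+)?$")

assert v_series.match("deepseek-v4-pro")             # first-class V4 ID
assert v_series.match("deepseek-v4-flash-20260423")  # dated variant
assert v_series.match("deepseek-v5-mini")            # future release, no code change
assert not v_series.match("deepseek-chat")           # canonical, handled earlier
assert not v_series.match("deepseek-vision")         # "v" not followed by a digit
```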
def _normalize_for_deepseek(model_name: str) -> str:
"""Map any model input to one of DeepSeek's two accepted identifiers.
"""Map a model input to a DeepSeek-accepted identifier.
Rules:
- Already ``deepseek-chat`` or ``deepseek-reasoner`` -> pass through.
- Contains any reasoner keyword (r1, think, reasoning, cot, reasoner)
- Already a known canonical (``deepseek-chat``/``deepseek-reasoner``/
``deepseek-v4-pro``/``deepseek-v4-flash``) -> pass through.
- Matches the V-series pattern ``deepseek-v<digit>...`` -> pass through
(covers future ``deepseek-v5-*`` and dated variants without a release).
- Contains a reasoner keyword (r1, think, reasoning, cot, reasoner)
-> ``deepseek-reasoner``.
- Everything else -> ``deepseek-chat``.
@ -133,13 +160,17 @@ def _normalize_for_deepseek(model_name: str) -> str:
model_name: The bare model name (vendor prefix already stripped).
Returns:
One of ``"deepseek-chat"`` or ``"deepseek-reasoner"``.
A DeepSeek-accepted model identifier.
"""
bare = _strip_vendor_prefix(model_name).lower()
if bare in _DEEPSEEK_CANONICAL_MODELS:
return bare
# V-series first-class IDs (v4-pro, v4-flash, future v5-*, dated variants)
if _DEEPSEEK_V_SERIES_RE.match(bare):
return bare
# Check for reasoner-like keywords anywhere in the name
for keyword in _DEEPSEEK_REASONER_KEYWORDS:
if keyword in bare:
@ -347,6 +378,9 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
>>> normalize_model_for_provider("claude-sonnet-4.6", "zai")
'claude-sonnet-4.6'
>>> normalize_model_for_provider("MiMo-V2.5-Pro", "xiaomi")
'mimo-v2.5-pro'
"""
name = (model_input or "").strip()
if not name:
@ -410,7 +444,12 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
# --- Direct providers: repair matching provider prefixes only ---
if provider in _MATCHING_PREFIX_STRIP_PROVIDERS:
return _strip_matching_provider_prefix(name, provider)
result = _strip_matching_provider_prefix(name, provider)
# Some providers require lowercase model IDs (e.g. Xiaomi's API
# rejects "MiMo-V2.5-Pro" but accepts "mimo-v2.5-pro").
if provider in _LOWERCASE_MODEL_PROVIDERS:
result = result.lower()
return result
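The lowercase rule composes with prefix stripping as a simple post-step. A hedged reduction of the branch above (function name and the second assertion's provider are illustrative):

```python
LOWERCASE_MODEL_PROVIDERS = {"xiaomi"}  # mirrors the frozenset above

def finalize_model_id(name: str, provider: str) -> str:
    # After prefix repair, lowercase only for providers whose APIs
    # reject mixed-case model IDs.
    return name.lower() if provider in LOWERCASE_MODEL_PROVIDERS else name

assert finalize_model_id("MiMo-V2.5-Pro", "xiaomi") == "mimo-v2.5-pro"
assert finalize_model_id("Claude-Sonnet", "custom") == "Claude-Sonnet"
```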
# --- Authoritative native providers: preserve user-facing slugs as-is ---
if provider in _AUTHORITATIVE_NATIVE_PROVIDERS:

View file

@ -527,6 +527,42 @@ def _resolve_alias_fallback(
return None
def resolve_display_context_length(
model: str,
provider: str,
base_url: str = "",
api_key: str = "",
model_info: Optional[ModelInfo] = None,
) -> Optional[int]:
"""Resolve the context length to show in /model output.
models.dev reports per-vendor context (e.g. gpt-5.5 = 1.05M on openai)
but provider-enforced limits can be lower (e.g. Codex OAuth caps the
same slug at 272k). The authoritative source is
``agent.model_metadata.get_model_context_length`` which already knows
about Codex OAuth, Copilot, Nous, and falls back to models.dev for the
rest.
Prefer the provider-aware value; fall back to ``model_info.context_window``
only if the resolver returns nothing.
"""
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
model,
base_url=base_url or "",
api_key=api_key or "",
provider=provider or None,
)
if ctx:
return int(ctx)
except Exception:
pass
if model_info is not None and model_info.context_window:
return int(model_info.context_window)
return None
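The precedence the helper encodes reduces to "provider-enforced cap first, models.dev fallback second". A standalone sketch of just that ordering (helper name hypothetical; the numbers come from the gpt-5.5 / Codex OAuth example above):

```python
from typing import Optional

def pick_display_context(provider_ctx: Optional[int],
                         models_dev_ctx: Optional[int]) -> Optional[int]:
    # The provider-aware resolver wins whenever it returns a value;
    # models.dev's per-vendor number is only a fallback.
    if provider_ctx:
        return int(provider_ctx)
    if models_dev_ctx:
        return int(models_dev_ctx)
    return None

# Codex OAuth enforces 272K even though models.dev reports 1.05M:
assert pick_display_context(272_000, 1_050_000) == 272_000
# No provider-aware value: fall back to models.dev.
assert pick_display_context(None, 1_050_000) == 1_050_000
assert pick_display_context(None, None) is None
```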
# ---------------------------------------------------------------------------
# Core model-switching pipeline
# ---------------------------------------------------------------------------
@ -771,7 +807,10 @@ def switch_model(
if provider_changed or explicit_provider:
try:
runtime = resolve_runtime_provider(requested=target_provider)
runtime = resolve_runtime_provider(
requested=target_provider,
target_model=new_model,
)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
@ -788,7 +827,10 @@ def switch_model(
)
else:
try:
runtime = resolve_runtime_provider(requested=current_provider)
runtime = resolve_runtime_provider(
requested=current_provider,
target_model=new_model,
)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
@ -815,6 +857,7 @@ def switch_model(
target_provider,
api_key=api_key,
base_url=base_url,
api_mode=api_mode or None,
)
except Exception as e:
validation = {
@ -936,7 +979,7 @@ def list_authenticated_providers(
from hermes_cli.auth import PROVIDER_REGISTRY
from hermes_cli.models import (
OPENROUTER_MODELS, _PROVIDER_MODELS,
_MODELS_DEV_PREFERRED, _merge_with_models_dev,
_MODELS_DEV_PREFERRED, _merge_with_models_dev, provider_model_ids,
)
results: List[dict] = []
@ -984,6 +1027,14 @@ def list_authenticated_providers(
# Check if any env var is set
has_creds = any(os.environ.get(ev) for ev in env_vars)
if not has_creds:
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if store and hermes_id in store.get("credential_pool", {}):
has_creds = True
except Exception:
pass
if not has_creds:
continue
@ -1095,11 +1146,14 @@ def list_authenticated_providers(
if not has_creds:
continue
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
# Merge with models.dev for preferred providers (same rationale as above).
if hermes_slug in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_slug, model_ids)
if hermes_slug in {"copilot", "copilot-acp"}:
model_ids = provider_model_ids(hermes_slug)
else:
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
# Merge with models.dev for preferred providers (same rationale as above).
if hermes_slug in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_slug, model_ids)
total = len(model_ids)
top = model_ids[:max_models]
@ -1222,6 +1276,15 @@ def list_authenticated_providers(
if m and m not in models_list:
models_list.append(m)
        # Official OpenAI API rows in the ``providers:`` config often have a
        # base_url but no explicit ``models:`` dict — avoid a misleading zero count in /model.

if not models_list:
url_lower = str(api_url).strip().lower()
if "api.openai.com" in url_lower:
fb = curated.get("openai") or []
if fb:
models_list = list(fb)
# Try to probe /v1/models if URL is set (but don't block on it)
# For now just show what we know from config
results.append({

View file

@ -33,6 +33,8 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("moonshotai/kimi-k2.6", "recommended"),
("deepseek/deepseek-v4-pro", ""),
("deepseek/deepseek-v4-flash", ""),
("anthropic/claude-opus-4.7", ""),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-sonnet-4.6", ""),
@ -40,7 +42,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-sonnet-4.5", ""),
("anthropic/claude-haiku-4.5", ""),
("openrouter/elephant-alpha", "free"),
("openai/gpt-5.4", ""),
("openai/gpt-5.5", ""),
("openai/gpt-5.4-mini", ""),
("xiaomi/mimo-v2.5-pro", ""),
("xiaomi/mimo-v2.5", ""),
@ -63,7 +65,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("nvidia/nemotron-3-super-120b-a12b:free", "free"),
("arcee-ai/trinity-large-preview:free", "free"),
("arcee-ai/trinity-large-thinking", ""),
("openai/gpt-5.4-pro", ""),
("openai/gpt-5.5-pro", ""),
("openai/gpt-5.4-nano", ""),
]
@ -109,6 +111,8 @@ def _codex_curated_models() -> list[str]:
_PROVIDER_MODELS: dict[str, list[str]] = {
"nous": [
"moonshotai/kimi-k2.6",
"deepseek/deepseek-v4-pro",
"deepseek/deepseek-v4-flash",
"xiaomi/mimo-v2.5-pro",
"xiaomi/mimo-v2.5",
"anthropic/claude-opus-4.7",
@ -116,7 +120,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"anthropic/claude-sonnet-4.6",
"anthropic/claude-sonnet-4.5",
"anthropic/claude-haiku-4.5",
"openai/gpt-5.4",
"openai/gpt-5.5",
"openai/gpt-5.4-mini",
"openai/gpt-5.3-codex",
"google/gemini-3-pro-preview",
@ -135,9 +139,21 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"x-ai/grok-4.20-beta",
"nvidia/nemotron-3-super-120b-a12b",
"arcee-ai/trinity-large-thinking",
"openai/gpt-5.4-pro",
"openai/gpt-5.5-pro",
"openai/gpt-5.4-nano",
],
# Native OpenAI Chat Completions (api.openai.com). Used by /model counts and
# provider_model_ids fallback when /v1/models is unavailable.
"openai": [
"gpt-5.4",
"gpt-5.4-mini",
"gpt-5-mini",
"gpt-5.3-codex",
"gpt-5.2-codex",
"gpt-4.1",
"gpt-4o",
"gpt-4o-mini",
],
"openai-codex": _codex_curated_models(),
"copilot-acp": [
"copilot-acp",
@ -151,10 +167,13 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"gpt-4.1",
"gpt-4o",
"gpt-4o-mini",
"claude-opus-4.6",
"claude-sonnet-4.6",
"claude-sonnet-4",
"claude-sonnet-4.5",
"claude-haiku-4.5",
"gemini-3.1-pro-preview",
"gemini-3-pro-preview",
"gemini-3-flash-preview",
"gemini-2.5-pro",
"grok-code-fast-1",
],
@ -246,6 +265,8 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"claude-haiku-4-5-20251001",
],
"deepseek": [
"deepseek-v4-pro",
"deepseek-v4-flash",
"deepseek-chat",
"deepseek-reasoner",
],
@ -676,7 +697,7 @@ def get_nous_recommended_aux_model(
# ---------------------------------------------------------------------------
# Canonical provider list — single source of truth for provider identity.
# Every code path that lists, displays, or iterates providers derives from
# this list: hermes model, /model, /provider, list_authenticated_providers.
# this list: hermes model, /model, list_authenticated_providers.
#
# Fields:
# slug — internal provider ID (used in config.yaml, --provider flag)
@ -1104,7 +1125,10 @@ def fetch_models_with_pricing(
return _pricing_cache[cache_key]
url = cache_key.rstrip("/") + "/v1/models"
headers: dict[str, str] = {"Accept": "application/json"}
headers: dict[str, str] = {
"Accept": "application/json",
"User-Agent": _HERMES_USER_AGENT,
}
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
@ -1736,6 +1760,17 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
live = fetch_ollama_cloud_models(force_refresh=force_refresh)
if live:
return live
if normalized == "openai":
api_key = os.getenv("OPENAI_API_KEY", "").strip()
if api_key:
base_raw = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
base = base_raw or "https://api.openai.com/v1"
try:
live = fetch_api_models(api_key, base)
if live:
return live
except Exception:
pass
if normalized == "custom":
base_url = _get_custom_base_url()
if base_url:
@ -1890,6 +1925,51 @@ def fetch_github_model_catalog(
return None
# ─── Copilot catalog context-window helpers ─────────────────────────────────
# Module-level cache: {model_id: max_prompt_tokens}
_copilot_context_cache: dict[str, int] = {}
_copilot_context_cache_time: float = 0.0
_COPILOT_CONTEXT_CACHE_TTL = 3600 # 1 hour
def get_copilot_model_context(model_id: str, api_key: Optional[str] = None) -> Optional[int]:
"""Look up max_prompt_tokens for a Copilot model from the live /models API.
Results are cached in-process for 1 hour to avoid repeated API calls.
Returns the token limit or None if not found.
"""
global _copilot_context_cache, _copilot_context_cache_time
# Serve from cache if fresh
if _copilot_context_cache and (time.time() - _copilot_context_cache_time < _COPILOT_CONTEXT_CACHE_TTL):
if model_id in _copilot_context_cache:
return _copilot_context_cache[model_id]
# Cache is fresh but model not in it — don't re-fetch
return None
# Fetch and populate cache
catalog = fetch_github_model_catalog(api_key=api_key)
if not catalog:
return None
cache: dict[str, int] = {}
for item in catalog:
mid = str(item.get("id") or "").strip()
if not mid:
continue
caps = item.get("capabilities") or {}
limits = caps.get("limits") or {}
max_prompt = limits.get("max_prompt_tokens")
if isinstance(max_prompt, int) and max_prompt > 0:
cache[mid] = max_prompt
_copilot_context_cache = cache
_copilot_context_cache_time = time.time()
return cache.get(model_id)
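The caching pattern above (refresh the whole mapping at most once per TTL window, and treat "fresh but missing" as a miss without re-fetching) generalizes to a small reusable class. A hedged sketch with an injectable clock so the behaviour is testable; `fetch_all` stands in for the catalog call:

```python
import time

class TtlLookup:
    """Single-mapping TTL cache: one bulk fetch per TTL window."""

    def __init__(self, fetch_all, ttl: float = 3600.0, clock=time.time):
        self._fetch_all, self._ttl, self._clock = fetch_all, ttl, clock
        self._cache, self._stamp = {}, 0.0

    def get(self, key):
        if self._cache and self._clock() - self._stamp < self._ttl:
            # Fresh cache: a missing key is a miss, never a re-fetch.
            return self._cache.get(key)
        self._cache = dict(self._fetch_all() or {})
        self._stamp = self._clock()
        return self._cache.get(key)
```

With a counting fetcher and a fake clock you can confirm the fetch happens once per window, exactly as the Copilot helper intends.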
def _is_github_models_base_url(base_url: Optional[str]) -> bool:
normalized = (base_url or "").strip().rstrip("/").lower()
return (
@ -1923,6 +2003,7 @@ _COPILOT_MODEL_ALIASES = {
"openai/o4-mini": "gpt-5-mini",
"anthropic/claude-opus-4.6": "claude-opus-4.6",
"anthropic/claude-sonnet-4.6": "claude-sonnet-4.6",
"anthropic/claude-sonnet-4": "claude-sonnet-4",
"anthropic/claude-sonnet-4.5": "claude-sonnet-4.5",
"anthropic/claude-haiku-4.5": "claude-haiku-4.5",
# Dash-notation fallbacks: Hermes' default Claude IDs elsewhere use
@ -1932,10 +2013,12 @@ _COPILOT_MODEL_ALIASES = {
# "model_not_supported". See issue #6879.
"claude-opus-4-6": "claude-opus-4.6",
"claude-sonnet-4-6": "claude-sonnet-4.6",
"claude-sonnet-4-0": "claude-sonnet-4",
"claude-sonnet-4-5": "claude-sonnet-4.5",
"claude-haiku-4-5": "claude-haiku-4.5",
"anthropic/claude-opus-4-6": "claude-opus-4.6",
"anthropic/claude-sonnet-4-6": "claude-sonnet-4.6",
"anthropic/claude-sonnet-4-0": "claude-sonnet-4",
"anthropic/claude-sonnet-4-5": "claude-sonnet-4.5",
"anthropic/claude-haiku-4-5": "claude-haiku-4.5",
}
@ -2160,8 +2243,15 @@ def probe_api_models(
api_key: Optional[str],
base_url: Optional[str],
timeout: float = 5.0,
api_mode: Optional[str] = None,
) -> dict[str, Any]:
"""Probe an OpenAI-compatible ``/models`` endpoint with light URL heuristics."""
"""Probe a ``/models`` endpoint with light URL heuristics.
For ``anthropic_messages`` mode, uses ``x-api-key`` and
``anthropic-version`` headers (Anthropic's native auth) instead of
``Authorization: Bearer``. The response shape (``data[].id``) is
identical, so the same parser works for both.
"""
normalized = (base_url or "").strip().rstrip("/")
if not normalized:
return {
@ -2193,7 +2283,10 @@ def probe_api_models(
tried: list[str] = []
headers: dict[str, str] = {"User-Agent": _HERMES_USER_AGENT}
if api_key:
if api_key and api_mode == "anthropic_messages":
headers["x-api-key"] = api_key
headers["anthropic-version"] = "2023-06-01"
elif api_key:
headers["Authorization"] = f"Bearer {api_key}"
if normalized.startswith(COPILOT_BASE_URL):
headers.update(copilot_default_headers())
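The header selection can be factored as a small pure function. A sketch under the same assumptions (the helper name is illustrative; the header names and the pinned `anthropic-version` value come from the branch above):

```python
def build_auth_headers(api_key=None, api_mode=None, user_agent="hermes"):
    # Anthropic's native Messages API authenticates with x-api-key plus a
    # pinned anthropic-version; OpenAI-compatible endpoints use Bearer auth.
    headers = {"User-Agent": user_agent}
    if api_key and api_mode == "anthropic_messages":
        headers["x-api-key"] = api_key
        headers["anthropic-version"] = "2023-06-01"
    elif api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers

assert build_auth_headers("sk-1", "anthropic_messages")["x-api-key"] == "sk-1"
assert build_auth_headers("sk-1")["Authorization"] == "Bearer sk-1"
assert "Authorization" not in build_auth_headers(None)
```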
@ -2235,7 +2328,10 @@ def _fetch_ai_gateway_models(timeout: float = 5.0) -> Optional[list[str]]:
base_url = AI_GATEWAY_BASE_URL
url = base_url.rstrip("/") + "/models"
headers: dict[str, str] = {"Authorization": f"Bearer {api_key}"}
headers: dict[str, str] = {
"Authorization": f"Bearer {api_key}",
"User-Agent": _HERMES_USER_AGENT,
}
req = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
@ -2255,13 +2351,14 @@ def fetch_api_models(
api_key: Optional[str],
base_url: Optional[str],
timeout: float = 5.0,
api_mode: Optional[str] = None,
) -> Optional[list[str]]:
"""Fetch the list of available model IDs from the provider's ``/models`` endpoint.
Returns a list of model ID strings, or ``None`` if the endpoint could not
be reached (network error, timeout, auth failure, etc.).
"""
return probe_api_models(api_key, base_url, timeout=timeout).get("models")
return probe_api_models(api_key, base_url, timeout=timeout, api_mode=api_mode).get("models")
# ---------------------------------------------------------------------------
@ -2389,6 +2486,7 @@ def validate_requested_model(
*,
api_key: Optional[str] = None,
base_url: Optional[str] = None,
api_mode: Optional[str] = None,
) -> dict[str, Any]:
"""
Validate a ``/model`` value for the active provider.
@ -2430,7 +2528,11 @@ def validate_requested_model(
}
if normalized == "custom":
probe = probe_api_models(api_key, base_url)
# Try probing with correct auth for the api_mode.
if api_mode == "anthropic_messages":
probe = probe_api_models(api_key, base_url, api_mode=api_mode)
else:
probe = probe_api_models(api_key, base_url)
api_models = probe.get("models")
if api_models is not None:
if requested_for_lookup in set(api_models):
@ -2479,12 +2581,17 @@ def validate_requested_model(
f"Note: could not reach this custom endpoint's model listing at `{probe.get('probed_url')}`. "
f"Hermes will still save `{requested}`, but the endpoint should expose `/models` for verification."
)
if api_mode == "anthropic_messages":
message += (
"\n Many Anthropic-compatible proxies do not implement the Models API "
"(GET /v1/models). The model name has been accepted without verification."
)
if probe.get("suggested_base_url"):
message += f"\n If this server expects `/v1`, try base URL: `{probe.get('suggested_base_url')}`"
return {
"accepted": False,
"persist": False,
"accepted": api_mode == "anthropic_messages",
"persist": True,
"recognized": False,
"message": message,
}
@ -2572,10 +2679,100 @@ def validate_requested_model(
),
}
# Native Anthropic provider: /v1/models requires x-api-key (or Bearer for
# OAuth) plus anthropic-version headers. The generic OpenAI-style probe
# below uses plain Bearer auth and 401s against Anthropic, so dispatch to
# the native fetcher which handles both API keys and Claude-Code OAuth
# tokens. (The api_mode=="anthropic_messages" branch below handles the
# Messages-API transport case separately.)
if normalized == "anthropic":
anthropic_models = _fetch_anthropic_models()
if anthropic_models is not None:
if requested_for_lookup in set(anthropic_models):
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
auto = get_close_matches(requested_for_lookup, anthropic_models, n=1, cutoff=0.9)
if auto:
return {
"accepted": True,
"persist": True,
"recognized": True,
"corrected_model": auto[0],
"message": f"Auto-corrected `{requested}` → `{auto[0]}`",
}
suggestions = get_close_matches(requested, anthropic_models, n=3, cutoff=0.5)
suggestion_text = ""
if suggestions:
suggestion_text = "\n Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
# Accept anyway — Anthropic sometimes gates newer/preview models
# (e.g. snapshot IDs, early-access releases) behind accounts
# even though they aren't listed on /v1/models.
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: `{requested}` was not found in Anthropic's /v1/models listing. "
                    f"It may still work if your account has early access or the ID is a snapshot."

f"{suggestion_text}"
),
}
# _fetch_anthropic_models returned None — no token resolvable or
# network failure. Fall through to the generic warning below.
# Anthropic Messages API: many proxies don't implement /v1/models.
# Try probing with correct auth; if it fails, accept with a warning.
if api_mode == "anthropic_messages":
api_models = fetch_api_models(api_key, base_url, api_mode=api_mode)
if api_models is not None:
if requested_for_lookup in set(api_models):
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
auto = get_close_matches(requested_for_lookup, api_models, n=1, cutoff=0.9)
if auto:
return {
"accepted": True,
"persist": True,
"recognized": True,
"corrected_model": auto[0],
"message": f"Auto-corrected `{requested}` → `{auto[0]}`",
}
# Probe failed or model not found — accept anyway (proxy likely
# doesn't implement the Anthropic Models API).
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: could not verify `{requested}` against this endpoint's "
f"model listing. Many Anthropic-compatible proxies do not "
f"implement GET /v1/models. The model name has been accepted "
f"without verification."
),
}
# Probe the live API to check if the model actually exists
api_models = fetch_api_models(api_key, base_url)
if api_models is not None:
# Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs
# prefixed with "models/" (e.g. "models/gemini-2.5-flash") — native
# Gemini-API convention. Our curated list and user input both use
# the bare ID, so a direct set-membership check drops every known
# Gemini model. Strip the prefix before comparison. See #12532.
if normalized == "gemini":
api_models = [
m[len("models/"):] if isinstance(m, str) and m.startswith("models/") else m
for m in api_models
]
if requested_for_lookup in set(api_models):
# API confirmed the model exists
return {

View file

@ -38,6 +38,7 @@ PLATFORMS: OrderedDict[str, PlatformInfo] = OrderedDict([
("qqbot", PlatformInfo(label="💬 QQBot", default_toolset="hermes-qqbot")),
("webhook", PlatformInfo(label="🔗 Webhook", default_toolset="hermes-webhook")),
("api_server", PlatformInfo(label="🌐 API Server", default_toolset="hermes-api-server")),
("cron", PlatformInfo(label="⏰ Cron", default_toolset="hermes-cron")),
])

View file

@ -71,6 +71,14 @@ VALID_HOOKS: Set[str] = {
"on_session_finalize",
"on_session_reset",
"subagent_stop",
# Gateway pre-dispatch hook. Fired once per incoming MessageEvent
# after the internal-event guard but BEFORE auth/pairing and agent
# dispatch. Plugins may return a dict to influence flow:
# {"action": "skip", "reason": "..."} -> drop message (no reply)
# {"action": "rewrite", "text": "..."} -> replace event.text, continue
# {"action": "allow"} / None -> normal dispatch
# Kwargs: event: MessageEvent, gateway: GatewayRunner, session_store.
"pre_gateway_dispatch",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
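A plugin exercising this contract might look like the following hedged sketch. Only the return-dict shapes come from the hook comment above; the event attributes (`sender_id`, `text`) and the block-list are assumptions for illustration:

```python
BLOCKED_SENDERS = {"spammer42"}  # hypothetical block list

def pre_gateway_dispatch(event, gateway, session_store, **kwargs):
    text = getattr(event, "text", "") or ""
    if getattr(event, "sender_id", None) in BLOCKED_SENDERS:
        # Drop the message before auth/pairing and agent dispatch.
        return {"action": "skip", "reason": "sender is on the block list"}
    if text.startswith("!echo "):
        # Replace event.text and continue to normal dispatch.
        return {"action": "rewrite", "text": text[len("!echo "):]}
    return None  # fall through to normal dispatch
```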

View file

@ -116,6 +116,10 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="DASHSCOPE_BASE_URL",
),
"alibaba-coding-plan": HermesOverlay(
transport="openai_chat",
base_url_env_var="ALIBABA_CODING_PLAN_BASE_URL",
),
"vercel": HermesOverlay(
transport="openai_chat",
is_aggregator=True,
@ -259,6 +263,9 @@ ALIASES: Dict[str, str] = {
"aliyun": "alibaba",
"qwen": "alibaba",
"alibaba-cloud": "alibaba",
"alibaba_coding": "alibaba-coding-plan",
"alibaba-coding": "alibaba-coding-plan",
"alibaba_coding_plan": "alibaba-coding-plan",
# google-gemini-cli (OAuth + Code Assist)
"gemini-cli": "google-gemini-cli",

229
hermes_cli/pty_bridge.py Normal file
View file

@ -0,0 +1,229 @@
"""PTY bridge for `hermes dashboard` chat tab.
Wraps a child process behind a pseudo-terminal so its ANSI output can be
streamed to a browser-side terminal emulator (xterm.js) and typed
keystrokes can be fed back in. The only caller today is the
``/api/pty`` WebSocket endpoint in ``hermes_cli.web_server``.
Design constraints:
* **POSIX-only.** Hermes Agent supports Windows exclusively via WSL, which
exposes a native POSIX PTY via ``openpty(3)``. Native Windows Python
has no PTY; :class:`PtyUnavailableError` is raised with a user-readable
install/platform message so the dashboard can render a banner instead of
crashing.
* **Zero Node dependency on the server side.** We use :mod:`ptyprocess`,
which is a pure-Python wrapper around the OS calls. The browser talks
to the same ``hermes --tui`` binary it would launch from the CLI, so
every TUI feature (slash popover, model picker, tool rows, markdown,
skin engine, clarify/sudo/approval prompts) ships automatically.
* **Byte-safe I/O.** Reads and writes go through the PTY master fd
directly; we avoid :class:`ptyprocess.PtyProcessUnicode` because

streaming ANSI is inherently byte-oriented and UTF-8 boundaries may land
mid-read.
"""
from __future__ import annotations
import errno
import fcntl
import os
import select
import signal
import struct
import sys
import termios
import time
from typing import Optional, Sequence
try:
import ptyprocess # type: ignore
_PTY_AVAILABLE = not sys.platform.startswith("win")
except ImportError: # pragma: no cover - dev env without ptyprocess
ptyprocess = None # type: ignore
_PTY_AVAILABLE = False
__all__ = ["PtyBridge", "PtyUnavailableError"]
class PtyUnavailableError(RuntimeError):
"""Raised when a PTY cannot be created on this platform.
Today this means native Windows (no ConPTY bindings) or a dev
environment missing the ``ptyprocess`` dependency. The dashboard
surfaces the message to the user as a chat-tab banner.
"""
class PtyBridge:
"""Thin wrapper around ``ptyprocess.PtyProcess`` for byte streaming.
Not thread-safe. A single bridge is owned by the WebSocket handler
that spawned it; the reader runs in an executor thread while writes
happen on the event-loop thread. Both sides are OK because the
kernel PTY is the actual synchronization point we never call
:mod:`ptyprocess` methods concurrently, we only call ``os.read`` and
``os.write`` on the master fd, which is safe.
"""
def __init__(self, proc: "ptyprocess.PtyProcess"): # type: ignore[name-defined]
self._proc = proc
self._fd: int = proc.fd
self._closed = False
# -- lifecycle --------------------------------------------------------
@classmethod
def is_available(cls) -> bool:
"""True if a PTY can be spawned on this platform."""
return bool(_PTY_AVAILABLE)
@classmethod
def spawn(
cls,
argv: Sequence[str],
*,
cwd: Optional[str] = None,
env: Optional[dict] = None,
cols: int = 80,
rows: int = 24,
) -> "PtyBridge":
"""Spawn ``argv`` behind a new PTY and return a bridge.
Raises :class:`PtyUnavailableError` if the platform can't host a
PTY. Raises :class:`FileNotFoundError` or :class:`OSError` for
ordinary exec failures (missing binary, bad cwd, etc.).
"""
if not _PTY_AVAILABLE:
if sys.platform.startswith("win"):
raise PtyUnavailableError(
"Pseudo-terminals are unavailable on this platform. "
"Hermes Agent supports Windows only via WSL."
)
if ptyprocess is None:
raise PtyUnavailableError(
"The `ptyprocess` package is missing. "
"Install with: pip install ptyprocess "
"(or pip install -e '.[pty]')."
)
raise PtyUnavailableError("Pseudo-terminals are unavailable.")
# Let caller-supplied env fully override inheritance; if they pass
# None we inherit the server's env (same semantics as subprocess).
spawn_env = os.environ.copy() if env is None else env
proc = ptyprocess.PtyProcess.spawn( # type: ignore[union-attr]
list(argv),
cwd=cwd,
env=spawn_env,
dimensions=(rows, cols),
)
return cls(proc)
@property
def pid(self) -> int:
return int(self._proc.pid)
def is_alive(self) -> bool:
if self._closed:
return False
try:
return bool(self._proc.isalive())
except Exception:
return False
# -- I/O --------------------------------------------------------------
def read(self, timeout: float = 0.2) -> Optional[bytes]:
"""Read up to 64 KiB of raw bytes from the PTY master.
Returns:
* bytes: zero or more bytes of child output
* empty bytes (``b""``): no data available within ``timeout``
* None: child has exited and the master fd is at EOF
Never blocks longer than ``timeout`` seconds. Safe to call after
:meth:`close`; returns ``None`` in that case.
"""
if self._closed:
return None
try:
readable, _, _ = select.select([self._fd], [], [], timeout)
except (OSError, ValueError):
return None
if not readable:
return b""
try:
data = os.read(self._fd, 65536)
except OSError as exc:
# EIO on Linux = slave side closed. EBADF = already closed.
if exc.errno in (errno.EIO, errno.EBADF):
return None
raise
if not data:
return None
return data
def write(self, data: bytes) -> None:
"""Write raw bytes to the PTY master (i.e. the child's stdin)."""
if self._closed or not data:
return
# os.write can return a short write under load; loop until drained.
view = memoryview(data)
while view:
try:
n = os.write(self._fd, view)
except OSError as exc:
if exc.errno in (errno.EIO, errno.EBADF, errno.EPIPE):
return
raise
if n <= 0:
return
view = view[n:]
def resize(self, cols: int, rows: int) -> None:
"""Forward a terminal resize to the child via ``TIOCSWINSZ``."""
if self._closed:
return
# struct winsize: rows, cols, xpixel, ypixel (all unsigned short)
winsize = struct.pack("HHHH", max(1, rows), max(1, cols), 0, 0)
try:
fcntl.ioctl(self._fd, termios.TIOCSWINSZ, winsize)
except OSError:
pass
# -- teardown ---------------------------------------------------------
def close(self) -> None:
"""Terminate the child (SIGHUP → SIGTERM → SIGKILL escalation, 0.5s grace each) and close fds.
Idempotent. Reaping the child is important so we don't leak
zombies across the lifetime of the dashboard process.
"""
if self._closed:
return
self._closed = True
# SIGHUP is the conventional "your terminal went away" signal.
# We escalate if the child ignores it.
for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
    if not self._proc.isalive():
        break
    try:
        self._proc.kill(sig)
    except Exception:
        pass
    # Give the child a grace window before escalating to the next
    # signal; without this, SIGKILL fires immediately and the
    # "escalation" is escalation in name only.
    deadline = time.monotonic() + 0.5
    while self._proc.isalive() and time.monotonic() < deadline:
        time.sleep(0.02)
try:
self._proc.close(force=True)
except Exception:
pass
# Context-manager sugar — handy in tests and ad-hoc scripts.
def __enter__(self) -> "PtyBridge":
return self
def __exit__(self, *_exc) -> None:
self.close()
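The spawn/select/read dance the bridge wraps is easiest to see end-to-end. A stdlib-only sketch of the same loop (POSIX only, no ptyprocess; the `echo` payload is illustrative):

```python
import os
import select
import subprocess

# Stdlib-only mirror of PtyBridge's read loop: child stdio on the slave
# end of a PTY pair, byte reads on the master end.
master, slave = os.openpty()
proc = subprocess.Popen(
    ["echo", "hello from the pty"],
    stdin=slave, stdout=slave, stderr=slave, close_fds=True,
)
os.close(slave)  # only the child should hold the slave end now
chunks = []
while True:
    readable, _, _ = select.select([master], [], [], 0.2)
    if not readable:
        if proc.poll() is not None:  # child gone, nothing buffered
            break
        continue
    try:
        data = os.read(master, 65536)
    except OSError:  # EIO on Linux once the slave side closes = EOF
        break
    if not data:
        break
    chunks.append(data)
os.close(master)
output = b"".join(chunks).decode("utf-8", "replace")
assert "hello from the pty" in output
```

The EIO-as-EOF handling is the same convention `PtyBridge.read` relies on above.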


@ -36,6 +36,29 @@ def _normalize_custom_provider_name(value: str) -> str:
return value.strip().lower().replace(" ", "-")
def _loopback_hostname(host: str) -> bool:
h = (host or "").lower().rstrip(".")
return h in {"localhost", "127.0.0.1", "::1", "0.0.0.0"}
def _config_base_url_trustworthy_for_bare_custom(cfg_base_url: str, cfg_provider: str) -> bool:
"""Decide whether ``model.base_url`` may back bare ``custom`` runtime resolution.
GitHub #14676: the model picker can select Custom while ``model.provider`` still reflects a
previous provider. Reject non-loopback URLs unless the YAML provider is already ``custom``,
so a stale OpenRouter/Z.ai base_url cannot hijack local ``custom`` sessions.
"""
cfg_provider_norm = (cfg_provider or "").strip().lower()
bu = (cfg_base_url or "").strip()
if not bu:
return False
if cfg_provider_norm == "custom":
return True
if base_url_host_matches(bu, "openrouter.ai"):
return False
return _loopback_hostname(base_url_hostname(bu))
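The trust decision can be exercised in isolation. A self-contained sketch, with ``urllib.parse`` standing in for the module's ``base_url_hostname`` / ``base_url_host_matches`` helpers (defined elsewhere in the file):

```python
from urllib.parse import urlsplit

def _hostname(url: str) -> str:
    # Stand-in for base_url_hostname (assumption, not the real helper).
    return (urlsplit(url).hostname or "").lower().rstrip(".")

def trustworthy_for_bare_custom(cfg_base_url: str, cfg_provider: str) -> bool:
    provider = (cfg_provider or "").strip().lower()
    base_url = (cfg_base_url or "").strip()
    if not base_url:
        return False            # nothing configured, nothing to trust
    if provider == "custom":
        return True             # YAML already says custom; URL is intentional
    host = _hostname(base_url)
    if host == "openrouter.ai" or host.endswith(".openrouter.ai"):
        return False            # stale OpenRouter URL must not hijack custom
    return host in {"localhost", "127.0.0.1", "::1", "0.0.0.0"}

assert trustworthy_for_bare_custom("http://localhost:1234/v1", "openrouter")
assert not trustworthy_for_bare_custom("https://openrouter.ai/api/v1", "zai")
assert trustworthy_for_bare_custom("https://api.example.com/v1", "custom")
assert not trustworthy_for_bare_custom("", "custom")
```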
def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
"""Auto-detect api_mode from the resolved base URL.
@ -160,8 +183,16 @@ def _resolve_runtime_from_pool_entry(
requested_provider: str,
model_cfg: Optional[Dict[str, Any]] = None,
pool: Optional[CredentialPool] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
model_cfg = model_cfg or _get_model_config()
# When the caller is resolving for a specific target model (e.g. a /model
# mid-session switch), prefer that over the persisted model.default. This
# prevents api_mode being computed from a stale config default that no
# longer matches the model actually being used — the bug that caused
# opencode-zen /v1 to be stripped for chat_completions requests when
# config.default was still a Claude model.
effective_model = (target_model or model_cfg.get("default") or "")
base_url = (getattr(entry, "runtime_base_url", None) or getattr(entry, "base_url", None) or "").rstrip("/")
api_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
api_mode = "chat_completions"
@ -207,7 +238,7 @@ def _resolve_runtime_from_pool_entry(
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
api_mode = opencode_model_api_mode(provider, effective_model)
else:
# Auto-detect Anthropic-compatible endpoints (/anthropic suffix,
# Kimi /coding, api.openai.com → codex_responses, api.x.ai →
@ -323,12 +354,16 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
# Found match by provider key
base_url = entry.get("api") or entry.get("url") or entry.get("base_url") or ""
if base_url:
return {
result = {
"name": entry.get("name", ep_name),
"base_url": base_url.strip(),
"api_key": resolved_api_key,
"model": entry.get("default_model", ""),
}
api_mode = _parse_api_mode(entry.get("api_mode"))
if api_mode:
result["api_mode"] = api_mode
return result
# Also check the 'name' field if present
display_name = entry.get("name", "")
if display_name:
@ -337,12 +372,16 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
# Found match by display name
base_url = entry.get("api") or entry.get("url") or entry.get("base_url") or ""
if base_url:
return {
result = {
"name": display_name,
"base_url": base_url.strip(),
"api_key": resolved_api_key,
"model": entry.get("default_model", ""),
}
api_mode = _parse_api_mode(entry.get("api_mode"))
if api_mode:
result["api_mode"] = api_mode
return result
# Fall back to custom_providers: list (legacy format)
custom_providers = config.get("custom_providers")
@ -464,6 +503,7 @@ def _resolve_openrouter_runtime(
cfg_provider = cfg_provider.strip().lower()
env_openrouter_base_url = os.getenv("OPENROUTER_BASE_URL", "").strip()
env_custom_base_url = os.getenv("CUSTOM_BASE_URL", "").strip()
# Use config base_url when available and the provider context matches.
# OPENAI_BASE_URL env var is no longer consulted — config.yaml is
@ -473,11 +513,14 @@ def _resolve_openrouter_runtime(
if requested_norm == "auto":
if not cfg_provider or cfg_provider == "auto":
use_config_base_url = True
elif requested_norm == "custom" and cfg_provider == "custom":
elif requested_norm == "custom" and _config_base_url_trustworthy_for_bare_custom(
cfg_base_url, cfg_provider
):
use_config_base_url = True
base_url = (
(explicit_base_url or "").strip()
or env_custom_base_url
or (cfg_base_url.strip() if use_config_base_url else "")
or env_openrouter_base_url
or OPENROUTER_BASE_URL
@ -689,8 +732,18 @@ def resolve_runtime_provider(
requested: Optional[str] = None,
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
"""Resolve runtime provider credentials for agent execution."""
"""Resolve runtime provider credentials for agent execution.
target_model: Optional override for model_cfg.get("default") when
computing provider-specific api_mode (e.g. OpenCode Zen/Go where different
models route through different API surfaces). Callers performing an
explicit mid-session model switch should pass the new model here so
api_mode is derived from the model they are switching TO, not the stale
persisted default. Other callers can leave it None to preserve existing
behavior (api_mode derived from config).
"""
requested_provider = resolve_requested_provider(requested)
custom_runtime = _resolve_named_custom_runtime(
@ -772,6 +825,7 @@ def resolve_runtime_provider(
requested_provider=requested_provider,
model_cfg=model_cfg,
pool=pool,
target_model=target_model,
)
if provider == "nous":
@ -990,7 +1044,11 @@ def resolve_runtime_provider(
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
# Prefer the target_model from the caller (explicit mid-session
# switch) over the stale model.default; see _resolve_runtime_from_pool_entry
# for the same rationale.
_effective = target_model or model_cfg.get("default", "")
api_mode = opencode_model_api_mode(provider, _effective)
else:
# Auto-detect Anthropic-compatible endpoints by URL convention
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)


@ -500,6 +500,15 @@ def _print_setup_summary(config: dict, hermes_home):
if get_env_value("HASS_TOKEN"):
tool_status.append(("Smart Home (Home Assistant)", True, None))
# Spotify (OAuth via hermes auth spotify — check auth.json, not env vars)
try:
from hermes_cli.auth import get_provider_auth_state
_spotify_state = get_provider_auth_state("spotify") or {}
if _spotify_state.get("access_token") or _spotify_state.get("refresh_token"):
tool_status.append(("Spotify (PKCE OAuth)", True, None))
except Exception:
pass
# Skills Hub
if get_env_value("GITHUB_TOKEN"):
tool_status.append(("Skills Hub (GitHub)", True, None))


@ -164,19 +164,26 @@ def show_status(args):
qwen_status = {}
nous_logged_in = bool(nous_status.get("logged_in"))
nous_error = nous_status.get("error")
nous_label = "logged in" if nous_logged_in else "not logged in (run: hermes auth add nous --type oauth)"
print(
f" {'Nous Portal':<12} {check_mark(nous_logged_in)} "
f"{'logged in' if nous_logged_in else 'not logged in (run: hermes model)'}"
f"{nous_label}"
)
if nous_logged_in:
portal_url = nous_status.get("portal_base_url") or "(unknown)"
access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
portal_url = nous_status.get("portal_base_url") or "(unknown)"
access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
if nous_logged_in or portal_url != "(unknown)" or nous_error:
print(f" Portal URL: {portal_url}")
if nous_logged_in or nous_status.get("access_expires_at"):
print(f" Access exp: {access_exp}")
if nous_logged_in or nous_status.get("agent_key_expires_at"):
print(f" Key exp: {key_exp}")
if nous_logged_in or nous_status.get("has_refresh_token"):
print(f" Refresh: {refresh_label}")
if nous_error and not nous_logged_in:
print(f" Error: {nous_error}")
codex_logged_in = bool(codex_status.get("logged_in"))
print(


@ -127,7 +127,7 @@ TIPS = [
# --- Tools & Capabilities ---
"execute_code runs Python scripts that call Hermes tools programmatically — results stay out of context.",
"delegate_task spawns up to 3 concurrent sub-agents by default (configurable via delegation.max_concurrent_children) with isolated contexts for parallel work.",
"delegate_task spawns up to 3 concurrent sub-agents by default (delegation.max_concurrent_children) with isolated contexts for parallel work.",
"web_extract works on PDF URLs — pass any PDF link and it converts to markdown.",
"search_files is ripgrep-backed and faster than grep — use it instead of terminal grep.",
"patch uses 9 fuzzy matching strategies so minor whitespace differences won't break edits.",


@ -67,12 +67,13 @@ CONFIGURABLE_TOOLSETS = [
("messaging", "📨 Cross-Platform Messaging", "send_message"),
("rl", "🧪 RL Training", "Tinker-Atropos training tools"),
("homeassistant", "🏠 Home Assistant", "smart home device control"),
("spotify", "🎵 Spotify", "playback, search, playlists, library"),
]
# Toolsets that are OFF by default for new installs.
# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
# but the setup checklist won't pre-select them for first-time users.
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl"}
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify"}
def _get_effective_configurable_toolsets():
@ -361,6 +362,18 @@ TOOL_CATEGORIES = {
},
],
},
"spotify": {
"name": "Spotify",
"icon": "🎵",
"providers": [
{
"name": "Spotify Web API",
"tag": "PKCE OAuth — opens the setup wizard",
"env_vars": [],
"post_setup": "spotify",
},
],
},
"rl": {
"name": "RL Training",
"icon": "🧪",
@ -461,6 +474,35 @@ def _run_post_setup(post_setup_key: str):
_print_warning(" kittentts install timed out (>5min)")
_print_info(f" Run manually: python -m pip install -U '{wheel_url}' soundfile")
elif post_setup_key == "spotify":
# Run the full `hermes auth spotify` flow — if the user has no
# client_id yet, this drops them into the interactive wizard
# (opens the Spotify dashboard, prompts for client_id, persists
# to ~/.hermes/.env), then continues straight into PKCE. If they
# already have an app, it skips the wizard and just does OAuth.
from types import SimpleNamespace
try:
from hermes_cli.auth import login_spotify_command
except Exception as exc:
_print_warning(f" Could not load Spotify auth: {exc}")
_print_info(" Run manually: hermes auth spotify")
return
_print_info(" Starting Spotify login...")
try:
login_spotify_command(SimpleNamespace(
client_id=None, redirect_uri=None, scope=None,
no_browser=False, timeout=None,
))
_print_success(" Spotify authenticated")
except SystemExit as exc:
# User aborted the wizard, or OAuth failed — don't fail the
# toolset enable; they can retry with `hermes auth spotify`.
_print_warning(f" Spotify login did not complete: {exc}")
_print_info(" Run later: hermes auth spotify")
except Exception as exc:
_print_warning(f" Spotify login failed: {exc}")
_print_info(" Run manually: hermes auth spotify")
elif post_setup_key == "rl_training":
try:
__import__("tinker_atropos")
@ -590,7 +632,10 @@ def _get_platform_tools(
default_off.remove(platform)
enabled_toolsets -= default_off
# Plugin toolsets: enabled by default unless explicitly disabled.
# Plugin toolsets: enabled by default unless explicitly disabled, or
# unless the toolset is in _DEFAULT_OFF_TOOLSETS (e.g. spotify —
# shipped as a bundled plugin but user must opt in via `hermes tools`
# so we don't ship 7 Spotify tool schemas to users who don't use it).
# A plugin toolset is "known" for a platform once `hermes tools`
# has been saved for that platform (tracked via known_plugin_toolsets).
# Unknown plugins default to enabled; known-but-absent = disabled.
@ -602,6 +647,9 @@ def _get_platform_tools(
if pts in toolset_names:
# Explicitly listed in config — enabled
enabled_toolsets.add(pts)
elif pts in _DEFAULT_OFF_TOOLSETS:
# Opt-in plugin toolset — stay off until user picks it
continue
elif pts not in known_for_platform:
# New plugin not yet seen by hermes tools — default enabled
enabled_toolsets.add(pts)
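The resulting precedence for a plugin toolset can be summarized as a small pure function (``plugin_toolset_enabled`` is a hypothetical stand-in for the inline logic above, not a helper in this module):

```python
def plugin_toolset_enabled(pts, configured, default_off, known_for_platform):
    """Hypothetical stand-in mirroring the inline precedence above."""
    if pts in configured:
        return True    # explicitly listed in config: enabled
    if pts in default_off:
        return False   # opt-in toolset (e.g. spotify): stays off
    if pts not in known_for_platform:
        return True    # new plugin not yet seen by `hermes tools`: default on
    return False       # known but absent from config: user disabled it

assert plugin_toolset_enabled("spotify", set(), {"spotify"}, set()) is False
assert plugin_toolset_enabled("spotify", {"spotify"}, {"spotify"}, set()) is True
assert plugin_toolset_enabled("weather", set(), {"spotify"}, set()) is True
assert plugin_toolset_enabled("weather", set(), {"spotify"}, {"weather"}) is False
```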

hermes_cli/voice.py (new file, 548 lines)

@ -0,0 +1,548 @@
"""Process-wide voice recording + TTS API for the TUI gateway.
Wraps ``tools.voice_mode`` (recording/transcription) and ``tools.tts_tool``
(text-to-speech) behind idempotent, stateful entry points that the gateway's
``voice.record``, ``voice.toggle``, and ``voice.tts`` JSON-RPC handlers can
call from a dedicated thread. The gateway imports this module lazily so that
missing optional audio deps (sounddevice, faster-whisper, numpy) surface as
an ``ImportError`` at call time, not at startup.
Two usage modes are exposed:
* **Push-to-talk** (``start_recording`` / ``stop_and_transcribe``): a single,
manually bounded capture used when the caller drives the start/stop pair
explicitly.
* **Continuous (VAD)** (``start_continuous`` / ``stop_continuous``): mirrors
the classic CLI voice mode: recording auto-stops on silence, transcribes,
hands the result to a callback, and then auto-restarts for the next turn.
Three consecutive no-speech cycles stop the loop and fire
``on_silent_limit`` so the UI can turn the mode off.
"""
from __future__ import annotations
import logging
import os
import sys
import threading
from typing import Any, Callable, Optional
from tools.voice_mode import (
create_audio_recorder,
is_whisper_hallucination,
play_audio_file,
transcribe_recording,
)
logger = logging.getLogger(__name__)
def _debug(msg: str) -> None:
"""Emit a debug breadcrumb when HERMES_VOICE_DEBUG=1.
Goes to stderr so the TUI gateway wraps it as a gateway.stderr event,
which createGatewayEventHandler shows as an Activity line exactly
what we need to diagnose "why didn't the loop auto-restart?" in the
user's real terminal without shipping a separate debug RPC.
Any OSError / BrokenPipeError is swallowed because this fires from
background threads (silence callback, TTS daemon, beep) where a
broken stderr pipe must not kill the whole gateway; the main
command pipe (stdin+stdout) is what actually matters.
"""
if os.environ.get("HERMES_VOICE_DEBUG", "").strip() != "1":
return
try:
print(f"[voice] {msg}", file=sys.stderr, flush=True)
except (BrokenPipeError, OSError):
pass
def _beeps_enabled() -> bool:
"""CLI parity: voice.beep_enabled in config.yaml (default True)."""
try:
from hermes_cli.config import load_config
voice_cfg = load_config().get("voice", {})
if isinstance(voice_cfg, dict):
return bool(voice_cfg.get("beep_enabled", True))
except Exception:
pass
return True
def _play_beep(frequency: int, count: int = 1) -> None:
"""Audible cue matching cli.py's record/stop beeps.
880 Hz single-beep on start (cli.py:_voice_start_recording line 7532),
660 Hz double-beep on stop (cli.py:_voice_stop_and_transcribe line 7585).
Best-effort: sounddevice failures are silently swallowed so the
voice loop never breaks because a speaker was unavailable.
"""
if not _beeps_enabled():
return
try:
from tools.voice_mode import play_beep
play_beep(frequency=frequency, count=count)
except Exception as e:
_debug(f"beep {frequency}Hz failed: {e}")
# ── Push-to-talk state ───────────────────────────────────────────────
_recorder = None
_recorder_lock = threading.Lock()
# ── Continuous (VAD) state ───────────────────────────────────────────
_continuous_lock = threading.Lock()
_continuous_active = False
_continuous_recorder: Any = None
# ── TTS-vs-STT feedback guard ────────────────────────────────────────
# When TTS plays the agent reply over the speakers, the live microphone
# picks it up and transcribes the agent's own voice as user input — an
# infinite loop the agent happily joins ("Ha, looks like we're in a loop").
# This Event mirrors cli.py:_voice_tts_done: cleared while speak_text is
# playing, set while silent. _continuous_on_silence waits on it before
# re-arming the recorder, and speak_text itself cancels any live capture
# before starting playback so the tail of the previous utterance doesn't
# leak into the mic.
_tts_playing = threading.Event()
_tts_playing.set() # initially "not playing"
_continuous_on_transcript: Optional[Callable[[str], None]] = None
_continuous_on_status: Optional[Callable[[str], None]] = None
_continuous_on_silent_limit: Optional[Callable[[], None]] = None
_continuous_no_speech_count = 0
_CONTINUOUS_NO_SPEECH_LIMIT = 3
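The clear/wait handshake on ``_tts_playing`` is the whole feedback guard. A minimal stand-alone sketch of the same pattern (all names here are illustrative, not part of this module):

```python
import threading
import time

playing = threading.Event()
playing.set()  # "not playing" is the initial state, as with _tts_playing

def speak(duration: float) -> None:
    playing.clear()           # gate the mic while audio is on the speakers
    try:
        time.sleep(duration)  # stand-in for synthesis + playback
    finally:
        playing.set()         # always release the gate, even on error

def rearm_mic() -> str:
    playing.wait(timeout=60)  # bounded wait, like _continuous_on_silence
    return "listening"

t = threading.Thread(target=speak, args=(0.05,))
t.start()
state = rearm_mic()  # blocks until playback "finishes" (or was never started)
t.join()
assert state == "listening"
```

The ``finally: playing.set()`` is the load-bearing part: a TTS crash must never leave the mic gated forever.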
# ── Push-to-talk API ─────────────────────────────────────────────────
def start_recording() -> None:
"""Begin capturing from the default input device (push-to-talk).
Idempotent: calling again while a recording is in progress is a no-op.
"""
global _recorder
with _recorder_lock:
if _recorder is not None and getattr(_recorder, "is_recording", False):
return
rec = create_audio_recorder()
rec.start()
_recorder = rec
def stop_and_transcribe() -> Optional[str]:
"""Stop the active push-to-talk recording, transcribe, return text.
Returns ``None`` when no recording is active, when the microphone
captured no speech, or when Whisper returned a known hallucination.
"""
global _recorder
with _recorder_lock:
rec = _recorder
_recorder = None
if rec is None:
return None
wav_path = rec.stop()
if not wav_path:
return None
try:
result = transcribe_recording(wav_path)
except Exception as e:
logger.warning("voice transcription failed: %s", e)
return None
finally:
try:
if os.path.isfile(wav_path):
os.unlink(wav_path)
except Exception:
pass
# transcribe_recording returns {"success": bool, "transcript": str, ...}
# — matches cli.py:_voice_stop_and_transcribe's result.get("transcript").
if not result.get("success"):
return None
text = (result.get("transcript") or "").strip()
if not text or is_whisper_hallucination(text):
return None
return text
# ── Continuous (VAD) API ─────────────────────────────────────────────
def start_continuous(
on_transcript: Callable[[str], None],
on_status: Optional[Callable[[str], None]] = None,
on_silent_limit: Optional[Callable[[], None]] = None,
silence_threshold: int = 200,
silence_duration: float = 3.0,
) -> None:
"""Start a VAD-driven continuous recording loop.
The loop calls ``on_transcript(text)`` each time speech is detected and
transcribed successfully, then auto-restarts. After
``_CONTINUOUS_NO_SPEECH_LIMIT`` consecutive silent cycles (no speech
picked up at all) the loop stops itself and calls ``on_silent_limit``
so the UI can reflect "voice off". Idempotent: calling while already
active is a no-op.
``on_status`` is called with ``"listening"`` / ``"transcribing"`` /
``"idle"`` so the UI can show a live indicator.
"""
global _continuous_active, _continuous_recorder
global _continuous_on_transcript, _continuous_on_status, _continuous_on_silent_limit
global _continuous_no_speech_count
with _continuous_lock:
if _continuous_active:
_debug("start_continuous: already active — no-op")
return
_continuous_active = True
_continuous_on_transcript = on_transcript
_continuous_on_status = on_status
_continuous_on_silent_limit = on_silent_limit
_continuous_no_speech_count = 0
if _continuous_recorder is None:
_continuous_recorder = create_audio_recorder()
_continuous_recorder._silence_threshold = silence_threshold
_continuous_recorder._silence_duration = silence_duration
rec = _continuous_recorder
_debug(
f"start_continuous: begin (threshold={silence_threshold}, duration={silence_duration}s)"
)
# CLI parity: single 880 Hz beep *before* opening the stream — placing
# the beep after stream.start() on macOS triggers a CoreAudio conflict
# (cli.py:7528 comment).
_play_beep(frequency=880, count=1)
try:
rec.start(on_silence_stop=_continuous_on_silence)
except Exception as e:
logger.error("failed to start continuous recording: %s", e)
_debug(f"start_continuous: rec.start raised {type(e).__name__}: {e}")
with _continuous_lock:
_continuous_active = False
raise
if on_status:
try:
on_status("listening")
except Exception:
pass
def stop_continuous() -> None:
"""Stop the active continuous loop and release the microphone.
Idempotent: calling while not active is a no-op. Any in-flight
transcription completes but its result is discarded (the callback
checks ``_continuous_active`` before firing).
"""
global _continuous_active, _continuous_on_transcript
global _continuous_on_status, _continuous_on_silent_limit
global _continuous_recorder, _continuous_no_speech_count
with _continuous_lock:
if not _continuous_active:
return
_continuous_active = False
rec = _continuous_recorder
on_status = _continuous_on_status
_continuous_on_transcript = None
_continuous_on_status = None
_continuous_on_silent_limit = None
_continuous_no_speech_count = 0
if rec is not None:
try:
# cancel() (not stop()) discards buffered frames — the loop
# is over, we don't want to transcribe a half-captured turn.
rec.cancel()
except Exception as e:
logger.warning("failed to cancel recorder: %s", e)
# Audible "recording stopped" cue (CLI parity: same 660 Hz × 2 the
# silence-auto-stop path plays).
_play_beep(frequency=660, count=2)
if on_status:
try:
on_status("idle")
except Exception:
pass
def is_continuous_active() -> bool:
"""Whether a continuous voice loop is currently running."""
with _continuous_lock:
return _continuous_active
def _continuous_on_silence() -> None:
"""AudioRecorder silence callback — runs in a daemon thread.
Stops the current capture, transcribes, delivers the text via
``on_transcript``, and, if the loop is still active, starts the
next capture. Three consecutive silent cycles end the loop.
"""
global _continuous_active, _continuous_no_speech_count
_debug("_continuous_on_silence: fired")
with _continuous_lock:
if not _continuous_active:
_debug("_continuous_on_silence: loop inactive — abort")
return
rec = _continuous_recorder
on_transcript = _continuous_on_transcript
on_status = _continuous_on_status
on_silent_limit = _continuous_on_silent_limit
if rec is None:
_debug("_continuous_on_silence: no recorder — abort")
return
if on_status:
try:
on_status("transcribing")
except Exception:
pass
wav_path = rec.stop()
# Peak RMS is the critical diagnostic when stop() returns None despite
# the VAD firing — tells us at a glance whether the mic was too quiet
# for SILENCE_RMS_THRESHOLD (200) or the VAD + peak checks disagree.
peak_rms = getattr(rec, "_peak_rms", -1)
_debug(
f"_continuous_on_silence: rec.stop -> {wav_path!r} (peak_rms={peak_rms})"
)
# CLI parity: double 660 Hz beep after the stream stops (safe from the
# CoreAudio conflict that blocks pre-start beeps).
_play_beep(frequency=660, count=2)
transcript: Optional[str] = None
if wav_path:
try:
result = transcribe_recording(wav_path)
# transcribe_recording returns {"success": bool, "transcript": str,
# "error": str?} — NOT {"text": str}. Using the wrong key silently
# produced empty transcripts even when Groq/local STT returned fine,
# which masqueraded as "not hearing the user" to the caller.
success = bool(result.get("success"))
text = (result.get("transcript") or "").strip()
err = result.get("error")
_debug(
f"_continuous_on_silence: transcribe -> success={success} "
f"text={text!r} err={err!r}"
)
if success and text and not is_whisper_hallucination(text):
transcript = text
except Exception as e:
logger.warning("continuous transcription failed: %s", e)
_debug(f"_continuous_on_silence: transcribe raised {type(e).__name__}: {e}")
finally:
try:
if os.path.isfile(wav_path):
os.unlink(wav_path)
except Exception:
pass
with _continuous_lock:
if not _continuous_active:
# User stopped us while we were transcribing — discard.
_debug("_continuous_on_silence: stopped during transcribe — no restart")
return
if transcript:
_continuous_no_speech_count = 0
else:
_continuous_no_speech_count += 1
should_halt = _continuous_no_speech_count >= _CONTINUOUS_NO_SPEECH_LIMIT
no_speech = _continuous_no_speech_count
if transcript and on_transcript:
try:
on_transcript(transcript)
except Exception as e:
logger.warning("on_transcript callback raised: %s", e)
if should_halt:
_debug(f"_continuous_on_silence: {no_speech} silent cycles — halting")
with _continuous_lock:
_continuous_active = False
_continuous_no_speech_count = 0
if on_silent_limit:
try:
on_silent_limit()
except Exception:
pass
try:
rec.cancel()
except Exception:
pass
if on_status:
try:
on_status("idle")
except Exception:
pass
return
# CLI parity (cli.py:10619-10621): wait for any in-flight TTS to
# finish before re-arming the mic, then leave a small gap to avoid
# catching the tail of the speaker output. Without this the voice
# loop becomes a feedback loop — the agent's spoken reply lands
# back in the mic and gets re-submitted.
if not _tts_playing.is_set():
_debug("_continuous_on_silence: waiting for TTS to finish")
_tts_playing.wait(timeout=60)
import time as _time
_time.sleep(0.3)
# User may have stopped the loop during the wait.
with _continuous_lock:
if not _continuous_active:
_debug("_continuous_on_silence: stopped while waiting for TTS")
return
# Restart for the next turn.
_debug(f"_continuous_on_silence: restarting loop (no_speech={no_speech})")
_play_beep(frequency=880, count=1)
try:
rec.start(on_silence_stop=_continuous_on_silence)
except Exception as e:
logger.error("failed to restart continuous recording: %s", e)
_debug(f"_continuous_on_silence: restart raised {type(e).__name__}: {e}")
with _continuous_lock:
_continuous_active = False
return
if on_status:
try:
on_status("listening")
except Exception:
pass
# ── TTS API ──────────────────────────────────────────────────────────
def speak_text(text: str) -> None:
"""Synthesize ``text`` with the configured TTS provider and play it.
Mirrors cli.py:_voice_speak_response exactly: same markdown strip
pipeline, same 4000-char cap, same explicit mp3 output path, same
MP3-over-OGG playback choice (afplay misbehaves on OGG), same cleanup
of both extensions. Keeping these in sync means a voice-mode TTS
session in the TUI sounds identical to one in the classic CLI.
While playback is in flight the module-level _tts_playing Event is
cleared so the continuous-recording loop knows to wait before
re-arming the mic (otherwise the agent's spoken reply feedback-loops
through the microphone and the agent ends up replying to itself).
"""
if not text or not text.strip():
return
import re
import tempfile
import time
# Cancel any live capture before we open the speakers — otherwise the
# last ~200ms of the user's turn tail + the first syllables of our TTS
# both end up in the next recording window. The continuous loop will
# re-arm itself after _tts_playing flips back (see _continuous_on_silence).
paused_recording = False
with _continuous_lock:
if (
_continuous_active
and _continuous_recorder is not None
and getattr(_continuous_recorder, "is_recording", False)
):
try:
_continuous_recorder.cancel()
paused_recording = True
except Exception as e:
logger.warning("failed to pause recorder for TTS: %s", e)
_tts_playing.clear()
_debug(f"speak_text: TTS begin (paused_recording={paused_recording})")
try:
from tools.tts_tool import text_to_speech_tool
tts_text = text[:4000] if len(text) > 4000 else text
tts_text = re.sub(r'```[\s\S]*?```', ' ', tts_text) # fenced code blocks
tts_text = re.sub(r'\[([^\]]+)\]\([^)]+\)', r'\1', tts_text) # [text](url) → text
tts_text = re.sub(r'https?://\S+', '', tts_text) # bare URLs
tts_text = re.sub(r'\*\*(.+?)\*\*', r'\1', tts_text) # bold
tts_text = re.sub(r'\*(.+?)\*', r'\1', tts_text) # italic
tts_text = re.sub(r'`(.+?)`', r'\1', tts_text) # inline code
tts_text = re.sub(r'^#+\s*', '', tts_text, flags=re.MULTILINE) # headers
tts_text = re.sub(r'^\s*[-*]\s+', '', tts_text, flags=re.MULTILINE) # list bullets
tts_text = re.sub(r'---+', '', tts_text) # horizontal rules
tts_text = re.sub(r'\n{3,}', '\n\n', tts_text) # excess newlines
tts_text = tts_text.strip()
if not tts_text:
return
# MP3 output path, pre-chosen so we can play the MP3 directly even
# when text_to_speech_tool auto-converts to OGG for messaging
# platforms. afplay's OGG support is flaky, MP3 always works.
os.makedirs(os.path.join(tempfile.gettempdir(), "hermes_voice"), exist_ok=True)
mp3_path = os.path.join(
tempfile.gettempdir(),
"hermes_voice",
f"tts_{time.strftime('%Y%m%d_%H%M%S')}.mp3",
)
_debug(f"speak_text: synthesizing {len(tts_text)} chars -> {mp3_path}")
text_to_speech_tool(text=tts_text, output_path=mp3_path)
if os.path.isfile(mp3_path) and os.path.getsize(mp3_path) > 0:
_debug(f"speak_text: playing {mp3_path} ({os.path.getsize(mp3_path)} bytes)")
play_audio_file(mp3_path)
try:
os.unlink(mp3_path)
ogg_path = mp3_path.rsplit(".", 1)[0] + ".ogg"
if os.path.isfile(ogg_path):
os.unlink(ogg_path)
except OSError:
pass
else:
_debug(f"speak_text: TTS tool produced no audio at {mp3_path}")
except Exception as e:
logger.warning("Voice TTS playback failed: %s", e)
_debug(f"speak_text raised {type(e).__name__}: {e}")
finally:
_tts_playing.set()
_debug("speak_text: TTS done")
# Re-arm the mic so the user can answer without pressing Ctrl+B.
# Small delay lets the OS flush speaker output and afplay fully
# release the audio device before sounddevice re-opens the input.
if paused_recording:
time.sleep(0.3)
with _continuous_lock:
if _continuous_active and _continuous_recorder is not None:
try:
_continuous_recorder.start(
on_silence_stop=_continuous_on_silence
)
_debug("speak_text: recording resumed after TTS")
except Exception as e:
logger.warning(
"failed to resume recorder after TTS: %s", e
)


@ -49,7 +49,7 @@ from hermes_cli.config import (
from gateway.status import get_running_pid, read_runtime_status
try:
from fastapi import FastAPI, HTTPException, Request
from fastapi import FastAPI, HTTPException, Request, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
@ -73,6 +73,10 @@ app = FastAPI(title="Hermes Agent", version=__version__)
_SESSION_TOKEN = secrets.token_urlsafe(32)
_SESSION_HEADER_NAME = "X-Hermes-Session-Token"
# In-browser Chat tab (/chat, /api/pty, …). Off unless ``hermes dashboard --tui``
# or HERMES_DASHBOARD_TUI=1. Set from :func:`start_server`.
_DASHBOARD_EMBEDDED_CHAT_ENABLED = False
# Simple rate limiter for the reveal endpoint
_reveal_timestamps: List[float] = []
_REVEAL_MAX_PER_WINDOW = 5
@ -283,7 +287,7 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
"display.busy_input_mode": {
"type": "select",
"description": "Input behavior while agent is running",
"options": ["queue", "interrupt", "block"],
"options": ["interrupt", "queue"],
},
"memory.provider": {
"type": "select",
@ -1529,26 +1533,30 @@ def _submit_anthropic_pkce(session_id: str, code_input: str) -> Dict[str, Any]:
with urllib.request.urlopen(req, timeout=20) as resp:
result = json.loads(resp.read().decode())
except Exception as e:
sess["status"] = "error"
sess["error_message"] = f"Token exchange failed: {e}"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = f"Token exchange failed: {e}"
return {"ok": False, "status": "error", "message": sess["error_message"]}
access_token = result.get("access_token", "")
refresh_token = result.get("refresh_token", "")
expires_in = int(result.get("expires_in") or 3600)
if not access_token:
sess["status"] = "error"
sess["error_message"] = "No access token returned"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = "No access token returned"
return {"ok": False, "status": "error", "message": sess["error_message"]}
expires_at_ms = int(time.time() * 1000) + (expires_in * 1000)
try:
_save_anthropic_oauth_creds(access_token, refresh_token, expires_at_ms)
except Exception as e:
sess["status"] = "error"
sess["error_message"] = f"Save failed: {e}"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = f"Save failed: {e}"
return {"ok": False, "status": "error", "message": sess["error_message"]}
sess["status"] = "approved"
with _oauth_sessions_lock:
sess["status"] = "approved"
_log.info("oauth/pkce: anthropic login completed (session=%s)", session_id)
return {"ok": True, "status": "approved"}
@ -2263,6 +2271,329 @@ async def get_usage_analytics(days: int = 30):
db.close()
# ---------------------------------------------------------------------------
# /api/pty — PTY-over-WebSocket bridge for the dashboard "Chat" tab.
#
# The endpoint spawns the same ``hermes --tui`` binary the CLI uses, behind
# a POSIX pseudo-terminal, and forwards bytes + resize escapes across a
# WebSocket. The browser renders the ANSI through xterm.js (see
# web/src/pages/ChatPage.tsx).
#
# Auth: ``?token=<session_token>`` query param (browsers can't set
# Authorization on the WS upgrade). Same ephemeral ``_SESSION_TOKEN`` as
# REST. Localhost-only — we defensively reject non-loopback clients even
# though uvicorn binds to 127.0.0.1.
# ---------------------------------------------------------------------------
import re
import asyncio
from hermes_cli.pty_bridge import PtyBridge, PtyUnavailableError
_RESIZE_RE = re.compile(rb"\x1b\[RESIZE:(\d+);(\d+)\]")
_PTY_READ_CHUNK_TIMEOUT = 0.2
_VALID_CHANNEL_RE = re.compile(r"^[A-Za-z0-9._-]{1,128}$")
# Starlette's TestClient reports the peer as "testclient"; treat it as
# loopback so tests don't need to rewrite request scope.
_LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1", "localhost", "testclient"})
# Per-channel subscriber registry used by /api/pub (PTY-side gateway → dashboard)
# and /api/events (dashboard → browser sidebar). Keyed by an opaque channel id
# the chat tab generates on mount; entries auto-evict when the last subscriber
# drops AND the publisher has disconnected.
_event_channels: dict[str, set] = {}
_event_lock = asyncio.Lock()
def _resolve_chat_argv(
resume: Optional[str] = None,
sidecar_url: Optional[str] = None,
) -> tuple[list[str], Optional[str], Optional[dict]]:
"""Resolve the argv + cwd + env for the chat PTY.
Default: whatever ``hermes --tui`` would run. Tests monkeypatch this
function to inject a tiny fake command (``cat``, ``sh -c 'printf …'``)
so nothing has to build Node or the TUI bundle.
Session resume is propagated via the ``HERMES_TUI_RESUME`` env var
matching what ``hermes_cli.main._launch_tui`` does for the CLI path.
Appending ``--resume <id>`` to argv doesn't work because ``ui-tui`` does
not parse its argv.
`sidecar_url` (when set) is forwarded as ``HERMES_TUI_SIDECAR_URL`` so
the spawned ``tui_gateway.entry`` can mirror dispatcher emits to the
dashboard's ``/api/pub`` endpoint (see :func:`pub_ws`).
"""
from hermes_cli.main import PROJECT_ROOT, _make_tui_argv
argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
env: Optional[dict] = None
if resume or sidecar_url:
env = os.environ.copy()
if resume:
env["HERMES_TUI_RESUME"] = resume
if sidecar_url:
env["HERMES_TUI_SIDECAR_URL"] = sidecar_url
return list(argv), str(cwd) if cwd else None, env
def _build_sidecar_url(channel: str) -> Optional[str]:
"""ws:// URL the PTY child should publish events to, or None when unbound."""
host = getattr(app.state, "bound_host", None)
port = getattr(app.state, "bound_port", None)
if not host or not port:
return None
netloc = f"[{host}]:{port}" if ":" in host and not host.startswith("[") else f"{host}:{port}"
qs = urllib.parse.urlencode({"token": _SESSION_TOKEN, "channel": channel})
return f"ws://{netloc}/api/pub?{qs}"
async def _broadcast_event(channel: str, payload: str) -> None:
"""Fan out one publisher frame to every subscriber on `channel`."""
async with _event_lock:
subs = list(_event_channels.get(channel, ()))
for sub in subs:
try:
await sub.send_text(payload)
except Exception:
# Subscriber went away mid-send; the /api/events finally clause
# will remove it from the registry on its next iteration.
pass
def _channel_or_close_code(ws: WebSocket) -> Optional[str]:
"""Return the channel id from the query string or None if invalid."""
channel = ws.query_params.get("channel", "")
return channel if _VALID_CHANNEL_RE.match(channel) else None
@app.websocket("/api/pty")
async def pty_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
# --- auth + loopback check (before accept so we can close cleanly) ---
token = ws.query_params.get("token", "")
expected = _SESSION_TOKEN
if not hmac.compare_digest(token.encode(), expected.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
await ws.accept()
# --- spawn PTY ------------------------------------------------------
resume = ws.query_params.get("resume") or None
channel = _channel_or_close_code(ws)
sidecar_url = _build_sidecar_url(channel) if channel else None
try:
argv, cwd, env = _resolve_chat_argv(resume=resume, sidecar_url=sidecar_url)
except SystemExit as exc:
# _make_tui_argv calls sys.exit(1) when node/npm is missing.
await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
try:
bridge = PtyBridge.spawn(argv, cwd=cwd, env=env)
except PtyUnavailableError as exc:
await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
except (FileNotFoundError, OSError) as exc:
await ws.send_text(f"\r\n\x1b[31mChat failed to start: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
loop = asyncio.get_running_loop()
# --- reader task: PTY master → WebSocket ----------------------------
async def pump_pty_to_ws() -> None:
while True:
chunk = await loop.run_in_executor(
None, bridge.read, _PTY_READ_CHUNK_TIMEOUT
)
if chunk is None: # EOF
return
if not chunk: # no data this tick; yield control and retry
await asyncio.sleep(0)
continue
try:
await ws.send_bytes(chunk)
except Exception:
return
reader_task = asyncio.create_task(pump_pty_to_ws())
# --- writer loop: WebSocket → PTY master ----------------------------
try:
while True:
msg = await ws.receive()
msg_type = msg.get("type")
if msg_type == "websocket.disconnect":
break
raw = msg.get("bytes")
if raw is None:
text = msg.get("text")
raw = text.encode("utf-8") if isinstance(text, str) else b""
if not raw:
continue
# Resize escape is consumed locally, never written to the PTY.
match = _RESIZE_RE.match(raw)
if match and match.end() == len(raw):
cols = int(match.group(1))
rows = int(match.group(2))
bridge.resize(cols=cols, rows=rows)
continue
bridge.write(raw)
except WebSocketDisconnect:
pass
finally:
reader_task.cancel()
try:
await reader_task
except (asyncio.CancelledError, Exception):
pass
bridge.close()
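The in-band resize protocol consumed by the writer loop can be illustrated standalone. This sketch (a hypothetical `classify` helper, not part of the module) shows how a frame is either consumed as a resize or passed through as keystroke bytes:

```python
import re

# Same escape shape the dashboard sends: ESC [ RESIZE:<cols>;<rows> ]
RESIZE_RE = re.compile(rb"\x1b\[RESIZE:(\d+);(\d+)\]")

def classify(frame: bytes):
    """Return ("resize", cols, rows) for a whole resize frame, else ("data", frame)."""
    m = RESIZE_RE.match(frame)
    # The frame must be exactly the escape; anything else is ordinary data.
    if m and m.end() == len(frame):
        return ("resize", int(m.group(1)), int(m.group(2)))
    return ("data", frame)

print(classify(b"\x1b[RESIZE:120;40]"))  # -> ('resize', 120, 40)
print(classify(b"ls -la\r"))             # -> ('data', b'ls -la\r')
```

Requiring `match.end() == len(raw)`, as the endpoint does, keeps a resize escape embedded in a larger paste from being swallowed as a resize.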
# ---------------------------------------------------------------------------
# /api/ws — JSON-RPC WebSocket sidecar for the dashboard "Chat" tab.
#
# Drives the same `tui_gateway.dispatch` surface Ink uses over stdio, so the
# dashboard can render structured metadata (model badge, tool-call sidebar,
# slash launcher, session info) alongside the xterm.js terminal that PTY
# already paints. Both transports bind to the same session id when one is
# active, so a tool.start emitted by the agent fans out to both sinks.
# ---------------------------------------------------------------------------
@app.websocket("/api/ws")
async def gateway_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
from tui_gateway.ws import handle_ws
await handle_ws(ws)
# ---------------------------------------------------------------------------
# /api/pub + /api/events — chat-tab event broadcast.
#
# The PTY-side ``tui_gateway.entry`` opens /api/pub at startup (driven by
# HERMES_TUI_SIDECAR_URL set in /api/pty's PTY env) and writes every
# dispatcher emit through it. The dashboard fans those frames out to any
# subscriber that opened /api/events on the same channel id. This is what
# gives the React sidebar its tool-call feed without breaking the PTY
# child's stdio handshake with Ink.
# ---------------------------------------------------------------------------
@app.websocket("/api/pub")
async def pub_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
channel = _channel_or_close_code(ws)
if not channel:
await ws.close(code=4400)
return
await ws.accept()
try:
while True:
await _broadcast_event(channel, await ws.receive_text())
except WebSocketDisconnect:
pass
@app.websocket("/api/events")
async def events_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
channel = _channel_or_close_code(ws)
if not channel:
await ws.close(code=4400)
return
await ws.accept()
async with _event_lock:
_event_channels.setdefault(channel, set()).add(ws)
try:
while True:
# Subscribers don't speak — the receive() just blocks until
# disconnect so the connection stays open as long as the
# browser holds it.
await ws.receive_text()
except WebSocketDisconnect:
pass
finally:
async with _event_lock:
subs = _event_channels.get(channel)
if subs is not None:
subs.discard(ws)
if not subs:
_event_channels.pop(channel, None)
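The pub-to-events fan-out can be modeled in isolation with queues standing in for subscriber WebSockets. This is a simplified sketch of the registry semantics (names are illustrative, not the endpoint code):

```python
import asyncio

channels: dict[str, set] = {}  # channel id -> subscriber queues
lock = asyncio.Lock()

async def subscribe(channel: str) -> "asyncio.Queue[str]":
    q: "asyncio.Queue[str]" = asyncio.Queue()
    async with lock:
        channels.setdefault(channel, set()).add(q)
    return q

async def publish(channel: str, payload: str) -> None:
    async with lock:
        subs = list(channels.get(channel, ()))  # snapshot under the lock
    for q in subs:
        q.put_nowait(payload)  # best-effort; dead subscribers are pruned on disconnect

async def demo() -> list:
    q1 = await subscribe("chat-1")
    q2 = await subscribe("chat-1")
    await publish("chat-1", '{"event":"tool.start"}')
    return [q1.get_nowait(), q2.get_nowait()]

result = asyncio.run(demo())
print(result)  # both subscribers received the same frame
```

Snapshotting the subscriber set under the lock, then sending outside it, mirrors the endpoint's approach: a slow or dead subscriber can't block registry updates.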
def mount_spa(application: FastAPI):
"""Mount the built SPA. Falls back to index.html for client-side routing.
@ -2284,8 +2615,10 @@ def mount_spa(application: FastAPI):
def _serve_index():
"""Return index.html with the session token injected."""
html = _index_path.read_text()
chat_js = "true" if _DASHBOARD_EMBEDDED_CHAT_ENABLED else "false"
token_script = (
f'<script>window.__HERMES_SESSION_TOKEN__="{_SESSION_TOKEN}";</script>'
f'<script>window.__HERMES_SESSION_TOKEN__="{_SESSION_TOKEN}";'
f"window.__HERMES_DASHBOARD_EMBEDDED_CHAT__={chat_js};</script>"
)
html = html.replace("</head>", f"{token_script}</head>", 1)
return HTMLResponse(
@ -2798,10 +3131,15 @@ def start_server(
port: int = 9119,
open_browser: bool = True,
allow_public: bool = False,
*,
embedded_chat: bool = False,
):
"""Start the web UI server."""
import uvicorn
global _DASHBOARD_EMBEDDED_CHAT_ENABLED
_DASHBOARD_EMBEDDED_CHAT_ENABLED = embedded_chat
_LOCALHOST = ("127.0.0.1", "localhost", "::1")
if host not in _LOCALHOST and not allow_public:
raise SystemExit(
@ -2817,7 +3155,10 @@ def start_server(
# Record the bound host so host_header_middleware can validate incoming
# Host headers against it. Defends against DNS rebinding (GHSA-ppp5-vxwm-4cf7).
# bound_port is also stashed so /api/pty can build the back-WS URL the
# PTY child uses to publish events to the dashboard sidebar.
app.state.bound_host = host
app.state.bound_port = port
if open_browser:
import webbrowser


@ -1039,6 +1039,71 @@ class SessionDB:
result.append(msg)
return result
def resolve_resume_session_id(self, session_id: str) -> str:
"""Redirect a resume target to the descendant session that holds the messages.
Context compression ends the current session and forks a new child session
(linked via ``parent_session_id``). The flush cursor is reset, so the
child is where new messages actually land the parent ends up with
``message_count = 0`` rows unless messages had already been flushed to
it before compression. See #15000.
This helper walks ``parent_session_id`` forward from ``session_id`` and
returns the first descendant in the chain that has at least one message
row. If the original session already has messages, or no descendant
has any, the original ``session_id`` is returned unchanged.
The chain is always walked via the child whose ``started_at`` is
latest; that matches the single-chain shape that compression creates.
A depth cap (32) guards against accidental loops in malformed data.
"""
if not session_id:
return session_id
with self._lock:
# If this session already has messages, nothing to redirect.
try:
row = self._conn.execute(
"SELECT 1 FROM messages WHERE session_id = ? LIMIT 1",
(session_id,),
).fetchone()
except Exception:
return session_id
if row is not None:
return session_id
# Walk descendants: at each step, pick the most-recently-started
# child session; stop once we find one with messages.
current = session_id
seen = {current}
for _ in range(32):
try:
child_row = self._conn.execute(
"SELECT id FROM sessions "
"WHERE parent_session_id = ? "
"ORDER BY started_at DESC, id DESC LIMIT 1",
(current,),
).fetchone()
except Exception:
return session_id
if child_row is None:
return session_id
child_id = child_row["id"] if hasattr(child_row, "keys") else child_row[0]
if not child_id or child_id in seen:
return session_id
seen.add(child_id)
try:
msg_row = self._conn.execute(
"SELECT 1 FROM messages WHERE session_id = ? LIMIT 1",
(child_id,),
).fetchone()
except Exception:
return session_id
if msg_row is not None:
return child_id
current = child_id
return session_id
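The redirect walk can be modeled without SQLite using plain dicts: a hypothetical sketch of the same traversal, with `children` mapping each session to its newest child and `messages` marking which sessions hold rows:

```python
children = {"s1": "s2", "s2": "s3"}  # parent id -> most recently started child
messages = {"s3": ["hello"]}         # only the compression leaf holds messages

def resolve_resume(session_id: str, max_depth: int = 32) -> str:
    """Follow child links until a session with messages appears."""
    if messages.get(session_id):
        return session_id
    current, seen = session_id, {session_id}
    for _ in range(max_depth):  # depth cap guards against loops in bad data
        child = children.get(current)
        if not child or child in seen:
            return session_id
        seen.add(child)
        if messages.get(child):
            return child
        current = child
    return session_id

print(resolve_resume("s1"))  # -> s3, redirected past the empty s1 and s2
```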
def get_messages_as_conversation(self, session_id: str) -> List[Dict[str, Any]]:
"""
Load messages in the OpenAI conversation format (role + content dicts).


@ -343,6 +343,18 @@ def get_tool_definitions(
global _last_resolved_tool_names
_last_resolved_tool_names = [t["function"]["name"] for t in filtered_tools]
# Sanitize schemas for broad backend compatibility. llama.cpp's
# json-schema-to-grammar converter (used by its OAI server to build
# GBNF tool-call parsers) rejects some shapes that cloud providers
# silently accept — bare "type": "object" with no properties,
# string-valued schema nodes from malformed MCP servers, etc. This
# is a no-op for schemas that are already well-formed.
try:
from tools.schema_sanitizer import sanitize_tool_schemas
filtered_tools = sanitize_tool_schemas(filtered_tools)
except Exception as e: # pragma: no cover — defensive
logger.warning("Schema sanitization skipped: %s", e)
return filtered_tools
@ -418,6 +430,31 @@ def _coerce_value(value: str, expected_type):
return _coerce_number(value, integer_only=(expected_type == "integer"))
if expected_type == "boolean":
return _coerce_boolean(value)
if expected_type == "array":
return _coerce_json(value, list)
if expected_type == "object":
return _coerce_json(value, dict)
return value
def _coerce_json(value: str, expected_python_type: type):
"""Parse *value* as JSON when the schema expects an array or object.
Handles model output drift where a complex oneOf/discriminated-union schema
causes the LLM to emit the array/object as a JSON string instead of a native
structure. Returns the original string if parsing fails or yields the wrong
Python type.
"""
try:
parsed = json.loads(value)
except (ValueError, TypeError):
return value
if isinstance(parsed, expected_python_type):
logger.debug(
"coerce_tool_args: coerced string to %s via json.loads",
expected_python_type.__name__,
)
return parsed
return value
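The fallback semantics can be exercised with a minimal standalone re-implementation (for illustration; same accept/reject rules as described in the docstring above):

```python
import json

def coerce_json(value: str, expected_python_type: type):
    """Parse value as JSON; keep the original string on failure or type mismatch."""
    try:
        parsed = json.loads(value)
    except (ValueError, TypeError):
        return value
    # Only accept the parse when it yields the schema-expected container.
    return parsed if isinstance(parsed, expected_python_type) else value

print(coerce_json('["a", "b"]', list))      # -> ['a', 'b']
print(repr(coerce_json('not json', list)))  # -> 'not json'
print(repr(coerce_json('[1, 2]', dict)))    # -> '[1, 2]' (valid JSON, wrong container)
```

Returning the original string on a wrong-type parse matters: `"[1, 2]"` against an `object` schema stays a string, so downstream validation surfaces the mismatch instead of silently accepting a list.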
@ -427,9 +464,9 @@ def _coerce_number(value: str, integer_only: bool = False):
f = float(value)
except (ValueError, OverflowError):
return value
# Guard against inf/nan before int() conversion
# Guard against inf/nan — not JSON-serializable, keep original string
if f != f or f == float("inf") or f == float("-inf"):
return f
return value
# If it looks like an integer (no fractional part), return int
if f == int(f):
return int(f)


@ -156,7 +156,7 @@
for entry in "''${ENTRIES[@]}"; do
IFS=":" read -r ATTR FOLDER NIX_FILE <<< "$entry"
echo "==> .#$ATTR ($FOLDER -> $NIX_FILE)"
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --print-build-logs 2>&1)
OUTPUT=$(nix build ".#$ATTR.npmDeps" --no-link --rebuild --print-build-logs 2>&1)
STATUS=$?
if [ "$STATUS" -eq 0 ]; then
echo " ok"


@ -4,7 +4,7 @@ let
src = ../web;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-TS/vrCHbdvXkPcAPxImKzAd2pdDCrKlgYZkXBMQ+TEg=";
hash = "sha256-4Z8KQ69QhO83X6zff+5urWBv6MME686MhTTMdwSl65o=";
};
npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };


@ -59,7 +59,8 @@ Config file: `~/.hermes/hindsight/config.json`
| Key | Default | Description |
|-----|---------|-------------|
| `bank_id` | `hermes` | Memory bank name |
| `bank_id` | `hermes` | Memory bank name (static fallback used when `bank_id_template` is unset or resolves empty) |
| `bank_id_template` | — | Optional template to derive the bank name dynamically. Placeholders: `{profile}`, `{workspace}`, `{platform}`, `{user}`, `{session}`. Example: `hermes-{profile}` isolates memory per active Hermes profile. Empty placeholders collapse cleanly (e.g. `hermes-{user}` with no user becomes `hermes`). |
| `bank_mission` | — | Reflect mission (identity/framing for reflect reasoning). Applied via Banks API. |
| `bank_retain_mission` | — | Retain mission (steers what gets extracted). Applied via Banks API. |


@ -3,6 +3,8 @@
Long-term memory with knowledge graph, entity resolution, and multi-strategy
retrieval. Supports cloud (API key) and local modes.
Configurable timeout via HINDSIGHT_TIMEOUT env var or config.json.
Original PR #1811 by benfrank241, adapted to MemoryProvider ABC.
Config via environment variables:
@ -11,6 +13,7 @@ Config via environment variables:
HINDSIGHT_BUDGET recall budget: low/mid/high (default: mid)
HINDSIGHT_API_URL API endpoint
HINDSIGHT_MODE cloud or local (default: cloud)
HINDSIGHT_TIMEOUT API request timeout in seconds (default: 120)
HINDSIGHT_RETAIN_TAGS comma-separated tags attached to retained memories
HINDSIGHT_RETAIN_SOURCE metadata source value attached to retained memories
HINDSIGHT_RETAIN_USER_PREFIX label used before user turns in retained transcripts
@ -23,6 +26,7 @@ Or via $HERMES_HOME/hindsight/config.json (profile-scoped), falling back to
from __future__ import annotations
import asyncio
import importlib
import json
import logging
import os
@ -40,6 +44,7 @@ logger = logging.getLogger(__name__)
_DEFAULT_API_URL = "https://api.hindsight.vectorize.io"
_DEFAULT_LOCAL_URL = "http://localhost:8888"
_MIN_CLIENT_VERSION = "0.4.22"
_DEFAULT_TIMEOUT = 120 # seconds — cloud API can take 30-40s per request
_VALID_BUDGETS = {"low", "mid", "high"}
_PROVIDER_DEFAULT_MODELS = {
"openai": "gpt-4o-mini",
@ -54,6 +59,22 @@ _PROVIDER_DEFAULT_MODELS = {
}
def _check_local_runtime() -> tuple[bool, str | None]:
"""Return whether local embedded Hindsight imports cleanly.
On older CPUs, importing the local Hindsight stack can raise a runtime
error from NumPy before the daemon starts. Treat that as "unavailable"
so Hermes can degrade gracefully instead of repeatedly trying to start
a broken local memory backend.
"""
try:
importlib.import_module("hindsight")
importlib.import_module("hindsight_embed.daemon_embed_manager")
return True, None
except Exception as exc:
return False, str(exc)
# ---------------------------------------------------------------------------
# Dedicated event loop for Hindsight async calls (one per process, reused).
# Avoids creating ephemeral loops that leak aiohttp sessions.
@ -81,13 +102,18 @@ def _get_loop() -> asyncio.AbstractEventLoop:
return _loop
def _run_sync(coro, timeout: float = 120.0):
def _run_sync(coro, timeout: float = _DEFAULT_TIMEOUT):
"""Schedule *coro* on the shared loop and block until done."""
loop = _get_loop()
future = asyncio.run_coroutine_threadsafe(coro, loop)
return future.result(timeout=timeout)
# ---------------------------------------------------------------------------
# Backward-compatible alias — instances use self._run_sync() instead.
# ---------------------------------------------------------------------------
# ---------------------------------------------------------------------------
# Tool schemas
# ---------------------------------------------------------------------------
@ -233,6 +259,126 @@ def _utc_timestamp() -> str:
return datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")
def _embedded_profile_name(config: dict[str, Any]) -> str:
"""Return the Hindsight embedded profile name for this Hermes config."""
profile = config.get("profile", "hermes")
return str(profile or "hermes")
def _load_simple_env(path) -> dict[str, str]:
"""Parse a simple KEY=VALUE env file, ignoring comments and blank lines."""
if not path.exists():
return {}
values: dict[str, str] = {}
for line in path.read_text(encoding="utf-8").splitlines():
if not line or line.startswith("#") or "=" not in line:
continue
key, value = line.split("=", 1)
values[key.strip()] = value.strip()
return values
def _build_embedded_profile_env(config: dict[str, Any], *, llm_api_key: str | None = None) -> dict[str, str]:
"""Build the profile-scoped env file that standalone hindsight-embed consumes."""
current_key = llm_api_key
if current_key is None:
current_key = (
config.get("llmApiKey")
or config.get("llm_api_key")
or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
)
current_provider = config.get("llm_provider", "")
current_model = config.get("llm_model", "")
current_base_url = config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "")
# The embedded daemon expects OpenAI wire format for these providers.
daemon_provider = "openai" if current_provider in ("openai_compatible", "openrouter") else current_provider
env_values = {
"HINDSIGHT_API_LLM_PROVIDER": str(daemon_provider),
"HINDSIGHT_API_LLM_API_KEY": str(current_key or ""),
"HINDSIGHT_API_LLM_MODEL": str(current_model),
"HINDSIGHT_API_LOG_LEVEL": "info",
}
if current_base_url:
env_values["HINDSIGHT_API_LLM_BASE_URL"] = str(current_base_url)
return env_values
def _embedded_profile_env_path(config: dict[str, Any]):
from pathlib import Path
return Path.home() / ".hindsight" / "profiles" / f"{_embedded_profile_name(config)}.env"
def _materialize_embedded_profile_env(config: dict[str, Any], *, llm_api_key: str | None = None):
"""Write the profile-scoped env file that standalone hindsight-embed uses."""
profile_env = _embedded_profile_env_path(config)
profile_env.parent.mkdir(parents=True, exist_ok=True)
env_values = _build_embedded_profile_env(config, llm_api_key=llm_api_key)
profile_env.write_text(
"".join(f"{key}={value}\n" for key, value in env_values.items()),
encoding="utf-8",
)
return profile_env
def _sanitize_bank_segment(value: str) -> str:
"""Sanitize a bank_id_template placeholder value.
Bank IDs should be safe for URL paths and filesystem use. Replaces any
character that isn't alphanumeric, dash, or underscore with a dash, and
collapses runs of dashes.
"""
if not value:
return ""
out = []
prev_dash = False
for ch in str(value):
if ch.isalnum() or ch == "-" or ch == "_":
out.append(ch)
prev_dash = False
else:
if not prev_dash:
out.append("-")
prev_dash = True
return "".join(out).strip("-_")
def _resolve_bank_id_template(template: str, fallback: str, **placeholders: str) -> str:
"""Resolve a bank_id template string with the given placeholders.
Supported placeholders (each is sanitized before substitution):
{profile} active Hermes profile name (from agent_identity)
{workspace} Hermes workspace name (from agent_workspace)
{platform} "cli", "telegram", "discord", etc.
{user} platform user id (gateway sessions)
{session} current session id
Missing/empty placeholders are rendered as the empty string and then
collapsed, e.g. ``hermes-{user}`` with no user becomes ``hermes``.
If the template is empty, resolution falls back to *fallback*.
Returns the sanitized bank id.
"""
if not template:
return fallback
sanitized = {k: _sanitize_bank_segment(v) for k, v in placeholders.items()}
try:
rendered = template.format(**sanitized)
except (KeyError, IndexError) as exc:
logger.warning("Invalid bank_id_template %r: %s — using fallback %r",
template, exc, fallback)
return fallback
while "--" in rendered:
rendered = rendered.replace("--", "-")
while "__" in rendered:
rendered = rendered.replace("__", "_")
rendered = rendered.strip("-_")
return rendered or fallback
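The sanitize-then-substitute pipeline can be sketched standalone (a simplified re-implementation for illustration: the regex-based `sanitize` and the omitted bad-template fallback are assumptions, not the module's exact code):

```python
import re

def sanitize(value: str) -> str:
    # Replace URL/filesystem-unsafe characters with '-', collapse runs, trim edges.
    return re.sub(r"-+", "-", re.sub(r"[^A-Za-z0-9_-]", "-", value or "")).strip("-_")

def resolve(template: str, fallback: str, **placeholders: str) -> str:
    if not template:
        return fallback
    rendered = template.format(**{k: sanitize(v) for k, v in placeholders.items()})
    # Empty placeholders leave doubled separators behind; collapse them.
    while "--" in rendered:
        rendered = rendered.replace("--", "-")
    while "__" in rendered:
        rendered = rendered.replace("__", "_")
    return rendered.strip("-_") or fallback

print(resolve("hermes-{profile}", "hermes", profile="dev team"))  # -> hermes-dev-team
print(resolve("hermes-{user}", "hermes", user=""))                # -> hermes
```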
# ---------------------------------------------------------------------------
# MemoryProvider implementation
# ---------------------------------------------------------------------------
@ -262,13 +408,17 @@ class HindsightMemoryProvider(MemoryProvider):
self._chat_type = ""
self._thread_id = ""
self._agent_identity = ""
self._agent_workspace = ""
self._turn_index = 0
self._client = None
self._timeout = _DEFAULT_TIMEOUT
self._prefetch_result = ""
self._prefetch_lock = threading.Lock()
self._prefetch_thread = None
self._sync_thread = None
self._session_id = ""
self._parent_session_id = ""
self._document_id = ""
# Tags
self._tags: list[str] | None = None
@ -293,6 +443,7 @@ class HindsightMemoryProvider(MemoryProvider):
# Bank
self._bank_mission = ""
self._bank_retain_mission: str | None = None
self._bank_id_template = ""
@property
def name(self) -> str:
@ -302,9 +453,16 @@ class HindsightMemoryProvider(MemoryProvider):
try:
cfg = _load_config()
mode = cfg.get("mode", "cloud")
if mode in ("local", "local_embedded", "local_external"):
if mode in ("local", "local_embedded"):
available, _ = _check_local_runtime()
return available
if mode == "local_external":
return True
has_key = bool(cfg.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", ""))
has_key = bool(
cfg.get("apiKey")
or cfg.get("api_key")
or os.environ.get("HINDSIGHT_API_KEY", "")
)
has_url = bool(cfg.get("api_url") or os.environ.get("HINDSIGHT_API_URL", ""))
return has_key or has_url
except Exception:
@ -363,7 +521,7 @@ class HindsightMemoryProvider(MemoryProvider):
else:
deps_to_install = [cloud_dep]
print(f"\n Checking dependencies...")
print("\n Checking dependencies...")
uv_path = shutil.which("uv")
if not uv_path:
print(" ⚠ uv not found — install it: curl -LsSf https://astral.sh/uv/install.sh | sh")
@ -374,14 +532,14 @@ class HindsightMemoryProvider(MemoryProvider):
[uv_path, "pip", "install", "--python", sys.executable, "--quiet", "--upgrade"] + deps_to_install,
check=True, timeout=120, capture_output=True,
)
print(f" ✓ Dependencies up to date")
print(" ✓ Dependencies up to date")
except Exception as e:
print(f" ⚠ Install failed: {e}")
print(f" Run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}")
# Step 3: Mode-specific config
if mode == "cloud":
print(f"\n Get your API key at https://ui.hindsight.vectorize.io\n")
print("\n Get your API key at https://ui.hindsight.vectorize.io\n")
existing_key = os.environ.get("HINDSIGHT_API_KEY", "")
if existing_key:
masked = f"...{existing_key[-4:]}" if len(existing_key) > 4 else "set"
@ -434,13 +592,19 @@ class HindsightMemoryProvider(MemoryProvider):
sys.stdout.write(" LLM API key: ")
sys.stdout.flush()
llm_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
if llm_key:
env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
# Always write explicitly (including empty) so the provider sees ""
# rather than a missing variable. The daemon reads from .env at
# startup and fails when HINDSIGHT_LLM_API_KEY is unset.
env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
# Step 4: Save everything
provider_config["bank_id"] = "hermes"
provider_config["recall_budget"] = "mid"
bank_id = "hermes"
# Read existing timeout from config if present, otherwise use default
existing_timeout = self._config.get("timeout") if self._config else None
timeout_val = existing_timeout if existing_timeout else _DEFAULT_TIMEOUT
provider_config["timeout"] = timeout_val
env_writes["HINDSIGHT_TIMEOUT"] = str(timeout_val)
config["memory"]["provider"] = "hindsight"
save_config(config)
@ -466,10 +630,32 @@ class HindsightMemoryProvider(MemoryProvider):
new_lines.append(f"{k}={v}")
env_path.write_text("\n".join(new_lines) + "\n")
if mode == "local_embedded":
materialized_config = dict(provider_config)
config_path = Path(hermes_home) / "hindsight" / "config.json"
try:
materialized_config = json.loads(config_path.read_text(encoding="utf-8"))
except Exception:
pass
llm_api_key = env_writes.get("HINDSIGHT_LLM_API_KEY", "")
if not llm_api_key:
llm_api_key = _load_simple_env(Path(hermes_home) / ".env").get("HINDSIGHT_LLM_API_KEY", "")
if not llm_api_key:
llm_api_key = _load_simple_env(_embedded_profile_env_path(materialized_config)).get(
"HINDSIGHT_API_LLM_API_KEY",
"",
)
_materialize_embedded_profile_env(
materialized_config,
llm_api_key=llm_api_key or None,
)
print(f"\n ✓ Hindsight memory configured ({mode} mode)")
if env_writes:
print(" API keys saved to .env")
print("\n Start a new session to activate.\n")
def get_config_schema(self):
return [
@@ -485,7 +671,8 @@ class HindsightMemoryProvider(MemoryProvider):
{"key": "llm_base_url", "description": "Endpoint URL (e.g. http://192.168.1.10:8080/v1)", "default": "", "when": {"mode": "local_embedded", "llm_provider": "openai_compatible"}},
{"key": "llm_api_key", "description": "LLM API key (optional for openai_compatible)", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local_embedded"}},
{"key": "llm_model", "description": "LLM model", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local_embedded"}},
{"key": "bank_id", "description": "Memory bank name", "default": "hermes"},
{"key": "bank_id", "description": "Memory bank name (static fallback when bank_id_template is unset)", "default": "hermes"},
{"key": "bank_id_template", "description": "Optional template to derive bank_id dynamically. Placeholders: {profile}, {workspace}, {platform}, {user}, {session}. Example: hermes-{profile}", "default": ""},
{"key": "bank_mission", "description": "Mission/purpose description for the memory bank"},
{"key": "bank_retain_mission", "description": "Custom extraction prompt for memory retention"},
{"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
@@ -505,12 +692,19 @@ class HindsightMemoryProvider(MemoryProvider):
{"key": "recall_max_tokens", "description": "Maximum tokens for recall results", "default": 4096},
{"key": "recall_max_input_chars", "description": "Maximum input query length for auto-recall", "default": 800},
{"key": "recall_prompt_preamble", "description": "Custom preamble for recalled memories in context"},
{"key": "timeout", "description": "API request timeout in seconds", "default": _DEFAULT_TIMEOUT},
]
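The ``bank_id_template`` option above substitutes placeholders such as ``{profile}`` and ``{workspace}`` and falls back to the static ``bank_id`` when the template is empty or unresolvable. This hunk does not show ``_resolve_bank_id_template`` itself, so the following is only a plausible standalone sketch of that behavior (all names here are assumptions):

```python
def resolve_bank_id_template(template, fallback, **fields):
    # Hypothetical mirror of _resolve_bank_id_template: fill in the known
    # placeholders; fall back to the static bank_id when the template is
    # empty or references a placeholder with no value.
    if not template:
        return fallback
    try:
        resolved = template.format(**{k: (v or "") for k, v in fields.items()})
    except (KeyError, IndexError):
        # Unknown placeholder: keep the static bank_id rather than fail.
        return fallback
    # Drop dangling separators left by empty fields (e.g. "hermes-").
    resolved = resolved.strip("-")
    return resolved or fallback
```

For example, ``resolve_bank_id_template("hermes-{profile}", "hermes", profile="alice")`` would yield ``hermes-alice``, while an empty profile collapses back to ``hermes``.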
def _get_client(self):
"""Return the cached Hindsight client (created once, reused)."""
if self._client is None:
if self._mode == "local_embedded":
available, reason = _check_local_runtime()
if not available:
raise RuntimeError(
"Hindsight local runtime is unavailable"
+ (f": {reason}" if reason else "")
)
from hindsight import HindsightEmbedded
HindsightEmbedded.__del__ = lambda self: None
llm_provider = self._config.get("llm_provider", "")
@@ -529,16 +723,30 @@ class HindsightMemoryProvider(MemoryProvider):
self._client = HindsightEmbedded(**kwargs)
else:
from hindsight_client import Hindsight
timeout = self._timeout or _DEFAULT_TIMEOUT
kwargs = {"base_url": self._api_url, "timeout": float(timeout)}
if self._api_key:
kwargs["api_key"] = self._api_key
logger.debug("Creating Hindsight cloud client (url=%s, has_key=%s, timeout=%s)",
self._api_url, bool(self._api_key), kwargs["timeout"])
self._client = Hindsight(**kwargs)
return self._client
def _run_sync(self, coro):
"""Schedule *coro* on the shared loop using the configured timeout."""
return _run_sync(coro, timeout=self._timeout)
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = str(session_id or "").strip()
self._parent_session_id = str(kwargs.get("parent_session_id", "") or "").strip()
# Each process lifecycle gets its own document_id. Reusing session_id
# alone caused overwrites on /resume — the reloaded session starts
# with an empty _session_turns, so the next retain would replace the
# previously stored content. session_id stays in tags so processes
# for the same session remain filterable together.
start_ts = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
self._document_id = f"{self._session_id}-{start_ts}"
# Check client version and auto-upgrade if needed
try:
@@ -548,7 +756,9 @@ class HindsightMemoryProvider(MemoryProvider):
if Version(installed) < Version(_MIN_CLIENT_VERSION):
logger.warning("hindsight-client %s is outdated (need >=%s), attempting upgrade...",
installed, _MIN_CLIENT_VERSION)
import shutil
import subprocess
import sys
uv_path = shutil.which("uv")
if uv_path:
try:
@@ -575,19 +785,41 @@ class HindsightMemoryProvider(MemoryProvider):
self._chat_type = str(kwargs.get("chat_type") or "").strip()
self._thread_id = str(kwargs.get("thread_id") or "").strip()
self._agent_identity = str(kwargs.get("agent_identity") or "").strip()
self._agent_workspace = str(kwargs.get("agent_workspace") or "").strip()
self._turn_index = 0
self._session_turns = []
self._mode = self._config.get("mode", "cloud")
# Read timeout from config or env var, fall back to default
self._timeout = self._config.get("timeout") or int(os.environ.get("HINDSIGHT_TIMEOUT", str(_DEFAULT_TIMEOUT)))
# "local" is a legacy alias for "local_embedded"
if self._mode == "local":
self._mode = "local_embedded"
if self._mode == "local_embedded":
available, reason = _check_local_runtime()
if not available:
logger.warning(
"Hindsight local mode disabled because its runtime could not be imported: %s",
reason,
)
self._mode = "disabled"
return
self._api_key = self._config.get("apiKey") or self._config.get("api_key") or os.environ.get("HINDSIGHT_API_KEY", "")
default_url = _DEFAULT_LOCAL_URL if self._mode in ("local_embedded", "local_external") else _DEFAULT_API_URL
self._api_url = self._config.get("api_url") or os.environ.get("HINDSIGHT_API_URL", default_url)
self._llm_base_url = self._config.get("llm_base_url", "")
banks = self._config.get("banks", {}).get("hermes", {})
static_bank_id = self._config.get("bank_id") or banks.get("bankId", "hermes")
self._bank_id_template = self._config.get("bank_id_template", "") or ""
self._bank_id = _resolve_bank_id_template(
self._bank_id_template,
fallback=static_bank_id,
profile=self._agent_identity,
workspace=self._agent_workspace,
platform=self._platform,
user=self._user_id,
session=self._session_id,
)
budget = self._config.get("recall_budget") or self._config.get("budget") or banks.get("budget", "mid")
self._budget = budget if budget in _VALID_BUDGETS else "mid"
@@ -640,6 +872,10 @@ class HindsightMemoryProvider(MemoryProvider):
pass
logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s",
self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version)
if self._bank_id_template:
logger.debug("Hindsight bank resolved from template %r: profile=%s workspace=%s platform=%s user=%s -> bank=%s",
self._bank_id_template, self._agent_identity, self._agent_workspace,
self._platform, self._user_id, self._bank_id)
logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, "
"retain_async=%s, retain_context=%s, recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
self._auto_retain, self._auto_recall, self._retain_every_n_turns,
@@ -669,42 +905,13 @@ class HindsightMemoryProvider(MemoryProvider):
# Update the profile .env to match our current config so
# the daemon always starts with the right settings.
# If the config changed and the daemon is running, stop it.
profile_env = _embedded_profile_env_path(self._config)
expected_env = _build_embedded_profile_env(self._config)
saved = _load_simple_env(profile_env)
config_changed = saved != expected_env
if config_changed:
profile_env = _materialize_embedded_profile_env(self._config)
if client._manager.is_running(profile):
with open(log_path, "a") as f:
f.write("\n=== Config changed, restarting daemon ===\n")
@@ -777,7 +984,7 @@ class HindsightMemoryProvider(MemoryProvider):
client = self._get_client()
if self._prefetch_method == "reflect":
logger.debug("Prefetch: calling reflect (bank=%s, query_len=%d)", self._bank_id, len(query))
resp = self._run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
text = resp.text or ""
else:
recall_kwargs: dict = {
@@ -791,7 +998,7 @@ class HindsightMemoryProvider(MemoryProvider):
recall_kwargs["types"] = self._recall_types
logger.debug("Prefetch: calling recall (bank=%s, query_len=%d, budget=%s)",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.arecall(**recall_kwargs))
num_results = len(resp.results) if resp.results else 0
logger.debug("Prefetch: recall returned %d results", num_results)
text = "\n".join(f"- {r.text}" for r in resp.results if r.text) if resp.results else ""
@@ -888,7 +1095,7 @@ class HindsightMemoryProvider(MemoryProvider):
if session_id:
self._session_id = str(session_id).strip()
turn = json.dumps(self._build_turn_messages(user_content, assistant_content), ensure_ascii=False)
self._session_turns.append(turn)
self._turn_counter += 1
self._turn_index = self._turn_counter
@@ -902,6 +1109,12 @@ class HindsightMemoryProvider(MemoryProvider):
len(self._session_turns), sum(len(t) for t in self._session_turns))
content = "[" + ",".join(self._session_turns) + "]"
lineage_tags: list[str] = []
if self._session_id:
lineage_tags.append(f"session:{self._session_id}")
if self._parent_session_id:
lineage_tags.append(f"parent:{self._parent_session_id}")
def _sync():
try:
client = self._get_client()
@@ -912,15 +1125,16 @@ class HindsightMemoryProvider(MemoryProvider):
message_count=len(self._session_turns) * 2,
turn_index=self._turn_index,
),
tags=lineage_tags or None,
)
item.pop("bank_id", None)
item.pop("retain_async", None)
logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
self._bank_id, self._document_id, self._retain_async, len(content), len(self._session_turns))
self._run_sync(client.aretain_batch(
bank_id=self._bank_id,
items=[item],
document_id=self._document_id,
retain_async=self._retain_async,
))
logger.debug("Hindsight retain succeeded")
@@ -957,7 +1171,7 @@ class HindsightMemoryProvider(MemoryProvider):
)
logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
self._bank_id, len(content), context)
self._run_sync(client.aretain(**retain_kwargs))
logger.debug("Tool hindsight_retain: success")
return json.dumps({"result": "Memory stored successfully."})
except Exception as e:
@@ -980,7 +1194,7 @@ class HindsightMemoryProvider(MemoryProvider):
recall_kwargs["types"] = self._recall_types
logger.debug("Tool hindsight_recall: bank=%s, query_len=%d, budget=%s",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.arecall(**recall_kwargs))
num_results = len(resp.results) if resp.results else 0
logger.debug("Tool hindsight_recall: %d results", num_results)
if not resp.results:
@@ -998,7 +1212,7 @@ class HindsightMemoryProvider(MemoryProvider):
try:
logger.debug("Tool hindsight_reflect: bank=%s, query_len=%d, budget=%s",
self._bank_id, len(query), self._budget)
resp = self._run_sync(client.areflect(
bank_id=self._bank_id, query=query, budget=self._budget
))
logger.debug("Tool hindsight_reflect: response_len=%d", len(resp.text or ""))
@@ -1011,7 +1225,6 @@ class HindsightMemoryProvider(MemoryProvider):
def shutdown(self) -> None:
logger.debug("Hindsight shutdown: waiting for background threads")
for t in (self._prefetch_thread, self._sync_thread):
if t and t.is_alive():
t.join(timeout=5.0)
@@ -1026,17 +1239,21 @@ class HindsightMemoryProvider(MemoryProvider):
except RuntimeError:
pass
else:
self._run_sync(self._client.aclose())
except Exception:
pass
self._client = None
# The module-global background event loop (_loop / _loop_thread)
# is intentionally NOT stopped here. It is shared across every
# HindsightMemoryProvider instance in the process — the plugin
# loader creates a new provider per AIAgent, and the gateway
# creates one AIAgent per concurrent chat session. Stopping the
# loop from one provider's shutdown() strands the aiohttp
# ClientSession + TCPConnector owned by every sibling provider
# on a dead loop, which surfaces as the "Unclosed client session"
# / "Unclosed connector" warnings reported in #11923. The loop
# runs on a daemon thread and is reclaimed on process exit;
# per-session cleanup happens via self._client.aclose() above.
def register(ctx) -> None:


@@ -0,0 +1,66 @@
"""Spotify integration plugin — bundled, auto-loaded.
Registers 7 tools (playback, devices, queue, search, playlists, albums,
library) into the ``spotify`` toolset. Each tool's handler is gated by
``_check_spotify_available()``: when the user has not run ``hermes auth
spotify``, the tools remain registered (so they appear in ``hermes
tools``) but the runtime check prevents dispatch.
Why a plugin instead of a top-level ``tools/`` file?
- ``plugins/`` is where third-party service integrations live (see
``plugins/image_gen/`` for the backend-provider pattern, ``plugins/
disk-cleanup/`` for the standalone pattern). ``tools/`` is reserved
for foundational capabilities (terminal, read_file, web_search, etc.).
- Mirroring the image_gen plugin layout (``plugins/<category>/<backend>/``
for categories, flat ``plugins/<name>/`` for standalones) makes new
service integrations a pattern contributors can copy.
- Bundled + ``kind: backend`` auto-loads on startup just like image_gen
backends: no user opt-in needed, no ``plugins.enabled`` config.
The Spotify auth flow (``hermes auth spotify``), CLI plumbing, and docs
are unchanged. This move is purely structural.
"""
from __future__ import annotations
from plugins.spotify.tools import (
SPOTIFY_ALBUMS_SCHEMA,
SPOTIFY_DEVICES_SCHEMA,
SPOTIFY_LIBRARY_SCHEMA,
SPOTIFY_PLAYBACK_SCHEMA,
SPOTIFY_PLAYLISTS_SCHEMA,
SPOTIFY_QUEUE_SCHEMA,
SPOTIFY_SEARCH_SCHEMA,
_check_spotify_available,
_handle_spotify_albums,
_handle_spotify_devices,
_handle_spotify_library,
_handle_spotify_playback,
_handle_spotify_playlists,
_handle_spotify_queue,
_handle_spotify_search,
)
_TOOLS = (
("spotify_playback", SPOTIFY_PLAYBACK_SCHEMA, _handle_spotify_playback, "🎵"),
("spotify_devices", SPOTIFY_DEVICES_SCHEMA, _handle_spotify_devices, "🔈"),
("spotify_queue", SPOTIFY_QUEUE_SCHEMA, _handle_spotify_queue, "📻"),
("spotify_search", SPOTIFY_SEARCH_SCHEMA, _handle_spotify_search, "🔎"),
("spotify_playlists", SPOTIFY_PLAYLISTS_SCHEMA, _handle_spotify_playlists, "📚"),
("spotify_albums", SPOTIFY_ALBUMS_SCHEMA, _handle_spotify_albums, "💿"),
("spotify_library", SPOTIFY_LIBRARY_SCHEMA, _handle_spotify_library, "❤️"),
)
def register(ctx) -> None:
"""Register all Spotify tools. Called once by the plugin loader."""
for name, schema, handler, emoji in _TOOLS:
ctx.register_tool(
name=name,
toolset="spotify",
schema=schema,
handler=handler,
check_fn=_check_spotify_available,
emoji=emoji,
)

plugins/spotify/client.py

@@ -0,0 +1,435 @@
"""Thin Spotify Web API helper used by Hermes native tools."""
from __future__ import annotations
import json
from typing import Any, Dict, Iterable, Optional
from urllib.parse import urlparse
import httpx
from hermes_cli.auth import (
AuthError,
resolve_spotify_runtime_credentials,
)
class SpotifyError(RuntimeError):
"""Base Spotify tool error."""
class SpotifyAuthRequiredError(SpotifyError):
"""Raised when the user needs to authenticate with Spotify first."""
class SpotifyAPIError(SpotifyError):
"""Structured Spotify API failure."""
def __init__(
self,
message: str,
*,
status_code: Optional[int] = None,
response_body: Optional[str] = None,
) -> None:
super().__init__(message)
self.status_code = status_code
self.response_body = response_body
self.path = None
class SpotifyClient:
def __init__(self) -> None:
self._runtime = self._resolve_runtime(refresh_if_expiring=True)
def _resolve_runtime(self, *, force_refresh: bool = False, refresh_if_expiring: bool = True) -> Dict[str, Any]:
try:
return resolve_spotify_runtime_credentials(
force_refresh=force_refresh,
refresh_if_expiring=refresh_if_expiring,
)
except AuthError as exc:
raise SpotifyAuthRequiredError(str(exc)) from exc
@property
def base_url(self) -> str:
return str(self._runtime.get("base_url") or "").rstrip("/")
def _headers(self) -> Dict[str, str]:
return {
"Authorization": f"Bearer {self._runtime['access_token']}",
"Content-Type": "application/json",
}
def request(
self,
method: str,
path: str,
*,
params: Optional[Dict[str, Any]] = None,
json_body: Optional[Dict[str, Any]] = None,
allow_retry_on_401: bool = True,
empty_response: Optional[Dict[str, Any]] = None,
) -> Any:
url = f"{self.base_url}{path}"
response = httpx.request(
method,
url,
headers=self._headers(),
params=_strip_none(params),
json=_strip_none(json_body) if json_body is not None else None,
timeout=30.0,
)
if response.status_code == 401 and allow_retry_on_401:
self._runtime = self._resolve_runtime(force_refresh=True, refresh_if_expiring=True)
return self.request(
method,
path,
params=params,
json_body=json_body,
allow_retry_on_401=False,
)
if response.status_code >= 400:
self._raise_api_error(response, method=method, path=path)
if response.status_code == 204 or not response.content:
return empty_response or {"success": True, "status_code": response.status_code, "empty": True}
if "application/json" in response.headers.get("content-type", ""):
return response.json()
return {"success": True, "text": response.text}
def _raise_api_error(self, response: httpx.Response, *, method: str, path: str) -> None:
detail = response.text.strip()
message = _friendly_spotify_error_message(
status_code=response.status_code,
detail=_extract_spotify_error_detail(response, fallback=detail),
method=method,
path=path,
retry_after=response.headers.get("Retry-After"),
)
error = SpotifyAPIError(message, status_code=response.status_code, response_body=detail)
error.path = path
raise error
def get_devices(self) -> Any:
return self.request("GET", "/me/player/devices")
def transfer_playback(self, *, device_id: str, play: bool = False) -> Any:
return self.request("PUT", "/me/player", json_body={
"device_ids": [device_id],
"play": play,
})
def get_playback_state(self, *, market: Optional[str] = None) -> Any:
return self.request(
"GET",
"/me/player",
params={"market": market},
empty_response={
"status_code": 204,
"empty": True,
"message": "No active Spotify playback session was found. Open Spotify on a device and start playback, or transfer playback to an available device.",
},
)
def get_currently_playing(self, *, market: Optional[str] = None) -> Any:
return self.request(
"GET",
"/me/player/currently-playing",
params={"market": market},
empty_response={
"status_code": 204,
"empty": True,
"message": "Spotify is not currently playing anything. Start playback in Spotify and try again.",
},
)
def start_playback(
self,
*,
device_id: Optional[str] = None,
context_uri: Optional[str] = None,
uris: Optional[list[str]] = None,
offset: Optional[Dict[str, Any]] = None,
position_ms: Optional[int] = None,
) -> Any:
return self.request(
"PUT",
"/me/player/play",
params={"device_id": device_id},
json_body={
"context_uri": context_uri,
"uris": uris,
"offset": offset,
"position_ms": position_ms,
},
)
def pause_playback(self, *, device_id: Optional[str] = None) -> Any:
return self.request("PUT", "/me/player/pause", params={"device_id": device_id})
def skip_next(self, *, device_id: Optional[str] = None) -> Any:
return self.request("POST", "/me/player/next", params={"device_id": device_id})
def skip_previous(self, *, device_id: Optional[str] = None) -> Any:
return self.request("POST", "/me/player/previous", params={"device_id": device_id})
def seek(self, *, position_ms: int, device_id: Optional[str] = None) -> Any:
return self.request("PUT", "/me/player/seek", params={
"position_ms": position_ms,
"device_id": device_id,
})
def set_repeat(self, *, state: str, device_id: Optional[str] = None) -> Any:
return self.request("PUT", "/me/player/repeat", params={"state": state, "device_id": device_id})
def set_shuffle(self, *, state: bool, device_id: Optional[str] = None) -> Any:
return self.request("PUT", "/me/player/shuffle", params={"state": str(bool(state)).lower(), "device_id": device_id})
def set_volume(self, *, volume_percent: int, device_id: Optional[str] = None) -> Any:
return self.request("PUT", "/me/player/volume", params={
"volume_percent": volume_percent,
"device_id": device_id,
})
def get_queue(self) -> Any:
return self.request("GET", "/me/player/queue")
def add_to_queue(self, *, uri: str, device_id: Optional[str] = None) -> Any:
return self.request("POST", "/me/player/queue", params={"uri": uri, "device_id": device_id})
def search(
self,
*,
query: str,
search_types: list[str],
limit: int = 10,
offset: int = 0,
market: Optional[str] = None,
include_external: Optional[str] = None,
) -> Any:
return self.request("GET", "/search", params={
"q": query,
"type": ",".join(search_types),
"limit": limit,
"offset": offset,
"market": market,
"include_external": include_external,
})
def get_my_playlists(self, *, limit: int = 20, offset: int = 0) -> Any:
return self.request("GET", "/me/playlists", params={"limit": limit, "offset": offset})
def get_playlist(self, *, playlist_id: str, market: Optional[str] = None) -> Any:
return self.request("GET", f"/playlists/{playlist_id}", params={"market": market})
def create_playlist(
self,
*,
name: str,
public: bool = False,
collaborative: bool = False,
description: Optional[str] = None,
) -> Any:
return self.request("POST", "/me/playlists", json_body={
"name": name,
"public": public,
"collaborative": collaborative,
"description": description,
})
def add_playlist_items(
self,
*,
playlist_id: str,
uris: list[str],
position: Optional[int] = None,
) -> Any:
return self.request("POST", f"/playlists/{playlist_id}/items", json_body={
"uris": uris,
"position": position,
})
def remove_playlist_items(
self,
*,
playlist_id: str,
uris: list[str],
snapshot_id: Optional[str] = None,
) -> Any:
return self.request("DELETE", f"/playlists/{playlist_id}/items", json_body={
"items": [{"uri": uri} for uri in uris],
"snapshot_id": snapshot_id,
})
def update_playlist_details(
self,
*,
playlist_id: str,
name: Optional[str] = None,
public: Optional[bool] = None,
collaborative: Optional[bool] = None,
description: Optional[str] = None,
) -> Any:
return self.request("PUT", f"/playlists/{playlist_id}", json_body={
"name": name,
"public": public,
"collaborative": collaborative,
"description": description,
})
def get_album(self, *, album_id: str, market: Optional[str] = None) -> Any:
return self.request("GET", f"/albums/{album_id}", params={"market": market})
def get_album_tracks(self, *, album_id: str, limit: int = 20, offset: int = 0, market: Optional[str] = None) -> Any:
return self.request("GET", f"/albums/{album_id}/tracks", params={
"limit": limit,
"offset": offset,
"market": market,
})
def get_saved_tracks(self, *, limit: int = 20, offset: int = 0, market: Optional[str] = None) -> Any:
return self.request("GET", "/me/tracks", params={"limit": limit, "offset": offset, "market": market})
def save_library_items(self, *, uris: list[str]) -> Any:
return self.request("PUT", "/me/library", params={"uris": ",".join(uris)})
def library_contains(self, *, uris: list[str]) -> Any:
return self.request("GET", "/me/library/contains", params={"uris": ",".join(uris)})
def get_saved_albums(self, *, limit: int = 20, offset: int = 0, market: Optional[str] = None) -> Any:
return self.request("GET", "/me/albums", params={"limit": limit, "offset": offset, "market": market})
def remove_saved_tracks(self, *, track_ids: list[str]) -> Any:
uris = [f"spotify:track:{track_id}" for track_id in track_ids]
return self.request("DELETE", "/me/library", params={"uris": ",".join(uris)})
def remove_saved_albums(self, *, album_ids: list[str]) -> Any:
uris = [f"spotify:album:{album_id}" for album_id in album_ids]
return self.request("DELETE", "/me/library", params={"uris": ",".join(uris)})
def get_recently_played(
self,
*,
limit: int = 20,
after: Optional[int] = None,
before: Optional[int] = None,
) -> Any:
return self.request("GET", "/me/player/recently-played", params={
"limit": limit,
"after": after,
"before": before,
})
def _extract_spotify_error_detail(response: httpx.Response, *, fallback: str) -> str:
detail = fallback
try:
payload = response.json()
if isinstance(payload, dict):
error_obj = payload.get("error")
if isinstance(error_obj, dict):
detail = str(error_obj.get("message") or detail)
elif isinstance(error_obj, str):
detail = error_obj
except Exception:
pass
return detail.strip()
def _friendly_spotify_error_message(
*,
status_code: int,
detail: str,
method: str,
path: str,
retry_after: Optional[str],
) -> str:
normalized_detail = detail.lower()
is_playback_path = path.startswith("/me/player")
if status_code == 401:
return "Spotify authentication failed or expired. Run `hermes auth spotify` again."
if status_code == 403:
if is_playback_path:
return (
"Spotify rejected this playback request. Playback control usually requires a Spotify Premium account "
"and an active Spotify Connect device."
)
if "scope" in normalized_detail or "permission" in normalized_detail:
return "Spotify rejected the request because the current auth scope is insufficient. Re-run `hermes auth spotify` to refresh permissions."
return "Spotify rejected the request. The account may not have permission for this action."
if status_code == 404:
if is_playback_path:
return "Spotify could not find an active playback device or player session for this request."
return "Spotify resource not found."
if status_code == 429:
message = "Spotify rate limit exceeded."
if retry_after:
message += f" Retry after {retry_after} seconds."
return message
if detail:
return detail
return f"Spotify API request failed with status {status_code}."
def _strip_none(payload: Optional[Dict[str, Any]]) -> Dict[str, Any]:
if not payload:
return {}
return {key: value for key, value in payload.items() if value is not None}
def normalize_spotify_id(value: str, expected_type: Optional[str] = None) -> str:
cleaned = (value or "").strip()
if not cleaned:
raise SpotifyError("Spotify id/uri/url is required.")
if cleaned.startswith("spotify:"):
parts = cleaned.split(":")
if len(parts) >= 3:
item_type = parts[1]
if expected_type and item_type != expected_type:
raise SpotifyError(f"Expected a Spotify {expected_type}, got {item_type}.")
return parts[2]
if "open.spotify.com" in cleaned:
parsed = urlparse(cleaned)
path_parts = [part for part in parsed.path.split("/") if part]
if len(path_parts) >= 2:
item_type, item_id = path_parts[0], path_parts[1]
if expected_type and item_type != expected_type:
raise SpotifyError(f"Expected a Spotify {expected_type}, got {item_type}.")
return item_id
return cleaned
def normalize_spotify_uri(value: str, expected_type: Optional[str] = None) -> str:
cleaned = (value or "").strip()
if not cleaned:
raise SpotifyError("Spotify URI/url/id is required.")
if cleaned.startswith("spotify:"):
if expected_type:
parts = cleaned.split(":")
if len(parts) >= 3 and parts[1] != expected_type:
raise SpotifyError(f"Expected a Spotify {expected_type}, got {parts[1]}.")
return cleaned
item_id = normalize_spotify_id(cleaned, expected_type)
if expected_type:
return f"spotify:{expected_type}:{item_id}"
return cleaned
def normalize_spotify_uris(values: Iterable[str], expected_type: Optional[str] = None) -> list[str]:
uris: list[str] = []
for value in values:
uri = normalize_spotify_uri(str(value), expected_type)
if uri not in uris:
uris.append(uri)
if not uris:
raise SpotifyError("At least one Spotify item is required.")
return uris
def compact_json(data: Any) -> str:
return json.dumps(data, ensure_ascii=False)
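The id/URI/URL normalization that ``normalize_spotify_id`` performs can be exercised in isolation. The sketch below re-implements the same accept-anything logic (bare id, ``spotify:<type>:<id>`` URI, or ``open.spotify.com`` URL) as a standalone function for illustration; it is not the module's own code.

```python
from urllib.parse import urlparse

def normalize_id(value, expected_type=None):
    # Accept a bare id, a spotify:<type>:<id> URI, or an
    # open.spotify.com URL, and return the bare id. When
    # expected_type is given, validate the item type.
    cleaned = (value or "").strip()
    if not cleaned:
        raise ValueError("Spotify id/uri/url is required.")
    if cleaned.startswith("spotify:"):
        parts = cleaned.split(":")
        if len(parts) >= 3:
            if expected_type and parts[1] != expected_type:
                raise ValueError(f"Expected {expected_type}, got {parts[1]}.")
            return parts[2]
    if "open.spotify.com" in cleaned:
        path_parts = [p for p in urlparse(cleaned).path.split("/") if p]
        if len(path_parts) >= 2:
            if expected_type and path_parts[0] != expected_type:
                raise ValueError(f"Expected {expected_type}, got {path_parts[0]}.")
            return path_parts[1]
    # Anything else is assumed to already be a bare id.
    return cleaned
```

So ``normalize_id("spotify:track:abc")`` yields ``abc`` and ``normalize_id("https://open.spotify.com/album/xyz", "album")`` yields ``xyz``, while a mismatched ``expected_type`` raises.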


@@ -0,0 +1,13 @@
name: spotify
version: 1.0.0
description: "Native Spotify integration — 7 tools (playback, devices, queue, search, playlists, albums, library) using Spotify Web API + PKCE OAuth. Auth via `hermes auth spotify`. Tools gate on `providers.spotify` in ~/.hermes/auth.json."
author: NousResearch
kind: backend
provides_tools:
- spotify_playback
- spotify_devices
- spotify_queue
- spotify_search
- spotify_playlists
- spotify_albums
- spotify_library

plugins/spotify/tools.py

@@ -0,0 +1,454 @@
"""Native Spotify tools for Hermes (registered via plugins/spotify)."""
from __future__ import annotations
from typing import Any, Dict, List
from hermes_cli.auth import get_auth_status
from plugins.spotify.client import (
SpotifyAPIError,
SpotifyAuthRequiredError,
SpotifyClient,
SpotifyError,
normalize_spotify_id,
normalize_spotify_uri,
normalize_spotify_uris,
)
from tools.registry import tool_error, tool_result
def _check_spotify_available() -> bool:
try:
return bool(get_auth_status("spotify").get("logged_in"))
except Exception:
return False
def _spotify_client() -> SpotifyClient:
return SpotifyClient()
def _spotify_tool_error(exc: Exception) -> str:
if isinstance(exc, (SpotifyError, SpotifyAuthRequiredError)):
return tool_error(str(exc))
if isinstance(exc, SpotifyAPIError):
return tool_error(str(exc), status_code=exc.status_code)
return tool_error(f"Spotify tool failed: {type(exc).__name__}: {exc}")
def _coerce_limit(raw: Any, *, default: int = 20, minimum: int = 1, maximum: int = 50) -> int:
try:
value = int(raw)
except Exception:
value = default
return max(minimum, min(maximum, value))
def _coerce_bool(raw: Any, default: bool = False) -> bool:
if isinstance(raw, bool):
return raw
if isinstance(raw, str):
cleaned = raw.strip().lower()
if cleaned in {"1", "true", "yes", "on"}:
return True
if cleaned in {"0", "false", "no", "off"}:
return False
return default
def _as_list(raw: Any) -> List[str]:
if raw is None:
return []
if isinstance(raw, list):
return [str(item).strip() for item in raw if str(item).strip()]
return [str(raw).strip()] if str(raw).strip() else []
def _describe_empty_playback(payload: Any, *, action: str) -> dict | None:
if not isinstance(payload, dict) or not payload.get("empty"):
return None
if action == "get_currently_playing":
return {
"success": True,
"action": action,
"is_playing": False,
"status_code": payload.get("status_code", 204),
"message": payload.get("message") or "Spotify is not currently playing anything.",
}
if action == "get_state":
return {
"success": True,
"action": action,
"has_active_device": False,
"status_code": payload.get("status_code", 204),
"message": payload.get("message") or "No active Spotify playback session was found.",
}
return None
def _handle_spotify_playback(args: dict, **kw) -> str:
action = str(args.get("action") or "get_state").strip().lower()
client = _spotify_client()
try:
if action == "get_state":
payload = client.get_playback_state(market=args.get("market"))
empty_result = _describe_empty_playback(payload, action=action)
return tool_result(empty_result or payload)
if action == "get_currently_playing":
payload = client.get_currently_playing(market=args.get("market"))
empty_result = _describe_empty_playback(payload, action=action)
return tool_result(empty_result or payload)
if action == "play":
offset = args.get("offset")
if isinstance(offset, dict):
payload_offset = {k: v for k, v in offset.items() if v is not None}
else:
payload_offset = None
uris = normalize_spotify_uris(_as_list(args.get("uris")), "track") if args.get("uris") else None
context_uri = None
if args.get("context_uri"):
raw_context = str(args.get("context_uri"))
context_type = None
if raw_context.startswith("spotify:album:") or "/album/" in raw_context:
context_type = "album"
elif raw_context.startswith("spotify:playlist:") or "/playlist/" in raw_context:
context_type = "playlist"
elif raw_context.startswith("spotify:artist:") or "/artist/" in raw_context:
context_type = "artist"
context_uri = normalize_spotify_uri(raw_context, context_type)
result = client.start_playback(
device_id=args.get("device_id"),
context_uri=context_uri,
uris=uris,
offset=payload_offset,
position_ms=args.get("position_ms"),
)
return tool_result({"success": True, "action": action, "result": result})
if action == "pause":
result = client.pause_playback(device_id=args.get("device_id"))
return tool_result({"success": True, "action": action, "result": result})
if action == "next":
result = client.skip_next(device_id=args.get("device_id"))
return tool_result({"success": True, "action": action, "result": result})
if action == "previous":
result = client.skip_previous(device_id=args.get("device_id"))
return tool_result({"success": True, "action": action, "result": result})
if action == "seek":
if args.get("position_ms") is None:
return tool_error("position_ms is required for action='seek'")
result = client.seek(position_ms=int(args["position_ms"]), device_id=args.get("device_id"))
return tool_result({"success": True, "action": action, "result": result})
if action == "set_repeat":
state = str(args.get("state") or "").strip().lower()
if state not in {"track", "context", "off"}:
return tool_error("state must be one of: track, context, off")
result = client.set_repeat(state=state, device_id=args.get("device_id"))
return tool_result({"success": True, "action": action, "result": result})
if action == "set_shuffle":
result = client.set_shuffle(state=_coerce_bool(args.get("state")), device_id=args.get("device_id"))
return tool_result({"success": True, "action": action, "result": result})
if action == "set_volume":
if args.get("volume_percent") is None:
return tool_error("volume_percent is required for action='set_volume'")
result = client.set_volume(volume_percent=max(0, min(100, int(args["volume_percent"]))), device_id=args.get("device_id"))
return tool_result({"success": True, "action": action, "result": result})
if action == "recently_played":
after = args.get("after")
before = args.get("before")
if after and before:
return tool_error("Provide only one of 'after' or 'before'")
return tool_result(client.get_recently_played(
limit=_coerce_limit(args.get("limit"), default=20),
after=int(after) if after is not None else None,
before=int(before) if before is not None else None,
))
return tool_error(f"Unknown spotify_playback action: {action}")
except Exception as exc:
return _spotify_tool_error(exc)
def _handle_spotify_devices(args: dict, **kw) -> str:
action = str(args.get("action") or "list").strip().lower()
client = _spotify_client()
try:
if action == "list":
return tool_result(client.get_devices())
if action == "transfer":
device_id = str(args.get("device_id") or "").strip()
if not device_id:
return tool_error("device_id is required for action='transfer'")
result = client.transfer_playback(device_id=device_id, play=_coerce_bool(args.get("play")))
return tool_result({"success": True, "action": action, "result": result})
return tool_error(f"Unknown spotify_devices action: {action}")
except Exception as exc:
return _spotify_tool_error(exc)
def _handle_spotify_queue(args: dict, **kw) -> str:
action = str(args.get("action") or "get").strip().lower()
client = _spotify_client()
try:
if action == "get":
return tool_result(client.get_queue())
if action == "add":
uri = normalize_spotify_uri(str(args.get("uri") or ""), None)
result = client.add_to_queue(uri=uri, device_id=args.get("device_id"))
return tool_result({"success": True, "action": action, "uri": uri, "result": result})
return tool_error(f"Unknown spotify_queue action: {action}")
except Exception as exc:
return _spotify_tool_error(exc)
def _handle_spotify_search(args: dict, **kw) -> str:
client = _spotify_client()
query = str(args.get("query") or "").strip()
if not query:
return tool_error("query is required")
raw_types = _as_list(args.get("types") or args.get("type") or ["track"])
search_types = [value.lower() for value in raw_types if value.lower() in {"album", "artist", "playlist", "track", "show", "episode", "audiobook"}]
if not search_types:
return tool_error("types must contain one or more of: album, artist, playlist, track, show, episode, audiobook")
try:
return tool_result(client.search(
query=query,
search_types=search_types,
limit=_coerce_limit(args.get("limit"), default=10),
offset=max(0, int(args.get("offset") or 0)),
market=args.get("market"),
include_external=args.get("include_external"),
))
except Exception as exc:
return _spotify_tool_error(exc)
def _handle_spotify_playlists(args: dict, **kw) -> str:
action = str(args.get("action") or "list").strip().lower()
client = _spotify_client()
try:
if action == "list":
return tool_result(client.get_my_playlists(
limit=_coerce_limit(args.get("limit"), default=20),
offset=max(0, int(args.get("offset") or 0)),
))
if action == "get":
playlist_id = normalize_spotify_id(str(args.get("playlist_id") or ""), "playlist")
return tool_result(client.get_playlist(playlist_id=playlist_id, market=args.get("market")))
if action == "create":
name = str(args.get("name") or "").strip()
if not name:
return tool_error("name is required for action='create'")
return tool_result(client.create_playlist(
name=name,
public=_coerce_bool(args.get("public")),
collaborative=_coerce_bool(args.get("collaborative")),
description=args.get("description"),
))
if action == "add_items":
playlist_id = normalize_spotify_id(str(args.get("playlist_id") or ""), "playlist")
uris = normalize_spotify_uris(_as_list(args.get("uris")))
return tool_result(client.add_playlist_items(
playlist_id=playlist_id,
uris=uris,
position=args.get("position"),
))
if action == "remove_items":
playlist_id = normalize_spotify_id(str(args.get("playlist_id") or ""), "playlist")
uris = normalize_spotify_uris(_as_list(args.get("uris")))
return tool_result(client.remove_playlist_items(
playlist_id=playlist_id,
uris=uris,
snapshot_id=args.get("snapshot_id"),
))
if action == "update_details":
playlist_id = normalize_spotify_id(str(args.get("playlist_id") or ""), "playlist")
return tool_result(client.update_playlist_details(
playlist_id=playlist_id,
name=args.get("name"),
public=args.get("public"),
collaborative=args.get("collaborative"),
description=args.get("description"),
))
return tool_error(f"Unknown spotify_playlists action: {action}")
except Exception as exc:
return _spotify_tool_error(exc)
def _handle_spotify_albums(args: dict, **kw) -> str:
action = str(args.get("action") or "get").strip().lower()
client = _spotify_client()
try:
album_id = normalize_spotify_id(str(args.get("album_id") or args.get("id") or ""), "album")
if action == "get":
return tool_result(client.get_album(album_id=album_id, market=args.get("market")))
if action == "tracks":
return tool_result(client.get_album_tracks(
album_id=album_id,
limit=_coerce_limit(args.get("limit"), default=20),
offset=max(0, int(args.get("offset") or 0)),
market=args.get("market"),
))
return tool_error(f"Unknown spotify_albums action: {action}")
except Exception as exc:
return _spotify_tool_error(exc)
def _handle_spotify_library(args: dict, **kw) -> str:
"""Unified handler for saved tracks + saved albums (formerly two tools)."""
kind = str(args.get("kind") or "").strip().lower()
if kind not in {"tracks", "albums"}:
return tool_error("kind must be one of: tracks, albums")
action = str(args.get("action") or "list").strip().lower()
item_type = "track" if kind == "tracks" else "album"
client = _spotify_client()
try:
if action == "list":
limit = _coerce_limit(args.get("limit"), default=20)
offset = max(0, int(args.get("offset") or 0))
market = args.get("market")
if kind == "tracks":
return tool_result(client.get_saved_tracks(limit=limit, offset=offset, market=market))
return tool_result(client.get_saved_albums(limit=limit, offset=offset, market=market))
if action == "save":
uris = normalize_spotify_uris(_as_list(args.get("uris") or args.get("items")), item_type)
return tool_result(client.save_library_items(uris=uris))
if action == "remove":
ids = [normalize_spotify_id(item, item_type) for item in _as_list(args.get("ids") or args.get("items"))]
if not ids:
return tool_error("ids/items is required for action='remove'")
if kind == "tracks":
return tool_result(client.remove_saved_tracks(track_ids=ids))
return tool_result(client.remove_saved_albums(album_ids=ids))
return tool_error(f"Unknown spotify_library action: {action}")
except Exception as exc:
return _spotify_tool_error(exc)
COMMON_STRING = {"type": "string"}
SPOTIFY_PLAYBACK_SCHEMA = {
"name": "spotify_playback",
"description": "Control Spotify playback, inspect the active playback state, or fetch recently played tracks.",
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["get_state", "get_currently_playing", "play", "pause", "next", "previous", "seek", "set_repeat", "set_shuffle", "set_volume", "recently_played"]},
"device_id": COMMON_STRING,
"market": COMMON_STRING,
"context_uri": COMMON_STRING,
"uris": {"type": "array", "items": COMMON_STRING},
"offset": {"type": "object"},
"position_ms": {"type": "integer"},
"state": {"description": "For set_repeat use track/context/off. For set_shuffle use boolean-like true/false.", "oneOf": [{"type": "string"}, {"type": "boolean"}]},
"volume_percent": {"type": "integer"},
"limit": {"type": "integer", "description": "For recently_played: number of tracks (max 50)"},
"after": {"type": "integer", "description": "For recently_played: Unix ms cursor (after this timestamp)"},
"before": {"type": "integer", "description": "For recently_played: Unix ms cursor (before this timestamp)"},
},
"required": ["action"],
},
}
SPOTIFY_DEVICES_SCHEMA = {
"name": "spotify_devices",
"description": "List Spotify Connect devices or transfer playback to a different device.",
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["list", "transfer"]},
"device_id": COMMON_STRING,
"play": {"type": "boolean"},
},
"required": ["action"],
},
}
SPOTIFY_QUEUE_SCHEMA = {
"name": "spotify_queue",
"description": "Inspect the user's Spotify queue or add an item to it.",
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["get", "add"]},
"uri": COMMON_STRING,
"device_id": COMMON_STRING,
},
"required": ["action"],
},
}
SPOTIFY_SEARCH_SCHEMA = {
"name": "spotify_search",
"description": "Search the Spotify catalog for tracks, albums, artists, playlists, shows, or episodes.",
"parameters": {
"type": "object",
"properties": {
"query": COMMON_STRING,
"types": {"type": "array", "items": COMMON_STRING},
"type": COMMON_STRING,
"limit": {"type": "integer"},
"offset": {"type": "integer"},
"market": COMMON_STRING,
"include_external": COMMON_STRING,
},
"required": ["query"],
},
}
SPOTIFY_PLAYLISTS_SCHEMA = {
"name": "spotify_playlists",
"description": "List, inspect, create, update, and modify Spotify playlists.",
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["list", "get", "create", "add_items", "remove_items", "update_details"]},
"playlist_id": COMMON_STRING,
"market": COMMON_STRING,
"limit": {"type": "integer"},
"offset": {"type": "integer"},
"name": COMMON_STRING,
"description": COMMON_STRING,
"public": {"type": "boolean"},
"collaborative": {"type": "boolean"},
"uris": {"type": "array", "items": COMMON_STRING},
"position": {"type": "integer"},
"snapshot_id": COMMON_STRING,
},
"required": ["action"],
},
}
SPOTIFY_ALBUMS_SCHEMA = {
"name": "spotify_albums",
"description": "Fetch Spotify album metadata or album tracks.",
"parameters": {
"type": "object",
"properties": {
"action": {"type": "string", "enum": ["get", "tracks"]},
"album_id": COMMON_STRING,
"id": COMMON_STRING,
"market": COMMON_STRING,
"limit": {"type": "integer"},
"offset": {"type": "integer"},
},
"required": ["action"],
},
}
SPOTIFY_LIBRARY_SCHEMA = {
"name": "spotify_library",
"description": "List, save, or remove the user's saved Spotify tracks or albums. Use `kind` to select which.",
"parameters": {
"type": "object",
"properties": {
"kind": {"type": "string", "enum": ["tracks", "albums"], "description": "Which library to operate on"},
"action": {"type": "string", "enum": ["list", "save", "remove"]},
"limit": {"type": "integer"},
"offset": {"type": "integer"},
"market": COMMON_STRING,
"uris": {"type": "array", "items": COMMON_STRING},
"ids": {"type": "array", "items": COMMON_STRING},
"items": {"type": "array", "items": COMMON_STRING},
},
"required": ["kind", "action"],
},
}


@ -78,6 +78,16 @@ termux = [
]
dingtalk = ["dingtalk-stream>=0.20,<1", "alibabacloud-dingtalk>=2.0.0", "qrcode>=7.0,<8"]
feishu = ["lark-oapi>=1.5.3,<2", "qrcode>=7.0,<8"]
google = [
# Required by the google-workspace skill (Gmail, Calendar, Drive, Contacts,
# Sheets, Docs). Declared here so packagers (Nix, Homebrew) ship them with
# the [all] extra and users don't hit runtime `pip install` paths that fail
# in environments without pip (e.g. Nix-managed Python).
"google-api-python-client>=2.100,<3",
"google-auth-oauthlib>=1.0,<2",
"google-auth-httplib2>=0.2,<1",
]
# `hermes dashboard` (localhost SPA + API). Not in core to keep the default install lean.
web = ["fastapi>=0.104.0,<1", "uvicorn[standard]>=0.24.0,<1"]
rl = [
"atroposlib @ git+https://github.com/NousResearch/atropos.git@c20c85256e5a45ad31edf8b7276e9c5ee1995a30",
@@ -109,6 +119,7 @@ all = [
"hermes-agent[voice]",
"hermes-agent[dingtalk]",
"hermes-agent[feishu]",
"hermes-agent[google]",
"hermes-agent[mistral]",
"hermes-agent[bedrock]",
"hermes-agent[web]",

File diff suppressed because it is too large.


@@ -44,9 +44,13 @@ AUTHOR_MAP = {
"teknium@nousresearch.com": "teknium1",
"127238744+teknium1@users.noreply.github.com": "teknium1",
"343873859@qq.com": "DrStrangerUJN",
"uzmpsk.dilekakbas@gmail.com": "dlkakbs",
"jefferson@heimdallstrategy.com": "Mind-Dragon",
"130918800+devorun@users.noreply.github.com": "devorun",
"maks.mir@yahoo.com": "say8hi",
"web3blind@users.noreply.github.com": "web3blind",
"julia@alexland.us": "alexg0bot",
"1060770+benjaminsehl@users.noreply.github.com": "benjaminsehl",
# contributors (from noreply pattern)
"david.vv@icloud.com": "davidvv",
"wangqiang@wangqiangdeMac-mini.local": "xiaoqiang243",
@@ -58,13 +62,19 @@ AUTHOR_MAP = {
"keifergu@tencent.com": "keifergu",
"kshitijk4poor@users.noreply.github.com": "kshitijk4poor",
"abner.the.foreman@agentmail.to": "Abnertheforeman",
"thomasgeorgevii09@gmail.com": "tochukwuada",
"harryykyle1@gmail.com": "hharry11",
"kshitijk4poor@gmail.com": "kshitijk4poor",
"keira.voss94@gmail.com": "keiravoss94",
"16443023+stablegenius49@users.noreply.github.com": "stablegenius49",
"simbamax99@gmail.com": "simbam99",
"185121704+stablegenius49@users.noreply.github.com": "stablegenius49",
"101283333+batuhankocyigit@users.noreply.github.com": "batuhankocyigit",
"255305877+ismell0992-afk@users.noreply.github.com": "ismell0992-afk",
"cyprian@ironin.pl": "iRonin",
"valdi.jorge@gmail.com": "jvcl",
"q19dcp@gmail.com": "aj-nt",
"ebukau84@gmail.com": "UgwujaGeorge",
"francip@gmail.com": "francip",
"omni@comelse.com": "omnissiah-comelse",
"oussama.redcode@gmail.com": "mavrickdeveloper",
@@ -77,6 +87,7 @@ AUTHOR_MAP = {
"77628552+raulvidis@users.noreply.github.com": "raulvidis",
"145567217+Aum08Desai@users.noreply.github.com": "Aum08Desai",
"256820943+kshitij-eliza@users.noreply.github.com": "kshitij-eliza",
"jiechengwu@pony.ai": "Jason2031",
"44278268+shitcoinsherpa@users.noreply.github.com": "shitcoinsherpa",
"104278804+Sertug17@users.noreply.github.com": "Sertug17",
"112503481+caentzminger@users.noreply.github.com": "caentzminger",
@@ -103,6 +114,7 @@ AUTHOR_MAP = {
"30841158+n-WN@users.noreply.github.com": "n-WN",
"tsuijinglei@gmail.com": "hiddenpuppy",
"jerome@clawwork.ai": "HiddenPuppy",
"jerome.benoit@sap.com": "jerome-benoit",
"wysie@users.noreply.github.com": "Wysie",
"leoyuan0099@gmail.com": "keyuyuan",
"bxzt2006@163.com": "Only-Code-A",
@@ -167,6 +179,39 @@ AUTHOR_MAP = {
"socrates1024@gmail.com": "socrates1024",
"seanalt555@gmail.com": "Salt-555",
"satelerd@gmail.com": "satelerd",
"dan@danlynn.com": "danklynn",
"mattmaximo@hotmail.com": "MattMaximo",
"149063006+j3ffffff@users.noreply.github.com": "j3ffffff",
"A-FdL-Prog@users.noreply.github.com": "A-FdL-Prog",
"l0hde@users.noreply.github.com": "l0hde",
"difujia@users.noreply.github.com": "difujia",
"vominh1919@gmail.com": "vominh1919",
"yue.gu2023@gmail.com": "YueLich",
"51783311+andyylin@users.noreply.github.com": "andyylin",
"me@jakubkrcmar.cz": "jakubkrcmar",
"prasadus92@gmail.com": "prasadus92",
"michael@make.software": "mssteuer",
"der@konsi.org": "konsisumer",
"abogale2@gmail.com": "amanuel2",
"alexazzjjtt@163.com": "alexzhu0",
"pub_forgreatagent@antgroup.com": "AntAISecurityLab",
"252620095+briandevans@users.noreply.github.com": "briandevans",
"danielrpike9@gmail.com": "Bartok9",
"skozyuk@cruxexperts.com": "CruxExperts",
"154585401+LeonSGP43@users.noreply.github.com": "LeonSGP43",
"mgparkprint@gmail.com": "vlwkaos",
"tranquil_flow@protonmail.com": "Tranquil-Flow",
"wangshengyang2004@163.com": "Wangshengyang2004",
"hasan.ali13381@gmail.com": "H-Ali13381",
"xienb@proton.me": "XieNBi",
"139681654+maymuneth@users.noreply.github.com": "maymuneth",
"zengwei@nightq.cn": "nightq",
"1434494126@qq.com": "5park1e",
"158153005+5park1e@users.noreply.github.com": "5park1e",
"innocarpe@gmail.com": "innocarpe",
"noreply@ked.com": "qike-ms",
"andrekurait@gmail.com": "AndreKurait",
"bsgdigital@users.noreply.github.com": "bsgdigital",
"numman.ali@gmail.com": "nummanali",
"rohithsaimidigudla@gmail.com": "whitehatjr1001",
"0xNyk@users.noreply.github.com": "0xNyk",
@@ -185,6 +230,11 @@ AUTHOR_MAP = {
"bryan@intertwinesys.com": "bryanyoung",
"christo.mitov@gmail.com": "christomitov",
"hermes@nousresearch.com": "NousResearch",
"reginaldasr@gmail.com": "ReginaldasR",
"ntconguit@gmail.com": "0xharryriddle",
"agent@wildcat.local": "ericnicolaides",
"georgex8001@gmail.com": "georgex8001",
"stefan@dimagents.ai": "dimitrovi",
"hermes@noushq.ai": "benbarclay",
"chinmingcock@gmail.com": "ChimingLiu",
"openclaw@sparklab.ai": "openclaw",
@@ -333,6 +383,9 @@ AUTHOR_MAP = {
"brian@bde.io": "briandevans",
"hubin_ll@qq.com": "LLQWQ",
"memosr_email@gmail.com": "memosr",
"jperlow@gmail.com": "perlowja",
"tangyuanjc@JCdeAIfenshendeMac-mini.local": "tangyuanjc",
"harryplusplus@gmail.com": "harryplusplus",
"anthhub@163.com": "anthhub",
"shenuu@gmail.com": "shenuu",
"xiayh17@gmail.com": "xiayh0107",
@@ -436,6 +489,12 @@ AUTHOR_MAP = {
"topcheer@me.com": "topcheer",
"walli@tencent.com": "walli",
"zhuofengwang@tencent.com": "Zhuofeng-Wang",
# April 2026 salvage-PR batch (#14920, #14986, #14966)
"mrunmayeerane17@gmail.com": "mrunmayee17",
"69489633+camaragon@users.noreply.github.com": "camaragon",
"shamork@outlook.com": "shamork",
# April 2026 Discord Copilot /model salvage (#15030)
"cshong2017@outlook.com": "Nicecsh",
# no-github-match — keep as display names
"clio-agent@sisyphuslabs.ai": "Sisyphus",
"marco@rutimka.de": "Marco Rutsch",
@@ -443,6 +502,7 @@ AUTHOR_MAP = {
"zhangxicen@example.com": "zhangxicen",
"codex@openai.invalid": "teknium1",
"screenmachine@gmail.com": "teknium1",
"chenzeshi@live.com": "chen1749144759",
}


@@ -248,7 +248,6 @@ Type these during an interactive chat session.
```
/config Show config (CLI)
/model [name] Show or change model
/provider Show provider info
/personality [name] Set personality
/reasoning [level] Set reasoning (none|minimal|low|medium|high|xhigh|show|hide)
/verbose Cycle: off → new → all → verbose


@@ -0,0 +1,196 @@
---
name: design-md
description: Author, validate, diff, and export DESIGN.md files — Google's open-source format spec that gives coding agents a persistent, structured understanding of a design system (tokens + rationale in one file). Use when building a design system, porting style rules between projects, generating UI with consistent brand, or auditing accessibility/contrast.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [design, design-system, tokens, ui, accessibility, wcag, tailwind, dtcg, google]
related_skills: [popular-web-designs, excalidraw, architecture-diagram]
---
# DESIGN.md Skill
DESIGN.md is Google's open spec (Apache-2.0, `google-labs-code/design.md`) for
describing a visual identity to coding agents. One file combines:
- **YAML front matter** — machine-readable design tokens (normative values)
- **Markdown body** — human-readable rationale, organized into canonical sections
Tokens give exact values. Prose tells agents *why* those values exist and how to
apply them. The CLI (`npx @google/design.md`) lints structure + WCAG contrast,
diffs versions for regressions, and exports to Tailwind or W3C DTCG JSON.
## When to use this skill
- User asks for a DESIGN.md file, design tokens, or a design system spec
- User wants consistent UI/brand across multiple projects or tools
- User pastes an existing DESIGN.md and asks to lint, diff, export, or extend it
- User asks to port a style guide into a format agents can consume
- User wants contrast / WCAG accessibility validation on their color palette
For purely visual inspiration or layout examples, use `popular-web-designs`
instead. This skill is for the *formal spec file* itself.
## File anatomy
```md
---
version: alpha
name: Heritage
description: Architectural minimalism meets journalistic gravitas.
colors:
primary: "#1A1C1E"
secondary: "#6C7278"
tertiary: "#B8422E"
neutral: "#F7F5F2"
typography:
h1:
fontFamily: Public Sans
fontSize: 3rem
fontWeight: 700
lineHeight: 1.1
letterSpacing: "-0.02em"
body-md:
fontFamily: Public Sans
fontSize: 1rem
rounded:
sm: 4px
md: 8px
lg: 16px
spacing:
sm: 8px
md: 16px
lg: 24px
components:
button-primary:
backgroundColor: "{colors.tertiary}"
textColor: "#FFFFFF"
rounded: "{rounded.sm}"
padding: 12px
button-primary-hover:
backgroundColor: "{colors.primary}"
---
## Overview
Architectural Minimalism meets Journalistic Gravitas...
## Colors
- **Primary (#1A1C1E):** Deep ink for headlines and core text.
- **Tertiary (#B8422E):** "Boston Clay" — the sole driver for interaction.
## Typography
Public Sans for everything except small all-caps labels...
## Components
`button-primary` is the only high-emphasis action on a page...
```
## Token types
| Type | Format | Example |
|------|--------|---------|
| Color | `#` + hex (sRGB) | `"#1A1C1E"` |
| Dimension | number + unit (`px`, `em`, `rem`) | `48px`, `-0.02em` |
| Token reference | `{path.to.token}` | `{colors.primary}` |
| Typography | object with `fontFamily`, `fontSize`, `fontWeight`, `lineHeight`, `letterSpacing`, `fontFeature`, `fontVariation` | see above |
Component property whitelist: `backgroundColor`, `textColor`, `typography`,
`rounded`, `padding`, `size`, `height`, `width`. Variants (hover, active,
pressed) are **separate component entries** with related key names
(`button-primary-hover`), not nested.
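The CLI resolves `{path.to.token}` references for you; as an illustration of the lookup semantics, here is a minimal Python sketch (the function name and the chained-reference behavior are assumptions for illustration, not spec guarantees):

```python
def resolve_ref(tokens: dict, value):
    """Expand a "{path.to.token}" reference against parsed front matter.

    Hypothetical helper: the spec defines only the reference syntax,
    not this API.
    """
    if not (isinstance(value, str) and value.startswith("{") and value.endswith("}")):
        return value  # literal value, nothing to resolve
    node = tokens
    for part in value[1:-1].split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"broken-ref: {value}")
        node = node[part]
    return resolve_ref(tokens, node)  # a reference may point at another reference

tokens = {"colors": {"tertiary": "#B8422E"}, "rounded": {"sm": "4px"}}
resolve_ref(tokens, "{colors.tertiary}")  # "#B8422E"
```

Note that `{primary}` fails here for the same reason it fails in the spec: resolution walks the full dotted path from the front-matter root.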
## Canonical section order
Sections are optional, but those present MUST appear in this order. Duplicate
headings cause the file to be rejected.
1. Overview (alias: Brand & Style)
2. Colors
3. Typography
4. Layout (alias: Layout & Spacing)
5. Elevation & Depth (alias: Elevation)
6. Shapes
7. Components
8. Do's and Don'ts
Unknown sections are preserved, not errored. Unknown token names are accepted
if the value type is valid. Unknown component properties produce a warning.
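The ordering and duplicate rules above can be sketched in a few lines. This is an illustrative check assuming level-2 (`##`) headings, not the CLI's actual implementation:

```python
import re

CANONICAL = ["Overview", "Colors", "Typography", "Layout",
             "Elevation & Depth", "Shapes", "Components", "Do's and Don'ts"]
ALIASES = {"Brand & Style": "Overview", "Layout & Spacing": "Layout",
           "Elevation": "Elevation & Depth"}

def check_section_order(markdown_body: str) -> str:
    headings = re.findall(r"^## (.+)$", markdown_body, flags=re.M)
    names = [ALIASES.get(h.strip(), h.strip()) for h in headings]
    if len(set(names)) != len(names):
        return "duplicate-section"      # same heading twice rejects the file
    known = [n for n in names if n in CANONICAL]
    # Unknown sections are preserved; known ones must keep canonical order.
    if [n for n in CANONICAL if n in known] != known:
        return "out-of-order"
    return "ok"
```

Aliases normalize to their canonical name before the order comparison, and unknown sections pass through untouched, matching the tolerance described above.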
## Workflow: authoring a new DESIGN.md
1. **Ask the user** (or infer) the brand tone, accent color, and typography
direction. If they provided a site, image, or vibe, translate it to the
token shape above.
2. **Write `DESIGN.md`** in their project root using `write_file`. Always
include `name:` and `colors:`; other sections optional but encouraged.
3. **Use token references** (`{colors.primary}`) in the `components:` section
instead of re-typing hex values. Keeps the palette single-source.
4. **Lint it** (see below). Fix any broken references or WCAG failures
before returning.
5. **If the user has an existing project**, also write Tailwind or DTCG
exports next to the file (`tailwind.theme.json`, `tokens.json`).
## Workflow: lint / diff / export
The CLI is `@google/design.md` (Node). Use `npx` — no global install needed.
```bash
# Validate structure + token references + WCAG contrast
npx -y @google/design.md lint DESIGN.md
# Compare two versions, fail on regression (exit 1 = regression)
npx -y @google/design.md diff DESIGN.md DESIGN-v2.md
# Export to Tailwind theme JSON
npx -y @google/design.md export --format tailwind DESIGN.md > tailwind.theme.json
# Export to W3C DTCG (Design Tokens Format Module) JSON
npx -y @google/design.md export --format dtcg DESIGN.md > tokens.json
# Print the spec itself — useful when injecting into an agent prompt
npx -y @google/design.md spec --rules-only --format json
```
All commands accept `-` for stdin. `lint` returns exit 1 on errors. Use the
`--format json` flag and parse the output if you need to report findings
structurally.
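A sketch of shelling out and tallying findings, assuming `lint --format json` emits a JSON list of finding objects with a `severity` field (the output shape here is an assumption; inspect the real CLI output before relying on these field names):

```python
import json
import subprocess
from collections import Counter

def lint_design_md(path: str):
    """Run the DESIGN.md linter and return (exit_code, findings).

    Assumes --format json emits a JSON document on stdout; verify
    against the actual CLI before relying on the parsed fields.
    """
    proc = subprocess.run(
        ["npx", "-y", "@google/design.md", "lint", "--format", "json", path],
        capture_output=True, text=True,
    )
    findings = json.loads(proc.stdout) if proc.stdout.strip() else []
    return proc.returncode, findings

def count_by_severity(findings) -> Counter:
    # Tally so the summary can lead with errors, then warnings, then info.
    return Counter(f.get("severity", "info") for f in findings)
```

Remember that `lint` exits 1 on errors, so a nonzero return code alone is enough to block a "looks good" summary.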
### Lint rule reference (what the 7 rules catch)
- `broken-ref` (error) — `{colors.missing}` points at a non-existent token
- `duplicate-section` (error) — same `## Heading` appears twice
- `invalid-color`, `invalid-dimension`, `invalid-typography` (error)
- `wcag-contrast` (warning/info) — component `textColor` vs `backgroundColor`
ratio against WCAG AA (4.5:1) and AAA (7:1)
- `unknown-component-property` (warning) — outside the whitelist above
When the user cares about accessibility, call this out explicitly in your
summary — WCAG findings are the most load-bearing reason to use the CLI.
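The contrast math behind `wcag-contrast` is the standard WCAG 2.x formula: relative luminance from linearized sRGB channels, then `(L_lighter + 0.05) / (L_darker + 0.05)`. A self-contained sketch for pre-checking a palette before running the linter:

```python
def _linearize(channel_8bit: int) -> float:
    # sRGB channel -> linear-light value (WCAG 2.x definition)
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)

def contrast_ratio(fg: str, bg: str) -> float:
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# AA for normal text needs >= 4.5; AAA needs >= 7.0
contrast_ratio("#FFFFFF", "#1A1C1E")  # well above 7, passes AAA
```

White on black is the ceiling at exactly 21:1, and the ratio is symmetric in its arguments, so argument order never matters.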
## Pitfalls
- **Don't nest component variants.** `button-primary.hover` is wrong;
`button-primary-hover` as a sibling key is right.
- **Hex colors must be quoted strings.** An unquoted `#` starts a YAML comment,
so `primary: #1A1C1E` silently parses as an empty value.
- **Negative dimensions need quotes too.** YAML can misread a bare leading `-`,
so write `letterSpacing: "-0.02em"`, not `letterSpacing: -0.02em`.
- **Section order is enforced.** If the user gives you prose in a random order,
reorder it to match the canonical list before saving.
- **`version: alpha` is the current spec version** (as of Apr 2026). The spec
is marked alpha — watch for breaking changes.
- **Token references resolve by dotted path.** `{colors.primary}` works;
`{primary}` does not.
## Spec source of truth
- Repo: https://github.com/google-labs-code/design.md (Apache-2.0)
- CLI: `@google/design.md` on npm
- License of generated DESIGN.md files: whatever the user's project uses;
the spec itself is Apache-2.0.


@@ -0,0 +1,99 @@
---
version: alpha
name: MyBrand
description: One-sentence description of the visual identity.
colors:
primary: "#0F172A"
secondary: "#64748B"
tertiary: "#2563EB"
neutral: "#F8FAFC"
on-primary: "#FFFFFF"
on-tertiary: "#FFFFFF"
typography:
h1:
fontFamily: Inter
fontSize: 3rem
fontWeight: 700
lineHeight: 1.1
letterSpacing: "-0.02em"
h2:
fontFamily: Inter
fontSize: 2rem
fontWeight: 600
lineHeight: 1.2
body-md:
fontFamily: Inter
fontSize: 1rem
lineHeight: 1.5
label-caps:
fontFamily: Inter
fontSize: 0.75rem
fontWeight: 600
letterSpacing: "0.08em"
rounded:
sm: 4px
md: 8px
lg: 16px
full: 9999px
spacing:
xs: 4px
sm: 8px
md: 16px
lg: 24px
xl: 48px
components:
button-primary:
backgroundColor: "{colors.tertiary}"
textColor: "{colors.on-tertiary}"
rounded: "{rounded.sm}"
padding: 12px
button-primary-hover:
backgroundColor: "{colors.primary}"
textColor: "{colors.on-primary}"
card:
backgroundColor: "{colors.neutral}"
textColor: "{colors.primary}"
rounded: "{rounded.md}"
padding: 24px
---
## Overview
Describe the voice and feel of the brand in one or two paragraphs. What mood
does it evoke? What emotional response should a user have on first impression?
## Colors
- **Primary ({colors.primary}):** Core text, headlines, high-emphasis surfaces.
- **Secondary ({colors.secondary}):** Supporting text, borders, metadata.
- **Tertiary ({colors.tertiary}):** Interaction driver — buttons, links,
selected states. Use sparingly to preserve its signal.
- **Neutral ({colors.neutral}):** Page background and surface fills.
## Typography
Inter for everything. Weight and size carry hierarchy, not font family. Tight
letter-spacing on display sizes; default tracking on body.
## Layout
Spacing scale is a 4px baseline. Use `md` (16px) for intra-component gaps,
`lg` (24px) for inter-component gaps, `xl` (48px) for section breaks.
## Shapes
Rounded corners are modest — `sm` on interactive elements, `md` on cards.
`full` is reserved for avatars and pill badges.
## Components
- `button-primary` is the only high-emphasis action per screen.
- `card` is the default surface for grouped content. No shadow by default.
## Do's and Don'ts
- **Do** use token references (`{colors.primary}`) instead of literal hex in
component definitions.
- **Don't** introduce colors outside the palette — extend the palette first.
- **Don't** nest component variants. `button-primary-hover` is a sibling,
not a child.


@@ -0,0 +1,134 @@
---
name: spotify
description: Control Spotify — play music, search the catalog, manage playlists and library, inspect devices and playback state. Loads when the user asks to play/pause/queue music, search tracks/albums/artists, manage playlists, or check what's playing. Assumes the Hermes Spotify toolset is enabled and `hermes auth spotify` has been run.
version: 1.0.0
author: Hermes Agent
license: MIT
prerequisites:
tools: [spotify_playback, spotify_devices, spotify_queue, spotify_search, spotify_playlists, spotify_albums, spotify_library]
metadata:
hermes:
tags: [spotify, music, playback, playlists, media]
related_skills: [gif-search]
---
# Spotify
Control the user's Spotify account via the Hermes Spotify toolset (7 tools). Setup guide: https://hermes-agent.nousresearch.com/docs/user-guide/features/spotify
## When to use this skill
The user says something like "play X", "pause", "skip", "queue up X", "what's playing", "search for X", "add to my X playlist", "make a playlist", "save this to my library", etc.
## The 7 tools
- `spotify_playback` — play, pause, next, previous, seek, set_repeat, set_shuffle, set_volume, get_state, get_currently_playing, recently_played
- `spotify_devices` — list, transfer
- `spotify_queue` — get, add
- `spotify_search` — search the catalog
- `spotify_playlists` — list, get, create, add_items, remove_items, update_details
- `spotify_albums` — get, tracks
- `spotify_library` — list/save/remove with `kind: "tracks"|"albums"`
Playback-mutating actions require Spotify Premium; search/library/playlist ops work on Free.
## Canonical patterns (minimize tool calls)
### "Play <artist/track/album>"
One search, then play by URI. Do NOT loop through search results describing them unless the user asked for options.
```
spotify_search({"query": "miles davis kind of blue", "types": ["album"], "limit": 1})
→ got album URI spotify:album:1weenld61qoidwYuZ1GESA
spotify_playback({"action": "play", "context_uri": "spotify:album:1weenld61qoidwYuZ1GESA"})
```
For "play some <artist>" (no specific song), prefer `types: ["artist"]` and play the artist context URI — Spotify handles smart shuffle. If the user says "the song" or "that track", search `types: ["track"]` and pass `uris: [track_uri]` to play.
### "What's playing?" / "What am I listening to?"
Single call — don't chain get_state after get_currently_playing.
```
spotify_playback({"action": "get_currently_playing"})
```
If it returns 204/empty (`is_playing: false`), tell the user nothing is playing. Don't retry.
### "Pause" / "Skip" / "Volume 50"
Direct action, no preflight inspection needed.
```
spotify_playback({"action": "pause"})
spotify_playback({"action": "next"})
spotify_playback({"action": "set_volume", "volume_percent": 50})
```
### "Add to my <playlist name> playlist"
1. `spotify_playlists list` to find the playlist ID by name
2. Get the track URI (from currently playing, or search)
3. `spotify_playlists add_items` with the playlist_id and URIs
```
spotify_playlists({"action": "list"})
→ found "Late Night Jazz" = 37i9dQZF1DX4wta20PHgwo
spotify_playback({"action": "get_currently_playing"})
→ current track uri = spotify:track:0DiWol3AO6WpXZgp0goxAV
spotify_playlists({"action": "add_items",
"playlist_id": "37i9dQZF1DX4wta20PHgwo",
"uris": ["spotify:track:0DiWol3AO6WpXZgp0goxAV"]})
```
### "Create a playlist called X and add the last 3 songs I played"
```
spotify_playback({"action": "recently_played", "limit": 3})
spotify_playlists({"action": "create", "name": "Focus 2026"})
→ got playlist_id back in response
spotify_playlists({"action": "add_items", "playlist_id": <id>, "uris": [<3 uris>]})
```
### "Save / unsave / is this saved?"
Use `spotify_library` with the right `kind`.
```
spotify_library({"kind": "tracks", "action": "save", "uris": ["spotify:track:..."]})
spotify_library({"kind": "albums", "action": "list", "limit": 50})
```
### "Transfer playback to my <device>"
```
spotify_devices({"action": "list"})
→ pick the device_id by matching name/type
spotify_devices({"action": "transfer", "device_id": "<id>", "play": true})
```
## Critical failure modes
**`403 Forbidden — No active device found`** on any playback action means Spotify isn't running anywhere. Tell the user: "Open Spotify on your phone/desktop/web player first, start any track for a second, then retry." Don't retry the tool call blindly — it will fail the same way. You can call `spotify_devices list` to confirm; an empty list means no active device.
**`403 Forbidden — Premium required`** means the user is on Free and tried to mutate playback. Don't retry; tell them this action needs Premium. Reads still work (search, playlists, library, get_state).
**`204 No Content` on `get_currently_playing`** is NOT an error — it means nothing is playing. The tool returns `is_playing: false`. Just report that to the user.
**`429 Too Many Requests`** = rate limit. Wait and retry once. If it keeps happening, you're looping — stop.
**`401 Unauthorized` after a retry** — refresh token revoked. Tell the user to run `hermes auth spotify` again.
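The failure-mode table above amounts to a small dispatch. A minimal sketch, with hypothetical status/reason inputs (the real tools return richer error payloads, so treat the names here as illustrative):

```python
def playback_error_advice(status: int, reason: str = "") -> str:
    """Map a Spotify error status to the agent action described above.

    Hypothetical helper: statuses match the table, but the real tool
    errors are structured objects, not bare (status, reason) pairs.
    """
    if status == 403 and "device" in reason.lower():
        # Spotify isn't running anywhere; user must open a client first.
        return "no_active_device: ask user to open Spotify, then retry once"
    if status == 403:
        # Free account tried a playback mutation; reads still work.
        return "premium_required: do not retry"
    if status == 204:
        # Not an error: nothing is currently playing.
        return "nothing_playing: report it, do not retry"
    if status == 429:
        return "rate_limited: wait, retry once, then stop"
    if status == 401:
        return "reauth: run `hermes auth spotify` again"
    return "unknown: surface the error to the user"
```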
## URI and ID formats
Spotify uses three interchangeable ID formats. The tools accept all three and normalize them:
- URI: `spotify:track:0DiWol3AO6WpXZgp0goxAV` (preferred)
- URL: `https://open.spotify.com/track/0DiWol3AO6WpXZgp0goxAV`
- Bare ID: `0DiWol3AO6WpXZgp0goxAV`
When in doubt, use full URIs. Search results return URIs in the `uri` field — pass those directly.
Entity types: `track`, `album`, `artist`, `playlist`, `show`, `episode`. Use the right type for the action — `spotify_playback.play` with a `context_uri` expects album/playlist/artist; `uris` expects an array of track URIs.
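The normalization described above can be sketched roughly as follows. This is an assumption about behavior, not the tools' actual implementation (which may validate IDs or accept more URL shapes):

```python
import re

# Entity types listed above; used to validate URLs.
_TYPES = {"track", "album", "artist", "playlist", "show", "episode"}

def to_spotify_uri(value: str, default_type: str = "track") -> str:
    """Accept a URI, an open.spotify.com URL, or a bare ID; return a URI.

    Sketch only: `default_type` for bare IDs is a guess, since a bare ID
    carries no entity type on its own.
    """
    if value.startswith("spotify:"):
        return value  # already a URI
    m = re.search(r"open\.spotify\.com/(\w+)/([A-Za-z0-9]+)", value)
    if m and m.group(1) in _TYPES:
        return f"spotify:{m.group(1)}:{m.group(2)}"
    return f"spotify:{default_type}:{value}"  # bare ID
```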
## What NOT to do
- **Don't call `get_state` before every action.** Spotify accepts play/pause/skip without preflight. Only inspect state when the user asked "what's playing" or you need to reason about device/track.
- **Don't describe search results unless asked.** If the user said "play X", search, grab the top URI, play it. They'll hear it's wrong if it's wrong.
- **Don't retry on `403 Premium required` or `403 No active device`.** Those are permanent until user action.
- **Don't use `spotify_search` to find a playlist by name** — that searches the public Spotify catalog. User playlists come from `spotify_playlists list`.
- **Don't mix `kind: "tracks"` with album URIs** in `spotify_library` (or vice versa). The tool normalizes IDs but the API endpoint differs.

View file

@ -134,6 +134,7 @@ masks = processor.image_processor.post_process_masks(
### Model architecture
<!-- ascii-guard-ignore -->
```
SAM Architecture:
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
@ -144,6 +145,7 @@ SAM Architecture:
Image Embeddings Prompt Embeddings Masks + IoU
(computed once) (per prompt) predictions
```
<!-- ascii-guard-ignore-end -->
### Model variants

View file

@ -0,0 +1,42 @@
"""Resolve HERMES_HOME for standalone skill scripts.
Skill scripts may run outside the Hermes process (e.g. system Python,
nix env, CI) where ``hermes_constants`` is not importable. This module
provides the same ``get_hermes_home()`` and ``display_hermes_home()``
contracts as ``hermes_constants`` without requiring it on ``sys.path``.
When ``hermes_constants`` IS available it is used directly so that any
future enhancements (profile resolution, Docker detection, etc.) are
picked up automatically. The fallback path replicates the core logic
from ``hermes_constants.py`` using only the stdlib.
All scripts under ``google-workspace/scripts/`` should import from here
instead of duplicating the ``HERMES_HOME = Path(os.getenv(...))`` pattern.
"""
from __future__ import annotations
import os
from pathlib import Path
try:
from hermes_constants import display_hermes_home as display_hermes_home
from hermes_constants import get_hermes_home as get_hermes_home
except (ModuleNotFoundError, ImportError):
def get_hermes_home() -> Path:
"""Return the Hermes home directory (default: ~/.hermes).
Mirrors ``hermes_constants.get_hermes_home()``."""
val = os.environ.get("HERMES_HOME", "").strip()
return Path(val) if val else Path.home() / ".hermes"
def display_hermes_home() -> str:
"""Return a user-friendly ``~/``-shortened display string.
Mirrors ``hermes_constants.display_hermes_home()``."""
home = get_hermes_home()
try:
return "~/" + str(home.relative_to(Path.home()))
except ValueError:
return str(home)

View file

@ -31,7 +31,14 @@ from datetime import datetime, timedelta, timezone
from email.mime.text import MIMEText
from pathlib import Path
HERMES_HOME = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
# Ensure sibling modules (_hermes_home) are importable when run standalone.
_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
if _SCRIPTS_DIR not in sys.path:
sys.path.insert(0, _SCRIPTS_DIR)
from _hermes_home import get_hermes_home
HERMES_HOME = get_hermes_home()
TOKEN_PATH = HERMES_HOME / "google_token.json"
CLIENT_SECRET_PATH = HERMES_HOME / "google_client_secret.json"

View file

@ -10,9 +10,12 @@ import sys
from datetime import datetime, timezone
from pathlib import Path
# Ensure sibling modules (_hermes_home) are importable when run standalone.
_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
if _SCRIPTS_DIR not in sys.path:
sys.path.insert(0, _SCRIPTS_DIR)
def get_hermes_home() -> Path:
return Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
from _hermes_home import get_hermes_home
def get_token_path() -> Path:

View file

@ -21,6 +21,8 @@ Agent workflow:
6. Run --check to verify. Done.
"""
from __future__ import annotations # allow PEP 604 `X | None` on Python 3.9+
import argparse
import json
import os
@ -28,13 +30,12 @@ import subprocess
import sys
from pathlib import Path
try:
from hermes_constants import display_hermes_home, get_hermes_home
except ModuleNotFoundError:
HERMES_AGENT_ROOT = Path(__file__).resolve().parents[4]
if HERMES_AGENT_ROOT.exists():
sys.path.insert(0, str(HERMES_AGENT_ROOT))
from hermes_constants import display_hermes_home, get_hermes_home
# Ensure sibling modules (_hermes_home) are importable when run standalone.
_SCRIPTS_DIR = str(Path(__file__).resolve().parent)
if _SCRIPTS_DIR not in sys.path:
sys.path.insert(0, _SCRIPTS_DIR)
from _hermes_home import display_hermes_home, get_hermes_home
HERMES_HOME = get_hermes_home()
TOKEN_PATH = HERMES_HOME / "google_token.json"
@ -111,7 +112,11 @@ def install_deps():
return True
except subprocess.CalledProcessError as e:
print(f"ERROR: Failed to install dependencies: {e}")
print(f"Try manually: {sys.executable} -m pip install {' '.join(REQUIRED_PACKAGES)}")
print(
"On environments without pip (e.g. Nix), install the optional extra instead:"
)
print(" pip install 'hermes-agent[google]'")
print(f"Or manually: {sys.executable} -m pip install {' '.join(REQUIRED_PACKAGES)}")
return False

View file

@ -22,6 +22,7 @@ End-to-end pipeline for producing publication-ready ML/AI research papers target
This is **not a linear pipeline** — it is an iterative loop. Results trigger new experiments. Reviews trigger new analysis. The agent must handle these feedback loops.
<!-- ascii-guard-ignore -->
```
┌─────────────────────────────────────────────────────────────┐
│ RESEARCH PAPER PIPELINE │
@ -41,6 +42,7 @@ This is **not a linear pipeline** — it is an iterative loop. Results trigger n
│ │
└─────────────────────────────────────────────────────────────┘
```
<!-- ascii-guard-ignore-end -->
---

View file

@ -904,9 +904,15 @@ class TestRegisterSessionMcpServers:
]
with patch("tools.mcp_tool.register_mcp_servers", return_value=["mcp_srv_search"]), \
patch("model_tools.get_tool_definitions", return_value=fake_tools):
patch("model_tools.get_tool_definitions", return_value=fake_tools) as mock_defs:
await agent._register_session_mcp_servers(state, [server])
mock_defs.assert_called_once_with(
enabled_toolsets=["hermes-acp", "mcp-srv"],
disabled_toolsets=None,
quiet_mode=True,
)
assert state.agent.enabled_toolsets == ["hermes-acp", "mcp-srv"]
assert state.agent.tools == fake_tools
assert state.agent.valid_tool_names == {"mcp_srv_search", "terminal"}
# _invalidate_system_prompt should have been called

View file

@ -138,6 +138,43 @@ class TestListAndCleanup:
class TestPersistence:
"""Verify that sessions are persisted to SessionDB and can be restored."""
def test_create_session_includes_registered_mcp_toolsets(self, tmp_path, monkeypatch):
captured = {}
def fake_resolve_runtime_provider(requested=None, **kwargs):
return {
"provider": "openrouter",
"api_mode": "chat_completions",
"base_url": "https://openrouter.example/v1",
"api_key": "***",
"command": None,
"args": [],
}
def fake_agent(**kwargs):
captured.update(kwargs)
return SimpleNamespace(model=kwargs.get("model"), enabled_toolsets=kwargs.get("enabled_toolsets"))
monkeypatch.setattr("hermes_cli.config.load_config", lambda: {
"model": {"provider": "openrouter", "default": "test-model"},
"mcp_servers": {
"olympus": {"command": "python", "enabled": True},
"exa": {"url": "https://exa.ai/mcp"},
"disabled": {"command": "python", "enabled": False},
},
})
monkeypatch.setattr(
"hermes_cli.runtime_provider.resolve_runtime_provider",
fake_resolve_runtime_provider,
)
db = SessionDB(tmp_path / "state.db")
with patch("run_agent.AIAgent", side_effect=fake_agent):
manager = SessionManager(db=db)
manager.create_session(cwd="/work")
assert captured["enabled_toolsets"] == ["hermes-acp", "mcp-olympus", "mcp-exa"]
def test_create_session_writes_to_db(self, manager):
state = manager.create_session(cwd="/project")
db = manager._get_db()

View file

@ -0,0 +1,165 @@
"""Tests for Bug #12905 fixes in agent/anthropic_adapter.py — macOS Keychain support."""
import json
import platform
from unittest.mock import patch, MagicMock
import pytest
from agent.anthropic_adapter import (
_read_claude_code_credentials_from_keychain,
read_claude_code_credentials,
)
class TestReadClaudeCodeCredentialsFromKeychain:
"""Bug 4: macOS Keychain support for Claude Code >=2.1.114."""
def test_returns_none_on_linux(self):
"""Keychain reading is Darwin-only; must return None on other platforms."""
with patch("agent.anthropic_adapter.platform.system", return_value="Linux"):
assert _read_claude_code_credentials_from_keychain() is None
def test_returns_none_on_windows(self):
with patch("agent.anthropic_adapter.platform.system", return_value="Windows"):
assert _read_claude_code_credentials_from_keychain() is None
def test_returns_none_when_security_command_not_found(self):
"""OSError from missing security binary must be handled gracefully."""
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run",
side_effect=OSError("security not found")):
assert _read_claude_code_credentials_from_keychain() is None
def test_returns_none_on_nonzero_exit_code(self):
"""security returns non-zero when the Keychain entry doesn't exist."""
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="")
assert _read_claude_code_credentials_from_keychain() is None
def test_returns_none_for_empty_stdout(self):
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
mock_run.return_value = MagicMock(returncode=0, stdout="", stderr="")
assert _read_claude_code_credentials_from_keychain() is None
def test_returns_none_for_non_json_payload(self):
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
mock_run.return_value = MagicMock(returncode=0, stdout="not valid json", stderr="")
assert _read_claude_code_credentials_from_keychain() is None
def test_returns_none_when_password_field_is_missing_claude_ai_oauth(self):
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
mock_run.return_value = MagicMock(
returncode=0,
stdout=json.dumps({"someOtherService": {"accessToken": "tok"}}),
stderr="",
)
assert _read_claude_code_credentials_from_keychain() is None
def test_returns_none_when_access_token_is_empty(self):
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
mock_run.return_value = MagicMock(
returncode=0,
stdout=json.dumps({"claudeAiOauth": {"accessToken": "", "refreshToken": "x"}}),
stderr="",
)
assert _read_claude_code_credentials_from_keychain() is None
def test_parses_valid_keychain_entry(self):
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
mock_run.return_value = MagicMock(
returncode=0,
stdout=json.dumps({
"claudeAiOauth": {
"accessToken": "kc-access-token-abc",
"refreshToken": "kc-refresh-token-xyz",
"expiresAt": 9999999999999,
}
}),
stderr="",
)
creds = _read_claude_code_credentials_from_keychain()
assert creds is not None
assert creds["accessToken"] == "kc-access-token-abc"
assert creds["refreshToken"] == "kc-refresh-token-xyz"
assert creds["expiresAt"] == 9999999999999
assert creds["source"] == "macos_keychain"
class TestReadClaudeCodeCredentialsPriority:
"""Bug 4: Keychain must be checked before the JSON file."""
def test_keychain_takes_priority_over_json_file(self, tmp_path, monkeypatch):
"""When both Keychain and JSON file have credentials, Keychain wins."""
# Set up JSON file with "older" token
json_cred_file = tmp_path / ".claude" / ".credentials.json"
json_cred_file.parent.mkdir(parents=True)
json_cred_file.write_text(json.dumps({
"claudeAiOauth": {
"accessToken": "json-token",
"refreshToken": "json-refresh",
"expiresAt": 9999999999999,
}
}))
monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
# Mock Keychain to return a "newer" token
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
mock_run.return_value = MagicMock(
returncode=0,
stdout=json.dumps({
"claudeAiOauth": {
"accessToken": "keychain-token",
"refreshToken": "keychain-refresh",
"expiresAt": 9999999999999,
}
}),
stderr="",
)
creds = read_claude_code_credentials()
# Keychain token should be returned, not JSON file token
assert creds is not None
assert creds["accessToken"] == "keychain-token"
assert creds["source"] == "macos_keychain"
def test_falls_back_to_json_when_keychain_returns_none(self, tmp_path, monkeypatch):
"""When Keychain has no entry, JSON file is used as fallback."""
json_cred_file = tmp_path / ".claude" / ".credentials.json"
json_cred_file.parent.mkdir(parents=True)
json_cred_file.write_text(json.dumps({
"claudeAiOauth": {
"accessToken": "json-fallback-token",
"refreshToken": "json-refresh",
"expiresAt": 9999999999999,
}
}))
monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
# Simulate Keychain entry not found
mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="")
creds = read_claude_code_credentials()
assert creds is not None
assert creds["accessToken"] == "json-fallback-token"
assert creds["source"] == "claude_code_credentials_file"
def test_returns_none_when_neither_keychain_nor_json_has_creds(self, tmp_path, monkeypatch):
"""No credentials anywhere — must return None cleanly."""
monkeypatch.setattr("agent.anthropic_adapter.Path.home", lambda: tmp_path)
with patch("agent.anthropic_adapter.platform.system", return_value="Darwin"), \
patch("agent.anthropic_adapter.subprocess.run") as mock_run:
mock_run.return_value = MagicMock(returncode=1, stdout="", stderr="")
creds = read_claude_code_credentials()
assert creds is None
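For reference, the reader these tests exercise can be sketched as follows. This is a minimal sketch: the Keychain service name and the exact `security` invocation are assumptions, and the real helper lives in `agent/anthropic_adapter.py`.

```python
import json
import platform
import subprocess

def read_keychain_credentials():
    """Read Claude Code OAuth credentials from the macOS Keychain.

    Sketch of the behavior the tests above pin down; the service name
    "Claude Code-credentials" is an assumption.
    """
    if platform.system() != "Darwin":
        return None  # Keychain reading is Darwin-only
    try:
        proc = subprocess.run(
            ["security", "find-generic-password",
             "-s", "Claude Code-credentials", "-w"],
            capture_output=True, text=True,
        )
    except OSError:
        return None  # `security` binary not found
    if proc.returncode != 0 or not proc.stdout.strip():
        return None  # entry absent, or empty stdout
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return None  # non-JSON payload
    oauth = payload.get("claudeAiOauth") or {}
    if not oauth.get("accessToken"):
        return None  # missing or empty access token
    return {**oauth, "source": "macos_keychain"}
```

A caller would try this first and fall back to `~/.claude/.credentials.json` only when it returns `None`, which is exactly the priority the `TestReadClaudeCodeCredentialsPriority` cases assert.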

View file

@ -19,6 +19,7 @@ from agent.auxiliary_client import (
_read_codex_access_token,
_get_provider_chain,
_is_payment_error,
_normalize_aux_provider,
_try_payment_fallback,
_resolve_auto,
)
@ -54,6 +55,17 @@ def codex_auth_dir(tmp_path, monkeypatch):
return codex_dir
class TestNormalizeAuxProvider:
def test_maps_github_copilot_aliases(self):
assert _normalize_aux_provider("github") == "copilot"
assert _normalize_aux_provider("github-copilot") == "copilot"
assert _normalize_aux_provider("github-models") == "copilot"
def test_maps_github_copilot_acp_aliases(self):
assert _normalize_aux_provider("github-copilot-acp") == "copilot-acp"
assert _normalize_aux_provider("copilot-acp-agent") == "copilot-acp"
class TestReadCodexAccessToken:
def test_valid_auth_store(self, tmp_path, monkeypatch):
hermes_home = tmp_path / "hermes"
@ -1203,3 +1215,201 @@ class TestAnthropicCompatImageConversion:
}]
result = _convert_openai_images_to_anthropic(messages)
assert result[0]["content"][0]["source"]["media_type"] == "image/jpeg"
class _AuxAuth401(Exception):
status_code = 401
def __init__(self, message="Provided authentication token is expired"):
super().__init__(message)
class _DummyResponse:
def __init__(self, text="ok"):
self.choices = [MagicMock(message=MagicMock(content=text))]
class _FailingThenSuccessCompletions:
def __init__(self):
self.calls = 0
def create(self, **kwargs):
self.calls += 1
if self.calls == 1:
raise _AuxAuth401()
return _DummyResponse("sync-ok")
class _AsyncFailingThenSuccessCompletions:
def __init__(self):
self.calls = 0
async def create(self, **kwargs):
self.calls += 1
if self.calls == 1:
raise _AuxAuth401()
return _DummyResponse("async-ok")
class TestAuxiliaryAuthRefreshRetry:
def test_call_llm_refreshes_codex_on_401_for_vision(self):
failing_client = MagicMock()
failing_client.base_url = "https://chatgpt.com/backend-api/codex"
failing_client.chat.completions = _FailingThenSuccessCompletions()
fresh_client = MagicMock()
fresh_client.base_url = "https://chatgpt.com/backend-api/codex"
fresh_client.chat.completions.create.return_value = _DummyResponse("fresh-sync")
with (
patch(
"agent.auxiliary_client.resolve_vision_provider_client",
side_effect=[("openai-codex", failing_client, "gpt-5.2-codex"), ("openai-codex", fresh_client, "gpt-5.2-codex")],
),
patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
):
resp = call_llm(
task="vision",
provider="openai-codex",
model="gpt-5.2-codex",
messages=[{"role": "user", "content": "hi"}],
)
assert resp.choices[0].message.content == "fresh-sync"
mock_refresh.assert_called_once_with("openai-codex")
def test_call_llm_refreshes_codex_on_401_for_non_vision(self):
stale_client = MagicMock()
stale_client.base_url = "https://chatgpt.com/backend-api/codex"
stale_client.chat.completions.create.side_effect = _AuxAuth401("stale codex token")
fresh_client = MagicMock()
fresh_client.base_url = "https://chatgpt.com/backend-api/codex"
fresh_client.chat.completions.create.return_value = _DummyResponse("fresh-non-vision")
with (
patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("openai-codex", "gpt-5.2-codex", None, None, None)),
patch("agent.auxiliary_client._get_cached_client", side_effect=[(stale_client, "gpt-5.2-codex"), (fresh_client, "gpt-5.2-codex")]),
patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
):
resp = call_llm(
task="compression",
provider="openai-codex",
model="gpt-5.2-codex",
messages=[{"role": "user", "content": "hi"}],
)
assert resp.choices[0].message.content == "fresh-non-vision"
mock_refresh.assert_called_once_with("openai-codex")
assert stale_client.chat.completions.create.call_count == 1
assert fresh_client.chat.completions.create.call_count == 1
def test_call_llm_refreshes_anthropic_on_401_for_non_vision(self):
stale_client = MagicMock()
stale_client.base_url = "https://api.anthropic.com"
stale_client.chat.completions.create.side_effect = _AuxAuth401("anthropic token expired")
fresh_client = MagicMock()
fresh_client.base_url = "https://api.anthropic.com"
fresh_client.chat.completions.create.return_value = _DummyResponse("fresh-anthropic")
with (
patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("anthropic", "claude-haiku-4-5-20251001", None, None, None)),
patch("agent.auxiliary_client._get_cached_client", side_effect=[(stale_client, "claude-haiku-4-5-20251001"), (fresh_client, "claude-haiku-4-5-20251001")]),
patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
):
resp = call_llm(
task="compression",
provider="anthropic",
model="claude-haiku-4-5-20251001",
messages=[{"role": "user", "content": "hi"}],
)
assert resp.choices[0].message.content == "fresh-anthropic"
mock_refresh.assert_called_once_with("anthropic")
assert stale_client.chat.completions.create.call_count == 1
assert fresh_client.chat.completions.create.call_count == 1
@pytest.mark.asyncio
async def test_async_call_llm_refreshes_codex_on_401_for_vision(self):
failing_client = MagicMock()
failing_client.base_url = "https://chatgpt.com/backend-api/codex"
failing_client.chat.completions = _AsyncFailingThenSuccessCompletions()
fresh_client = MagicMock()
fresh_client.base_url = "https://chatgpt.com/backend-api/codex"
fresh_client.chat.completions.create = AsyncMock(return_value=_DummyResponse("fresh-async"))
with (
patch(
"agent.auxiliary_client.resolve_vision_provider_client",
side_effect=[("openai-codex", failing_client, "gpt-5.2-codex"), ("openai-codex", fresh_client, "gpt-5.2-codex")],
),
patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
):
resp = await async_call_llm(
task="vision",
provider="openai-codex",
model="gpt-5.2-codex",
messages=[{"role": "user", "content": "hi"}],
)
assert resp.choices[0].message.content == "fresh-async"
mock_refresh.assert_called_once_with("openai-codex")
def test_refresh_provider_credentials_force_refreshes_anthropic_oauth_and_evicts_cache(self, monkeypatch):
stale_client = MagicMock()
cache_key = ("anthropic", False, None, None, None)
monkeypatch.setenv("ANTHROPIC_TOKEN", "")
monkeypatch.setenv("CLAUDE_CODE_OAUTH_TOKEN", "")
monkeypatch.setenv("ANTHROPIC_API_KEY", "")
with (
patch("agent.auxiliary_client._client_cache", {cache_key: (stale_client, "claude-haiku-4-5-20251001", None)}),
patch("agent.anthropic_adapter.read_claude_code_credentials", return_value={
"accessToken": "expired-token",
"refreshToken": "refresh-token",
"expiresAt": 0,
}),
patch("agent.anthropic_adapter.refresh_anthropic_oauth_pure", return_value={
"access_token": "fresh-token",
"refresh_token": "refresh-token-2",
"expires_at_ms": 9999999999999,
}) as mock_refresh_oauth,
patch("agent.anthropic_adapter._write_claude_code_credentials") as mock_write,
):
from agent.auxiliary_client import _refresh_provider_credentials
assert _refresh_provider_credentials("anthropic") is True
mock_refresh_oauth.assert_called_once_with("refresh-token", use_json=False)
mock_write.assert_called_once_with("fresh-token", "refresh-token-2", 9999999999999)
stale_client.close.assert_called_once()
@pytest.mark.asyncio
async def test_async_call_llm_refreshes_anthropic_on_401_for_non_vision(self):
stale_client = MagicMock()
stale_client.base_url = "https://api.anthropic.com"
stale_client.chat.completions.create = AsyncMock(side_effect=_AuxAuth401("anthropic token expired"))
fresh_client = MagicMock()
fresh_client.base_url = "https://api.anthropic.com"
fresh_client.chat.completions.create = AsyncMock(return_value=_DummyResponse("fresh-async-anthropic"))
with (
patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("anthropic", "claude-haiku-4-5-20251001", None, None, None)),
patch("agent.auxiliary_client._get_cached_client", side_effect=[(stale_client, "claude-haiku-4-5-20251001"), (fresh_client, "claude-haiku-4-5-20251001")]),
patch("agent.auxiliary_client._refresh_provider_credentials", return_value=True) as mock_refresh,
):
resp = await async_call_llm(
task="compression",
provider="anthropic",
model="claude-haiku-4-5-20251001",
messages=[{"role": "user", "content": "hi"}],
)
assert resp.choices[0].message.content == "fresh-async-anthropic"
mock_refresh.assert_called_once_with("anthropic")
assert stale_client.chat.completions.create.await_count == 1
assert fresh_client.chat.completions.create.await_count == 1

View file

@ -100,6 +100,26 @@ class TestResolveProviderClientMainAlias:
assert client is not None
assert "beans.local" in str(client.base_url)
def test_main_resolves_github_copilot_alias(self, tmp_path):
_write_config(tmp_path, {
"model": {"default": "gpt-5.4", "provider": "github-copilot"},
})
with (
patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={
"api_key": "ghu_test_token",
"base_url": "https://api.githubcopilot.com",
}),
patch("agent.auxiliary_client.OpenAI") as mock_openai,
):
mock_openai.return_value = MagicMock()
from agent.auxiliary_client import resolve_provider_client
client, model = resolve_provider_client("main", "gpt-5.4")
assert client is not None
assert model == "gpt-5.4"
assert mock_openai.called
class TestResolveProviderClientNamedCustom:
"""resolve_provider_client should resolve named custom providers directly."""
@ -252,3 +272,158 @@ class TestVisionPathApiMode:
mock_gcc.assert_called_once()
_, kwargs = mock_gcc.call_args
assert kwargs.get("api_mode") == "chat_completions"
class TestProvidersDictApiModeAnthropicMessages:
"""Regression guard for #15033.
Named providers declared under the ``providers:`` dict with
``api_mode: anthropic_messages`` must route auxiliary calls through
the Anthropic Messages API (via AnthropicAuxiliaryClient), not
through an OpenAI chat-completions client.
The bug had two halves: the providers-dict branch of
``_get_named_custom_provider`` dropped the ``api_mode`` field, and
``resolve_provider_client``'s named-custom branch never read it.
"""
def test_providers_dict_propagates_api_mode(self, tmp_path, monkeypatch):
monkeypatch.setenv("MYRELAY_API_KEY", "sk-test")
_write_config(tmp_path, {
"providers": {
"myrelay": {
"name": "myrelay",
"base_url": "https://example-relay.test/anthropic",
"key_env": "MYRELAY_API_KEY",
"api_mode": "anthropic_messages",
"default_model": "claude-opus-4-7",
},
},
})
from hermes_cli.runtime_provider import _get_named_custom_provider
entry = _get_named_custom_provider("myrelay")
assert entry is not None
assert entry.get("api_mode") == "anthropic_messages"
assert entry.get("base_url") == "https://example-relay.test/anthropic"
assert entry.get("api_key") == "sk-test"
def test_providers_dict_invalid_api_mode_is_dropped(self, tmp_path):
_write_config(tmp_path, {
"providers": {
"weird": {
"name": "weird",
"base_url": "https://example.test",
"api_mode": "bogus_nonsense",
"default_model": "x",
},
},
})
from hermes_cli.runtime_provider import _get_named_custom_provider
entry = _get_named_custom_provider("weird")
assert entry is not None
assert "api_mode" not in entry
def test_providers_dict_without_api_mode_is_unchanged(self, tmp_path):
_write_config(tmp_path, {
"providers": {
"localchat": {
"name": "localchat",
"base_url": "http://127.0.0.1:1234/v1",
"api_key": "local-key",
"default_model": "llama-3",
},
},
})
from hermes_cli.runtime_provider import _get_named_custom_provider
entry = _get_named_custom_provider("localchat")
assert entry is not None
assert "api_mode" not in entry
def test_resolve_provider_client_returns_anthropic_client(self, tmp_path, monkeypatch):
"""Named custom provider with api_mode=anthropic_messages must
route through AnthropicAuxiliaryClient."""
monkeypatch.setenv("MYRELAY_API_KEY", "sk-test")
_write_config(tmp_path, {
"providers": {
"myrelay": {
"name": "myrelay",
"base_url": "https://example-relay.test/anthropic",
"key_env": "MYRELAY_API_KEY",
"api_mode": "anthropic_messages",
"default_model": "claude-opus-4-7",
},
},
})
from agent.auxiliary_client import (
resolve_provider_client,
AnthropicAuxiliaryClient,
AsyncAnthropicAuxiliaryClient,
)
sync_client, sync_model = resolve_provider_client("myrelay", async_mode=False)
assert isinstance(sync_client, AnthropicAuxiliaryClient), (
f"expected AnthropicAuxiliaryClient, got {type(sync_client).__name__}"
)
assert sync_model == "claude-opus-4-7"
async_client, async_model = resolve_provider_client("myrelay", async_mode=True)
assert isinstance(async_client, AsyncAnthropicAuxiliaryClient), (
f"expected AsyncAnthropicAuxiliaryClient, got {type(async_client).__name__}"
)
assert async_model == "claude-opus-4-7"
def test_aux_task_override_routes_named_provider_to_anthropic(self, tmp_path, monkeypatch):
"""The full chain: auxiliary.<task>.provider: myrelay with
api_mode anthropic_messages must produce an Anthropic client."""
monkeypatch.setenv("MYRELAY_API_KEY", "sk-test")
_write_config(tmp_path, {
"providers": {
"myrelay": {
"name": "myrelay",
"base_url": "https://example-relay.test/anthropic",
"key_env": "MYRELAY_API_KEY",
"api_mode": "anthropic_messages",
"default_model": "claude-opus-4-7",
},
},
"auxiliary": {
"flush_memories": {
"provider": "myrelay",
"model": "claude-sonnet-4.6",
},
},
"model": {"provider": "openrouter", "default": "anthropic/claude-sonnet-4.6"},
})
from agent.auxiliary_client import (
get_async_text_auxiliary_client,
get_text_auxiliary_client,
AnthropicAuxiliaryClient,
AsyncAnthropicAuxiliaryClient,
)
async_client, async_model = get_async_text_auxiliary_client("flush_memories")
assert isinstance(async_client, AsyncAnthropicAuxiliaryClient)
assert async_model == "claude-sonnet-4.6"
sync_client, sync_model = get_text_auxiliary_client("flush_memories")
assert isinstance(sync_client, AnthropicAuxiliaryClient)
assert sync_model == "claude-sonnet-4.6"
def test_provider_without_api_mode_still_uses_openai(self, tmp_path):
"""Named providers that don't declare api_mode should still go
through the plain OpenAI-wire path (no regression)."""
_write_config(tmp_path, {
"providers": {
"localchat": {
"name": "localchat",
"base_url": "http://127.0.0.1:1234/v1",
"api_key": "local-key",
"default_model": "llama-3",
},
},
})
from agent.auxiliary_client import resolve_provider_client
from openai import OpenAI, AsyncOpenAI
sync_client, _ = resolve_provider_client("localchat", async_mode=False)
# sync returns the raw OpenAI client
assert isinstance(sync_client, OpenAI)
async_client, _ = resolve_provider_client("localchat", async_mode=True)
assert isinstance(async_client, AsyncOpenAI)


@ -1230,3 +1230,210 @@ class TestEmptyTextBlockFix:
from agent.bedrock_adapter import _convert_content_to_converse
blocks = _convert_content_to_converse("Hello")
assert blocks[0]["text"] == "Hello"
# ---------------------------------------------------------------------------
# Stale-connection detection and per-region client invalidation
# ---------------------------------------------------------------------------
class TestInvalidateRuntimeClient:
"""Per-region eviction used to discard dead/stale bedrock-runtime clients."""
def test_evicts_only_the_target_region(self):
from agent.bedrock_adapter import (
_bedrock_runtime_client_cache,
invalidate_runtime_client,
reset_client_cache,
)
reset_client_cache()
_bedrock_runtime_client_cache["us-east-1"] = "dead-client"
_bedrock_runtime_client_cache["us-west-2"] = "live-client"
evicted = invalidate_runtime_client("us-east-1")
assert evicted is True
assert "us-east-1" not in _bedrock_runtime_client_cache
assert _bedrock_runtime_client_cache["us-west-2"] == "live-client"
def test_returns_false_when_region_not_cached(self):
from agent.bedrock_adapter import invalidate_runtime_client, reset_client_cache
reset_client_cache()
assert invalidate_runtime_client("eu-west-1") is False
class TestIsStaleConnectionError:
"""Classifier that decides whether an exception warrants client eviction."""
def test_detects_botocore_connection_closed_error(self):
from agent.bedrock_adapter import is_stale_connection_error
from botocore.exceptions import ConnectionClosedError
exc = ConnectionClosedError(endpoint_url="https://bedrock.example")
assert is_stale_connection_error(exc) is True
def test_detects_botocore_endpoint_connection_error(self):
from agent.bedrock_adapter import is_stale_connection_error
from botocore.exceptions import EndpointConnectionError
exc = EndpointConnectionError(endpoint_url="https://bedrock.example")
assert is_stale_connection_error(exc) is True
def test_detects_botocore_read_timeout(self):
from agent.bedrock_adapter import is_stale_connection_error
from botocore.exceptions import ReadTimeoutError
exc = ReadTimeoutError(endpoint_url="https://bedrock.example")
assert is_stale_connection_error(exc) is True
def test_detects_urllib3_protocol_error(self):
from agent.bedrock_adapter import is_stale_connection_error
from urllib3.exceptions import ProtocolError
exc = ProtocolError("Connection broken")
assert is_stale_connection_error(exc) is True
def test_detects_library_internal_assertion_error(self):
"""A bare AssertionError raised from inside urllib3/botocore signals
a corrupted connection-pool invariant and should trigger eviction."""
from agent.bedrock_adapter import is_stale_connection_error
# Fabricate an AssertionError whose traceback's last frame belongs
# to a module named "urllib3.connectionpool". We do this by exec'ing
# a tiny `assert False` under a fake globals dict — the resulting
# frame's ``f_globals["__name__"]`` is what the classifier inspects.
fake_globals = {"__name__": "urllib3.connectionpool"}
try:
exec("def _boom():\n assert False\n_boom()", fake_globals)
except AssertionError as exc:
assert is_stale_connection_error(exc) is True
else:
pytest.fail("AssertionError not raised")
def test_detects_botocore_internal_assertion_error(self):
"""Same as above but for a frame inside the botocore namespace."""
from agent.bedrock_adapter import is_stale_connection_error
fake_globals = {"__name__": "botocore.httpsession"}
try:
exec("def _boom():\n assert False\n_boom()", fake_globals)
except AssertionError as exc:
assert is_stale_connection_error(exc) is True
else:
pytest.fail("AssertionError not raised")
def test_ignores_application_assertion_error(self):
"""AssertionError from application code (not urllib3/botocore) should
NOT be classified as stale those are real test/code bugs."""
from agent.bedrock_adapter import is_stale_connection_error
try:
assert False, "test-only" # noqa: B011
except AssertionError as exc:
assert is_stale_connection_error(exc) is False
def test_ignores_unrelated_exceptions(self):
from agent.bedrock_adapter import is_stale_connection_error
assert is_stale_connection_error(ValueError("bad input")) is False
assert is_stale_connection_error(KeyError("missing")) is False
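The behaviour these tests pin down can be sketched as a small classifier. This is a hypothetical stand-in (the name `is_stale_connection_error_sketch` and the module-prefix heuristic are assumptions), not the real `agent.bedrock_adapter` code:

```python
_STALE_EXC_NAMES = {
    "ConnectionClosedError", "EndpointConnectionError",
    "ReadTimeoutError", "ProtocolError",
}
_STALE_MODULE_PREFIXES = ("urllib3", "botocore")

def is_stale_connection_error_sketch(exc: BaseException) -> bool:
    # Known botocore/urllib3 connection exceptions always count as stale.
    if type(exc).__name__ in _STALE_EXC_NAMES:
        return True
    # A bare AssertionError counts only when its last traceback frame
    # lives inside the urllib3/botocore namespace (a corrupted pool
    # invariant), never when it comes from application code.
    if isinstance(exc, AssertionError):
        tb, last = exc.__traceback__, None
        while tb is not None:
            last, tb = tb, tb.tb_next
        if last is not None:
            mod = last.tb_frame.f_globals.get("__name__", "")
            return mod.startswith(_STALE_MODULE_PREFIXES)
    return False
```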
class TestCallConverseInvalidatesOnStaleError:
"""call_converse / call_converse_stream evict the cached client when the
boto3 call raises a stale-connection error so the next invocation
reconnects instead of reusing the dead socket."""
def test_converse_evicts_client_on_stale_error(self):
from agent.bedrock_adapter import (
_bedrock_runtime_client_cache,
call_converse,
reset_client_cache,
)
from botocore.exceptions import ConnectionClosedError
reset_client_cache()
dead_client = MagicMock()
dead_client.converse.side_effect = ConnectionClosedError(
endpoint_url="https://bedrock.example",
)
_bedrock_runtime_client_cache["us-east-1"] = dead_client
with pytest.raises(ConnectionClosedError):
call_converse(
region="us-east-1",
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "hi"}],
)
assert "us-east-1" not in _bedrock_runtime_client_cache, (
"stale client should have been evicted so the retry reconnects"
)
def test_converse_stream_evicts_client_on_stale_error(self):
from agent.bedrock_adapter import (
_bedrock_runtime_client_cache,
call_converse_stream,
reset_client_cache,
)
from botocore.exceptions import ConnectionClosedError
reset_client_cache()
dead_client = MagicMock()
dead_client.converse_stream.side_effect = ConnectionClosedError(
endpoint_url="https://bedrock.example",
)
_bedrock_runtime_client_cache["us-east-1"] = dead_client
with pytest.raises(ConnectionClosedError):
call_converse_stream(
region="us-east-1",
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "hi"}],
)
assert "us-east-1" not in _bedrock_runtime_client_cache
def test_converse_does_not_evict_on_non_stale_error(self):
"""Non-stale errors (e.g. ValidationException) leave the client cache alone."""
from agent.bedrock_adapter import (
_bedrock_runtime_client_cache,
call_converse,
reset_client_cache,
)
from botocore.exceptions import ClientError
reset_client_cache()
live_client = MagicMock()
live_client.converse.side_effect = ClientError(
error_response={"Error": {"Code": "ValidationException", "Message": "bad"}},
operation_name="Converse",
)
_bedrock_runtime_client_cache["us-east-1"] = live_client
with pytest.raises(ClientError):
call_converse(
region="us-east-1",
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "hi"}],
)
assert _bedrock_runtime_client_cache.get("us-east-1") is live_client, (
"validation errors do not indicate a dead connection — keep the client"
)
def test_converse_leaves_successful_client_in_cache(self):
from agent.bedrock_adapter import (
_bedrock_runtime_client_cache,
call_converse,
reset_client_cache,
)
reset_client_cache()
live_client = MagicMock()
live_client.converse.return_value = {
"output": {"message": {"role": "assistant", "content": [{"text": "hi"}]}},
"stopReason": "end_turn",
"usage": {"inputTokens": 1, "outputTokens": 1, "totalTokens": 2},
}
_bedrock_runtime_client_cache["us-east-1"] = live_client
call_converse(
region="us-east-1",
model="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": "hi"}],
)
assert _bedrock_runtime_client_cache.get("us-east-1") is live_client
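The evict-on-stale contract exercised above reduces to a small wrapper around the cached call. A minimal sketch with hypothetical names (`call_with_eviction_sketch` is not the real `call_converse`):

```python
def call_with_eviction_sketch(cache, region, invoke, is_stale):
    """Run `invoke` with the cached per-region client; on a stale-connection
    error evict the client so the next call rebuilds the connection."""
    client = cache[region]
    try:
        return invoke(client)
    except Exception as exc:
        if is_stale(exc):
            cache.pop(region, None)  # drop the dead client, keep others
        raise
```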


@ -376,17 +376,15 @@ class TestBedrockModelNameNormalization:
"apac.anthropic.claude-haiku-4-5", preserve_dots=True
) == "apac.anthropic.claude-haiku-4-5"
def test_preserve_false_mangles_as_documented(self):
"""Canary: with ``preserve_dots=False`` the function still
produces the broken all-hyphen form; this is the shape that
Bedrock rejected and that the fix avoids. Keeping this test
locks in the existing behaviour of ``normalize_model_name`` so a
future refactor doesn't accidentally decouple the knob from its
effect."""
def test_bedrock_prefix_preserved_without_preserve_dots(self):
"""Bedrock inference profile IDs are auto-detected by prefix and
always returned unmangled -- ``preserve_dots`` is irrelevant for
these IDs because the dots are namespace separators, not version
separators. Regression for #12295."""
from agent.anthropic_adapter import normalize_model_name
assert normalize_model_name(
"global.anthropic.claude-opus-4-7", preserve_dots=False
) == "global-anthropic-claude-opus-4-7"
) == "global.anthropic.claude-opus-4-7"
def test_bare_foundation_model_id_preserved(self):
"""Non-inference-profile Bedrock IDs
@ -422,12 +420,11 @@ class TestBedrockBuildAnthropicKwargsEndToEnd:
f"{kwargs['model']!r}"
)
def test_bedrock_model_mangled_without_preserve_dots(self):
"""Inverse canary: without the flag, ``build_anthropic_kwargs``
still produces the broken form so the fix in
``_anthropic_preserve_dots`` is the load-bearing piece that
wires ``preserve_dots=True`` through to this builder for the
Bedrock case."""
def test_bedrock_model_preserved_without_preserve_dots(self):
"""Bedrock inference profile IDs survive ``build_anthropic_kwargs``
even without ``preserve_dots=True`` -- the prefix auto-detection
in ``normalize_model_name`` is the load-bearing piece.
Regression for #12295."""
from agent.anthropic_adapter import build_anthropic_kwargs
kwargs = build_anthropic_kwargs(
model="global.anthropic.claude-opus-4-7",
@ -437,4 +434,157 @@ class TestBedrockBuildAnthropicKwargsEndToEnd:
reasoning_config=None,
preserve_dots=False,
)
assert kwargs["model"] == "global-anthropic-claude-opus-4-7"
assert kwargs["model"] == "global.anthropic.claude-opus-4-7"
class TestBedrockModelIdDetection:
"""Tests for ``_is_bedrock_model_id`` and the auto-detection that
makes ``normalize_model_name`` preserve dots for Bedrock IDs
regardless of ``preserve_dots``. Regression for #12295."""
def test_bare_bedrock_id_detected(self):
from agent.anthropic_adapter import _is_bedrock_model_id
assert _is_bedrock_model_id("anthropic.claude-opus-4-7") is True
def test_regional_us_prefix_detected(self):
from agent.anthropic_adapter import _is_bedrock_model_id
assert _is_bedrock_model_id("us.anthropic.claude-sonnet-4-5-v1:0") is True
def test_regional_global_prefix_detected(self):
from agent.anthropic_adapter import _is_bedrock_model_id
assert _is_bedrock_model_id("global.anthropic.claude-opus-4-7") is True
def test_regional_eu_prefix_detected(self):
from agent.anthropic_adapter import _is_bedrock_model_id
assert _is_bedrock_model_id("eu.anthropic.claude-sonnet-4-6") is True
def test_openrouter_format_not_detected(self):
from agent.anthropic_adapter import _is_bedrock_model_id
assert _is_bedrock_model_id("claude-opus-4.6") is False
def test_bare_claude_not_detected(self):
from agent.anthropic_adapter import _is_bedrock_model_id
assert _is_bedrock_model_id("claude-opus-4-7") is False
def test_bare_bedrock_id_preserved_without_flag(self):
"""The primary bug from #12295: ``anthropic.claude-opus-4-7``
sent to bedrock-mantle via auxiliary clients that don't pass
``preserve_dots=True``."""
from agent.anthropic_adapter import normalize_model_name
assert normalize_model_name(
"anthropic.claude-opus-4-7", preserve_dots=False
) == "anthropic.claude-opus-4-7"
def test_openrouter_dots_still_converted(self):
"""Non-Bedrock dotted model names must still be converted."""
from agent.anthropic_adapter import normalize_model_name
assert normalize_model_name("claude-opus-4.6") == "claude-opus-4-6"
def test_bare_bedrock_id_survives_build_kwargs(self):
"""End-to-end: bare Bedrock ID through ``build_anthropic_kwargs``
without ``preserve_dots=True`` -- the auxiliary client path."""
from agent.anthropic_adapter import build_anthropic_kwargs
kwargs = build_anthropic_kwargs(
model="anthropic.claude-opus-4-7",
messages=[{"role": "user", "content": "hi"}],
tools=None,
max_tokens=1024,
reasoning_config=None,
preserve_dots=False,
)
assert kwargs["model"] == "anthropic.claude-opus-4-7"
# ---------------------------------------------------------------------------
# auxiliary_client Bedrock resolution — fix for #13919
# ---------------------------------------------------------------------------
# Before the fix, resolve_provider_client("bedrock", ...) fell through to the
# "unhandled auth_type" warning and returned (None, None), breaking all
# auxiliary tasks (compression, memory, summarization) for Bedrock users.
class TestAuxiliaryClientBedrockResolution:
"""Verify resolve_provider_client handles Bedrock's aws_sdk auth type."""
def test_bedrock_returns_client_with_credentials(self, monkeypatch):
"""With valid AWS credentials, Bedrock should return a usable client."""
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
monkeypatch.setenv("AWS_REGION", "us-west-2")
mock_anthropic_bedrock = MagicMock()
with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
return_value=mock_anthropic_bedrock):
from agent.auxiliary_client import resolve_provider_client, AnthropicAuxiliaryClient
client, model = resolve_provider_client("bedrock", None)
assert client is not None, (
"resolve_provider_client('bedrock') returned None — "
"aws_sdk auth type is not handled"
)
assert isinstance(client, AnthropicAuxiliaryClient)
assert model is not None
assert client.api_key == "aws-sdk"
assert "us-west-2" in client.base_url
def test_bedrock_returns_none_without_credentials(self, monkeypatch):
"""Without AWS credentials, Bedrock should return (None, None) gracefully."""
with patch("agent.bedrock_adapter.has_aws_credentials", return_value=False):
from agent.auxiliary_client import resolve_provider_client
client, model = resolve_provider_client("bedrock", None)
assert client is None
assert model is None
def test_bedrock_uses_configured_region(self, monkeypatch):
"""Bedrock client base_url should reflect AWS_REGION."""
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
monkeypatch.setenv("AWS_REGION", "eu-central-1")
with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
return_value=MagicMock()):
from agent.auxiliary_client import resolve_provider_client
client, _ = resolve_provider_client("bedrock", None)
assert client is not None
assert "eu-central-1" in client.base_url
def test_bedrock_respects_explicit_model(self, monkeypatch):
"""When caller passes an explicit model, it should be used."""
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
return_value=MagicMock()):
from agent.auxiliary_client import resolve_provider_client
_, model = resolve_provider_client(
"bedrock", "us.anthropic.claude-sonnet-4-5-20250929-v1:0"
)
assert "claude-sonnet" in model
def test_bedrock_async_mode(self, monkeypatch):
"""Async mode should return an AsyncAnthropicAuxiliaryClient."""
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
return_value=MagicMock()):
from agent.auxiliary_client import resolve_provider_client, AsyncAnthropicAuxiliaryClient
client, model = resolve_provider_client("bedrock", None, async_mode=True)
assert client is not None
assert isinstance(client, AsyncAnthropicAuxiliaryClient)
def test_bedrock_default_model_is_haiku(self, monkeypatch):
"""Default auxiliary model for Bedrock should be Haiku (fast, cheap)."""
monkeypatch.setenv("AWS_ACCESS_KEY_ID", "AKIAIOSFODNN7EXAMPLE")
monkeypatch.setenv("AWS_SECRET_ACCESS_KEY", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
with patch("agent.anthropic_adapter.build_anthropic_bedrock_client",
return_value=MagicMock()):
from agent.auxiliary_client import resolve_provider_client
_, model = resolve_provider_client("bedrock", None)
assert "haiku" in model.lower()


@ -144,3 +144,60 @@ class CopilotACPClientSafetyTests(unittest.TestCase):
if __name__ == "__main__":
unittest.main()
# ── HOME env propagation tests (from PR #11285) ─────────────────────
from unittest.mock import patch as _patch
import pytest
def _make_home_client(tmp_path):
return CopilotACPClient(
api_key="copilot-acp",
base_url="acp://copilot",
acp_command="copilot",
acp_args=["--acp", "--stdio"],
acp_cwd=str(tmp_path),
)
def _fake_popen_capture(captured):
def _fake(cmd, **kwargs):
captured["cmd"] = cmd
captured["kwargs"] = kwargs
raise FileNotFoundError("copilot not found")
return _fake
def test_run_prompt_prefers_profile_home_when_available(monkeypatch, tmp_path):
hermes_home = tmp_path / "hermes"
profile_home = hermes_home / "home"
profile_home.mkdir(parents=True)
monkeypatch.delenv("HOME", raising=False)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
captured = {}
client = _make_home_client(tmp_path)
with _patch("agent.copilot_acp_client.subprocess.Popen", side_effect=_fake_popen_capture(captured)):
with pytest.raises(RuntimeError, match="Could not start Copilot ACP command"):
client._run_prompt("hello", timeout_seconds=1)
assert captured["kwargs"]["env"]["HOME"] == str(profile_home)
def test_run_prompt_passes_home_when_parent_env_is_clean(monkeypatch, tmp_path):
monkeypatch.delenv("HOME", raising=False)
monkeypatch.delenv("HERMES_HOME", raising=False)
captured = {}
client = _make_home_client(tmp_path)
with _patch("agent.copilot_acp_client.subprocess.Popen", side_effect=_fake_popen_capture(captured)):
with pytest.raises(RuntimeError, match="Could not start Copilot ACP command"):
client._run_prompt("hello", timeout_seconds=1)
assert "env" in captured["kwargs"]
assert captured["kwargs"]["env"]["HOME"]


@ -1102,3 +1102,271 @@ def test_load_pool_does_not_seed_qwen_oauth_when_no_token(tmp_path, monkeypatch)
assert not pool.has_credentials()
assert pool.entries() == []
def test_nous_seed_from_singletons_preserves_obtained_at_timestamps(tmp_path, monkeypatch):
"""Regression test for #15099 secondary issue.
When ``_seed_from_singletons`` materialises a device_code pool entry from
the ``providers.nous`` singleton, it must carry the mint/refresh
timestamps (``obtained_at``, ``agent_key_obtained_at``, ``expires_in``,
etc.) into the pool entry. Without them, freshness-sensitive consumers
(self-heal hooks, pool pruning by age) treat just-minted credentials as
older than they actually are and evict them.
"""
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
_write_auth_store(
tmp_path,
{
"version": 1,
"providers": {
"nous": {
"access_token": "at_XXXXXXXX",
"refresh_token": "rt_YYYYYYYY",
"client_id": "hermes-cli",
"portal_base_url": "https://portal.nousresearch.com",
"inference_base_url": "https://inference.nousresearch.com/v1",
"token_type": "Bearer",
"scope": "openid profile",
"obtained_at": "2026-04-24T10:00:00+00:00",
"expires_at": "2026-04-24T11:00:00+00:00",
"expires_in": 3600,
"agent_key": "sk-nous-AAAA",
"agent_key_id": "ak_123",
"agent_key_expires_at": "2026-04-25T10:00:00+00:00",
"agent_key_expires_in": 86400,
"agent_key_reused": False,
"agent_key_obtained_at": "2026-04-24T10:00:05+00:00",
"tls": {"insecure": False, "ca_bundle": None},
},
},
},
)
from agent.credential_pool import load_pool
pool = load_pool("nous")
entries = pool.entries()
device_entries = [e for e in entries if e.source == "device_code"]
assert len(device_entries) == 1, f"expected single device_code entry; got {len(device_entries)}"
e = device_entries[0]
# Direct dataclass fields — must survive the singleton → pool copy.
assert e.access_token == "at_XXXXXXXX"
assert e.refresh_token == "rt_YYYYYYYY"
assert e.expires_at == "2026-04-24T11:00:00+00:00"
assert e.agent_key == "sk-nous-AAAA"
assert e.agent_key_expires_at == "2026-04-25T10:00:00+00:00"
# Extra fields — this is what regressed. These must be carried through
# via ``extra`` dict or __getattr__, NOT silently dropped.
assert e.obtained_at == "2026-04-24T10:00:00+00:00", (
f"obtained_at was dropped during seed; got {e.obtained_at!r}. This breaks "
f"downstream pool-freshness consumers (#15099)."
)
assert e.agent_key_obtained_at == "2026-04-24T10:00:05+00:00"
assert e.expires_in == 3600
assert e.agent_key_id == "ak_123"
assert e.agent_key_expires_in == 86400
assert e.agent_key_reused is False
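The seed behaviour this regression test demands is simply "copy everything, including the freshness metadata". A minimal sketch over plain dicts (field list and function name are assumptions mirroring the test, not the real `_seed_from_singletons`):

```python
_CORE_FIELDS = ("access_token", "refresh_token", "expires_at",
                "agent_key", "agent_key_expires_at")
# The fields that regressed: dropping them makes freshness-sensitive
# consumers treat just-minted credentials as stale and evict them.
_FRESHNESS_FIELDS = ("obtained_at", "agent_key_obtained_at", "expires_in",
                     "agent_key_id", "agent_key_expires_in", "agent_key_reused")

def seed_entry_sketch(singleton: dict) -> dict:
    entry = {"source": "device_code"}
    for k in _CORE_FIELDS:
        entry[k] = singleton.get(k)
    for k in _FRESHNESS_FIELDS:
        if k in singleton:
            entry[k] = singleton[k]
    return entry
```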
class TestLeastUsedStrategy:
"""Regression: least_used strategy must increment request_count on select."""
def test_request_count_increments(self):
"""Each select() call should increment the chosen entry's request_count."""
from unittest.mock import patch as _patch
from agent.credential_pool import CredentialPool, PooledCredential, STRATEGY_LEAST_USED
entries = [
PooledCredential(provider="test", id="a", label="a", auth_type="api_key",
source="a", access_token="tok-a", priority=0, request_count=0),
PooledCredential(provider="test", id="b", label="b", auth_type="api_key",
source="b", access_token="tok-b", priority=1, request_count=0),
]
with _patch("agent.credential_pool.get_pool_strategy", return_value=STRATEGY_LEAST_USED):
pool = CredentialPool("test", entries)
# First select should pick entry with lowest count (both 0 → first)
e1 = pool.select()
assert e1 is not None
count_after_first = e1.request_count
assert count_after_first == 1, f"Expected 1 after first select, got {count_after_first}"
# Second select should pick the OTHER entry (now has lower count)
e2 = pool.select()
assert e2 is not None
assert e2.id != e1.id or e2.request_count == 2, (
"least_used should alternate or increment"
)
# ── PR #10160 salvage: Nous OAuth cross-process sync tests ─────────────────
def test_sync_nous_entry_from_auth_store_adopts_newer_tokens(tmp_path, monkeypatch):
"""When auth.json has a newer refresh token, the pool entry should adopt it."""
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
_write_auth_store(
tmp_path,
{
"version": 1,
"active_provider": "nous",
"providers": {
"nous": {
"portal_base_url": "https://portal.example.com",
"inference_base_url": "https://inference.example.com/v1",
"client_id": "hermes-cli",
"token_type": "Bearer",
"scope": "inference:mint_agent_key",
"access_token": "access-OLD",
"refresh_token": "refresh-OLD",
"expires_at": "2026-03-24T12:00:00+00:00",
"agent_key": "agent-key-OLD",
"agent_key_expires_at": "2026-03-24T13:30:00+00:00",
}
},
},
)
from agent.credential_pool import load_pool
pool = load_pool("nous")
entry = pool.select()
assert entry is not None
assert entry.refresh_token == "refresh-OLD"
# Simulate another process refreshing the token in auth.json
_write_auth_store(
tmp_path,
{
"version": 1,
"active_provider": "nous",
"providers": {
"nous": {
"portal_base_url": "https://portal.example.com",
"inference_base_url": "https://inference.example.com/v1",
"client_id": "hermes-cli",
"token_type": "Bearer",
"scope": "inference:mint_agent_key",
"access_token": "access-NEW",
"refresh_token": "refresh-NEW",
"expires_at": "2026-03-24T12:30:00+00:00",
"agent_key": "agent-key-NEW",
"agent_key_expires_at": "2026-03-24T14:00:00+00:00",
}
},
},
)
synced = pool._sync_nous_entry_from_auth_store(entry)
assert synced is not entry
assert synced.access_token == "access-NEW"
assert synced.refresh_token == "refresh-NEW"
assert synced.agent_key == "agent-key-NEW"
assert synced.agent_key_expires_at == "2026-03-24T14:00:00+00:00"
def test_sync_nous_entry_noop_when_tokens_match(tmp_path, monkeypatch):
"""When auth.json has the same refresh token, sync should be a no-op."""
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
_write_auth_store(
tmp_path,
{
"version": 1,
"active_provider": "nous",
"providers": {
"nous": {
"portal_base_url": "https://portal.example.com",
"inference_base_url": "https://inference.example.com/v1",
"client_id": "hermes-cli",
"token_type": "Bearer",
"scope": "inference:mint_agent_key",
"access_token": "access-token",
"refresh_token": "refresh-token",
"expires_at": "2026-03-24T12:00:00+00:00",
"agent_key": "agent-key",
"agent_key_expires_at": "2026-03-24T13:30:00+00:00",
}
},
},
)
from agent.credential_pool import load_pool
pool = load_pool("nous")
entry = pool.select()
assert entry is not None
synced = pool._sync_nous_entry_from_auth_store(entry)
assert synced is entry
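The adopt-if-newer contract in the two tests above (a changed refresh token means another process rotated the credentials, so adopt the whole store copy; an identical token means a no-op that returns the same object) can be sketched over plain dicts. Names are hypothetical:

```python
_SYNCED_FIELDS = ("access_token", "refresh_token", "expires_at",
                  "agent_key", "agent_key_expires_at")

def sync_entry_sketch(entry: dict, store: dict) -> dict:
    creds = store.get("providers", {}).get("nous")
    # Same refresh token → nothing to adopt; return the original object
    # so callers can identity-check for "no change".
    if not creds or creds.get("refresh_token") == entry.get("refresh_token"):
        return entry
    merged = dict(entry)
    for k in _SYNCED_FIELDS:
        merged[k] = creds.get(k)
    return merged
```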
def test_nous_exhausted_entry_recovers_via_auth_store_sync(tmp_path, monkeypatch):
"""An exhausted Nous entry should recover when auth.json has newer tokens."""
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
from agent.credential_pool import load_pool, STATUS_EXHAUSTED
from dataclasses import replace as dc_replace
_write_auth_store(
tmp_path,
{
"version": 1,
"active_provider": "nous",
"providers": {
"nous": {
"portal_base_url": "https://portal.example.com",
"inference_base_url": "https://inference.example.com/v1",
"client_id": "hermes-cli",
"token_type": "Bearer",
"scope": "inference:mint_agent_key",
"access_token": "access-OLD",
"refresh_token": "refresh-OLD",
"expires_at": "2026-03-24T12:00:00+00:00",
"agent_key": "agent-key",
"agent_key_expires_at": "2026-03-24T13:30:00+00:00",
}
},
},
)
pool = load_pool("nous")
entry = pool.select()
assert entry is not None
# Mark entry as exhausted (simulating a failed refresh)
exhausted = dc_replace(
entry,
last_status=STATUS_EXHAUSTED,
last_status_at=time.time(),
last_error_code=401,
)
pool._replace_entry(entry, exhausted)
pool._persist()
# Simulate another process having successfully refreshed
_write_auth_store(
tmp_path,
{
"version": 1,
"active_provider": "nous",
"providers": {
"nous": {
"portal_base_url": "https://portal.example.com",
"inference_base_url": "https://inference.example.com/v1",
"client_id": "hermes-cli",
"token_type": "Bearer",
"scope": "inference:mint_agent_key",
"access_token": "access-FRESH",
"refresh_token": "refresh-FRESH",
"expires_at": "2026-03-24T12:30:00+00:00",
"agent_key": "agent-key-FRESH",
"agent_key_expires_at": "2026-03-24T14:00:00+00:00",
}
},
},
)
available = pool._available_entries(clear_expired=True)
assert len(available) == 1
assert available[0].refresh_token == "refresh-FRESH"
assert available[0].last_status is None


@ -56,6 +56,7 @@ class TestFailoverReason:
"overloaded", "server_error", "timeout",
"context_overflow", "payload_too_large",
"model_not_found", "format_error",
"provider_policy_blocked",
"thinking_signature", "long_context_tier", "unknown",
}
actual = {r.value for r in FailoverReason}
@ -308,6 +309,59 @@ class TestClassifyApiError:
assert result.retryable is True
assert result.should_fallback is False
# ── Provider policy-block (OpenRouter privacy/guardrail) ──
def test_404_openrouter_policy_blocked(self):
# Real OpenRouter error when the user's account privacy setting
# excludes the only endpoint serving a model (e.g. DeepSeek V4 Pro
# which is hosted only by DeepSeek, and their endpoint may log
# inputs). Must NOT classify as model_not_found — the model
# exists, falling back won't help (same account setting applies),
# and the error body already tells the user where to fix it.
e = MockAPIError(
"No endpoints available matching your guardrail restrictions "
"and data policy. Configure: https://openrouter.ai/settings/privacy",
status_code=404,
)
result = classify_api_error(e)
assert result.reason == FailoverReason.provider_policy_blocked
assert result.retryable is False
assert result.should_fallback is False
def test_400_openrouter_policy_blocked(self):
# Defense-in-depth: if OpenRouter ever returns this as 400 instead
# of 404, still classify it distinctly rather than as format_error
# or model_not_found.
e = MockAPIError(
"No endpoints available matching your data policy",
status_code=400,
)
result = classify_api_error(e)
assert result.reason == FailoverReason.provider_policy_blocked
assert result.retryable is False
assert result.should_fallback is False
def test_message_only_openrouter_policy_blocked(self):
# No status code — classifier should still catch the fingerprint
# via the message-pattern fallback.
e = Exception(
"No endpoints available matching your guardrail restrictions "
"and data policy"
)
result = classify_api_error(e)
assert result.reason == FailoverReason.provider_policy_blocked
def test_404_model_not_found_still_works(self):
# Regression guard: the new policy-block check must not swallow
# genuine model_not_found 404s.
e = MockAPIError(
"openrouter/nonexistent-model is not a valid model ID",
status_code=404,
)
result = classify_api_error(e)
assert result.reason == FailoverReason.model_not_found
assert result.should_fallback is True
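The ordering these tests require (policy-block fingerprint checked before the generic 404 → model_not_found mapping, and matched on message alone when no status is present) can be sketched as a tiny classifier. The marker string and names are assumptions drawn from the test bodies:

```python
_POLICY_MARKER = "no endpoints available matching your"

def classify_policy_block_sketch(message: str, status_code=None) -> str:
    msg = message.lower()
    # Fingerprint check first, so a policy-blocked 404 is never
    # misread as model_not_found (fallback would not help).
    if _POLICY_MARKER in msg and "policy" in msg:
        return "provider_policy_blocked"
    if status_code == 404:
        return "model_not_found"
    return "unknown"
```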
# ── Payload too large ──
def test_413_payload_too_large(self):
@ -1040,3 +1094,37 @@ class TestSSLTransientPatterns:
result = classify_api_error(e)
assert result.reason == FailoverReason.timeout
assert result.retryable is True
# ── Test: RateLimitError without status_code (Copilot/GitHub Models) ──────────
class TestRateLimitErrorWithoutStatusCode:
"""Regression tests for the Copilot/GitHub Models edge case where the
OpenAI SDK raises RateLimitError but does not populate .status_code."""
def _make_rate_limit_error(self, status_code=None):
"""Create an exception whose class name is 'RateLimitError' with
an optionally missing status_code, mirroring the OpenAI SDK shape."""
cls = type("RateLimitError", (Exception,), {})
e = cls("You have exceeded your rate limit.")
e.status_code = status_code # None simulates the Copilot case
return e
def test_rate_limit_error_without_status_code_classified_as_rate_limit(self):
"""RateLimitError with status_code=None must classify as rate_limit."""
e = self._make_rate_limit_error(status_code=None)
result = classify_api_error(e, provider="copilot", model="gpt-4o")
assert result.reason == FailoverReason.rate_limit
def test_rate_limit_error_with_status_code_429_classified_as_rate_limit(self):
"""RateLimitError that does set status_code=429 still classifies correctly."""
e = self._make_rate_limit_error(status_code=429)
result = classify_api_error(e, provider="copilot", model="gpt-4o")
assert result.reason == FailoverReason.rate_limit
def test_other_error_without_status_code_not_forced_to_rate_limit(self):
"""A non-RateLimitError with missing status_code must NOT be forced to 429."""
cls = type("APIError", (Exception,), {})
e = cls("something went wrong")
e.status_code = None
result = classify_api_error(e, provider="copilot", model="gpt-4o")
assert result.reason != FailoverReason.rate_limit
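The fix these tests encode is a class-name fallback: when the SDK raises `RateLimitError` without populating `.status_code`, classify by name; everything else still needs a real 429. A hypothetical predicate, not the real `classify_api_error`:

```python
def is_rate_limit_sketch(exc: BaseException) -> bool:
    # Copilot/GitHub Models case: RateLimitError with status_code=None
    # must still classify as a rate limit.
    if type(exc).__name__ == "RateLimitError":
        return True
    # Any other exception needs an explicit 429 to qualify.
    return getattr(exc, "status_code", None) == 429
```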


@@ -0,0 +1,166 @@
"""Tests for Gemini free-tier detection and blocking."""

from __future__ import annotations

from unittest.mock import MagicMock, patch

import pytest

from agent.gemini_native_adapter import (
    gemini_http_error,
    is_free_tier_quota_error,
    probe_gemini_tier,
)


def _mock_response(status: int, headers: dict | None = None, text: str = "") -> MagicMock:
    resp = MagicMock()
    resp.status_code = status
    resp.headers = headers or {}
    resp.text = text
    return resp


def _run_probe(resp: MagicMock) -> str:
    with patch("agent.gemini_native_adapter.httpx.Client") as MC:
        inst = MagicMock()
        inst.post.return_value = resp
        MC.return_value.__enter__.return_value = inst
        return probe_gemini_tier("fake-key")
class TestProbeGeminiTier:
    """Verify the tier probe classifies keys correctly."""

    def test_free_tier_via_rpd_header_flash(self):
        # gemini-2.5-flash free tier: 250 RPD
        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "250"}, "{}")
        assert _run_probe(resp) == "free"

    def test_free_tier_via_rpd_header_pro(self):
        # gemini-2.5-pro free tier: 100 RPD
        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "100"}, "{}")
        assert _run_probe(resp) == "free"

    def test_free_tier_via_rpd_header_flash_lite(self):
        # flash-lite free tier: 1000 RPD (our upper bound)
        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "1000"}, "{}")
        assert _run_probe(resp) == "free"

    def test_paid_tier_via_rpd_header(self):
        # Tier 1 starts at 1500+ RPD
        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "1500"}, "{}")
        assert _run_probe(resp) == "paid"

    def test_free_tier_via_429_body(self):
        body = (
            '{"error":{"code":429,"message":"Quota exceeded for metric: '
            'generativelanguage.googleapis.com/generate_content_free_tier_requests, '
            'limit: 20"}}'
        )
        resp = _mock_response(429, {}, body)
        assert _run_probe(resp) == "free"

    def test_paid_429_has_no_free_tier_marker(self):
        body = '{"error":{"code":429,"message":"rate limited"}}'
        resp = _mock_response(429, {}, body)
        assert _run_probe(resp) == "paid"

    def test_successful_200_without_rpd_header_is_paid(self):
        resp = _mock_response(200, {}, '{"candidates":[]}')
        assert _run_probe(resp) == "paid"

    def test_401_returns_unknown(self):
        resp = _mock_response(401, {}, '{"error":{"code":401}}')
        assert _run_probe(resp) == "unknown"

    def test_404_returns_unknown(self):
        resp = _mock_response(404, {}, '{"error":{"code":404}}')
        assert _run_probe(resp) == "unknown"

    def test_network_error_returns_unknown(self):
        with patch(
            "agent.gemini_native_adapter.httpx.Client",
            side_effect=Exception("dns failure"),
        ):
            assert probe_gemini_tier("fake-key") == "unknown"

    def test_empty_key_returns_unknown(self):
        assert probe_gemini_tier("") == "unknown"
        assert probe_gemini_tier(" ") == "unknown"
        assert probe_gemini_tier(None) == "unknown"  # type: ignore[arg-type]

    def test_malformed_rpd_header_falls_through(self):
        # Non-integer header value shouldn't crash; 200 with no usable header -> paid.
        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "abc"}, "{}")
        assert _run_probe(resp) == "paid"

    def test_openai_compat_suffix_stripped(self):
        """Base URLs ending in /openai get normalized to the native endpoint."""
        resp = _mock_response(200, {"x-ratelimit-limit-requests-per-day": "1500"}, "{}")
        with patch("agent.gemini_native_adapter.httpx.Client") as MC:
            inst = MagicMock()
            inst.post.return_value = resp
            MC.return_value.__enter__.return_value = inst
            probe_gemini_tier(
                "fake",
                "https://generativelanguage.googleapis.com/v1beta/openai",
            )
        # Verify the post URL does NOT contain /openai
        called_url = inst.post.call_args[0][0]
        assert "/openai/" not in called_url
        assert called_url.endswith(":generateContent")
class TestIsFreeTierQuotaError:
    def test_detects_free_tier_marker(self):
        assert is_free_tier_quota_error(
            "Quota exceeded for metric: generate_content_free_tier_requests"
        )

    def test_case_insensitive(self):
        assert is_free_tier_quota_error("QUOTA: FREE_TIER_REQUESTS")

    def test_no_free_tier_marker(self):
        assert not is_free_tier_quota_error("rate limited")

    def test_empty_string(self):
        assert not is_free_tier_quota_error("")

    def test_none(self):
        assert not is_free_tier_quota_error(None)  # type: ignore[arg-type]
class TestGeminiHttpErrorFreeTierGuidance:
    """gemini_http_error should append free-tier guidance for free-tier 429s."""

    class _FakeResp:
        def __init__(self, status: int, text: str):
            self.status_code = status
            self.headers: dict = {}
            self.text = text

    def test_free_tier_429_appends_guidance(self):
        body = (
            '{"error":{"code":429,"message":"Quota exceeded for metric: '
            "generativelanguage.googleapis.com/generate_content_free_tier_requests, "
            'limit: 20","status":"RESOURCE_EXHAUSTED"}}'
        )
        err = gemini_http_error(self._FakeResp(429, body))
        msg = str(err)
        assert "free tier" in msg.lower()
        assert "aistudio.google.com/apikey" in msg

    def test_paid_429_has_no_billing_url(self):
        body = '{"error":{"code":429,"message":"Rate limited","status":"RESOURCE_EXHAUSTED"}}'
        err = gemini_http_error(self._FakeResp(429, body))
        assert "aistudio.google.com/apikey" not in str(err)

    def test_non_429_has_no_billing_url(self):
        body = '{"error":{"code":400,"message":"bad request","status":"INVALID_ARGUMENT"}}'
        err = gemini_http_error(self._FakeResp(400, body))
        assert "aistudio.google.com/apikey" not in str(err)

    def test_401_has_no_billing_url(self):
        body = '{"error":{"code":401,"message":"API key invalid","status":"UNAUTHENTICATED"}}'
        err = gemini_http_error(self._FakeResp(401, body))
        assert "aistudio.google.com/apikey" not in str(err)


@@ -234,6 +234,19 @@ def test_native_client_accepts_injected_http_client():
    assert client._http is injected


def test_native_client_rejects_empty_api_key_with_actionable_message():
    """Empty/whitespace api_key must raise at construction, not produce a cryptic
    Google GFE 'Error 400 (Bad Request)!!1' HTML page on the first request."""
    from agent.gemini_native_adapter import GeminiNativeClient

    for bad in ("", " ", None):
        with pytest.raises(RuntimeError) as excinfo:
            GeminiNativeClient(api_key=bad)  # type: ignore[arg-type]
        msg = str(excinfo.value)
        assert "GOOGLE_API_KEY" in msg and "GEMINI_API_KEY" in msg
        assert "aistudio.google.com" in msg


@pytest.mark.asyncio
async def test_async_native_client_streams_without_requiring_async_iterator_from_sync_client():
    from agent.gemini_native_adapter import AsyncGeminiNativeClient
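The constructor guard the empty-api-key test describes might look like the sketch below. The class name carries a `Sketch` suffix to make clear this is not the real `GeminiNativeClient` (which also wires up the HTTP client), and the exact message text is an assumption shaped only by the assertions above:

```python
class GeminiNativeClientSketch:
    """Minimal sketch: reject empty/whitespace keys at construction time."""

    def __init__(self, api_key=None):
        # Fail fast with actionable guidance instead of letting an empty key
        # reach Google's frontend and come back as an HTML 400 page.
        if api_key is None or not str(api_key).strip():
            raise RuntimeError(
                "Missing Gemini API key: set GOOGLE_API_KEY or GEMINI_API_KEY "
                "(create a key at https://aistudio.google.com/apikey)."
            )
        self.api_key = api_key
```

Validating in `__init__` rather than on first request means misconfiguration surfaces at startup, where the environment-variable hint is still actionable.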
