Commit graph

3701 commits

Author SHA1 Message Date
Teknium
19071529f6
fix(lsp): shift baseline diagnostics into post-edit coordinates (#25978)
Pre-existing diagnostics below an edit point used to surface as 'LSP
diagnostics introduced by this edit' whenever the edit deleted or
inserted lines.  The delta-filter key included the diagnostic's
range, so the same logical error reported at a different line in
the post-edit snapshot looked like a brand new diagnostic.

Concrete case: deleting 14 lines in cli.py caused Pyright errors at
lines 9873, 10590, 12413, 13004 (unrelated to the edit) to be
reported as introduced by it.

Fix: build a piecewise-linear line-shift map (via difflib's
SequenceMatcher) from pre and post content, and remap baseline
diagnostics into post-edit coordinates before the set-difference.
Diagnostics in deleted regions drop out cleanly; diagnostics below
the edit shift by the right amount; diagnostics above are untouched.
The strict (range-aware) equality key stays — so a genuinely new
instance of an identical error class at a different line still
surfaces as new.

Pieces:
- agent/lsp/range_shift.py — build_line_shift, shift_diagnostic_range,
  shift_baseline.  Pure functions, no LSP state.
- agent/lsp/manager.py — LSPService.get_diagnostics_sync gains an
  optional line_shift kwarg; baseline is shift_baseline'd before
  computing the seen-set.  _diag_key keeps the strict range key.
- tools/file_operations.py — write_file captures pre_content for any
  LSP-handled extension (not just LINTERS_INPROC) and passes pre/post
  to _maybe_lsp_diagnostics, which builds the shift map.
- New _lsp_handles_extension helper guards the pre_content read.

Trade-offs preserved:
- Genuinely new same-class errors at different lines still surface
  (content-only key would have swallowed them).
- Pre-existing errors at unshifted positions still get filtered
  (covered by the strict-key path with no shift).
- Best-effort: when pre_content can't be captured (file didn't
  exist, permissions), the unshifted comparison still catches
  most pre-existing errors; the edge case it misses is a new file
  with a non-empty baseline, which is structurally impossible.
2026-05-14 15:56:07 -07:00
teknium1
4abfb6bc24 feat(discord): default history backfill on, expand to per-user + threads
Follow-up to snav's PR #25463 contribution: flip default to on, broaden
scope so backfill fires whenever require_mention gates the bot (not just
shared-session channels).

Why:
- The mention-gate creates a session-transcript gap regardless of whether
  the channel is shared or per-user. In per-user sessions, Alice's session
  is still missing other participants' messages and her own pre-mention
  messages — backfill fills both gaps.
- Threads naturally scope to thread-only history because discord.py's
  channel.history() on a thread returns only that thread's messages.
- DMs still skip — every DM triggers the bot, so the session transcript
  is already complete.

Changes:
- hermes_cli/config.py: discord.history_backfill default → true
- gateway/platforms/discord.py: drop the _is_shared gate, keep _is_dm
  skip and _needed_mention gate; env var DISCORD_HISTORY_BACKFILL
  default → 'true'
- cli-config.yaml.example + website docs: update defaults and prose;
  add the DISCORD_HISTORY_BACKFILL / _LIMIT env var rows that were
  documented in the PR description but missing from the env-var table
- tests/gateway/test_discord_free_response.py:
  - flip test_discord_per_user_channel_does_not_backfill →
    test_discord_per_user_channel_backfills_too (new behavior)
  - add test_discord_dm_does_not_backfill (DM skip is invariant)
  - give FakeThread a no-op history() so existing thread tests don't hit
    a fake discord.Forbidden when backfill now fires on threads too

Tests: 160/160 in target files; 400/400 across all tests/gateway/ -k discord.
2026-05-14 15:50:57 -07:00
snav
e84fe483bc feat(discord): channel history backfill for multi-user sessions
Adds optional channel-context backfill for Discord shared-channel sessions
so the agent can see recent messages it missed between its own turns
(typically when require_mention=true filters out most traffic).

Previously the agent only saw the @mention message that triggered it, which
led to disorienting replies in active multi-user channels where the
conversation context was invisible. With backfill enabled, a configurable
number of recent messages are fetched per-turn and prepended to the trigger
message as a context block, kept separate from sender-prefix logic so
attribution remains clean.

This re-opens the work from #13063 (approved by @OutThisLife on 2026-04-20,
closed when I closed the branch to address the simpolism:main head-branch
issue plus an ordering bug I caught later in live use). Filing against the
freshly-rewritten problem statement in #13054 so the design is grounded in
the failure mode rather than the implementation shape.

The implementation follows the **push-mode last-self-anchored** design from
the two options laid out in #13054. See the issue for the trade-off
discussion vs pull-mode (#13120 was an earlier closed PR using that shape).
Treating this as a reference implementation — happy to rewrite as
last-trigger anchoring or as a hybrid with #13120 if maintainers prefer.

Changes:

- gateway/platforms/discord.py:
  - new `_discord_history_backfill()` / `_discord_history_backfill_limit()`
    helpers (config.extra > env > default), mirroring the existing
    `_discord_require_mention()` shape
  - new `_fetch_channel_context()` that scans `channel.history()` backwards
    from the trigger to the bot's last message (or limit), formats as
    `[Recent channel messages] / [name] msg / ...`, respects DISCORD_ALLOW_BOTS,
    skips system messages
  - per-channel `_last_self_message_id` cache to narrow the fetch window
    on hot paths (avoids full history scan when the bot has spoken recently)
  - **IMPORTANT**: passes `oldest_first=False` explicitly to `channel.history()`.
    discord.py 2.x silently flips the default to True when `after=` is supplied,
    which would select the EARLIEST N messages after our last response instead
    of the LATEST N before the trigger. In high-traffic windows this would
    return stale tool traces and drop the actual final answer the user is
    asking about. See regression test below. Caught in live use during a
    Codex tool-trace burst on May 13 2026.
- gateway/config.py: discord_history_backfill + discord_history_backfill_limit
  settings + yaml→env bridge
- gateway/platforms/base.py: channel_context field on MessageEvent
- gateway/run.py: prepend channel_context after sender-prefix so the
  [sender name] tag applies to the trigger message alone, not to the backfill
- hermes_cli/config.py: defaults for new discord.history_backfill and
  discord.history_backfill_limit keys
- cli-config.yaml.example: documented defaults
- tests/gateway/test_discord_free_response.py: 7 new tests covering
  cold-start backfill, self-message stop boundary, other-bot filtering,
  cache hot-path narrowing, stale-cache fallback, shared-channel +
  per-user backfill paths, and the ordering regression test
  (`test_fetch_channel_context_cache_uses_latest_window_when_after_set`)
- tests/gateway/test_config.py: yaml→env bridge tests
- tests/gateway/test_session.py: prefix-order edge cases
- website/docs/user-guide/messaging/discord.md: env vars + config keys +
  usage docs

Tested on Ubuntu 24.04 — empirically validated in my own multi-bot Discord
research server for the past three weeks.

Fixes #13054
Supersedes #13063 (closed)
2026-05-14 15:50:57 -07:00
Teknium
ccb5aae0d2
feat(proxy): local OpenAI-compatible proxy for OAuth providers (#25969)
Adds 'hermes proxy start' — a local HTTP server that lets external apps
(OpenViking, Karakeep, Open WebUI, ...) use a Hermes-managed provider
subscription as their LLM endpoint. The proxy attaches the user's real
OAuth-resolved credentials to each forwarded request, refreshing them
automatically; the client can send any bearer (it gets stripped).

Ships with one adapter — Nous Portal. The UpstreamAdapter ABC and
registry in hermes_cli/proxy/adapters/ are designed for additional
OAuth providers to plug in by name without server changes.

Commands:
  hermes proxy start [--provider nous] [--host 127.0.0.1] [--port 8645]
  hermes proxy status
  hermes proxy providers

Allowed Portal paths: /v1/chat/completions, /v1/completions,
/v1/embeddings, /v1/models. Anything else returns 404 with a clear
error pointing at the allowed list.

aiohttp is gated like gateway/platforms/api_server.py (try-import,
clean runtime error if missing). No new core dependency.

Tests: 24 unit tests + 1 separate E2E that spawns the real subprocess
and verifies the upstream receives the right bearer with the client's
header stripped.
2026-05-14 15:40:48 -07:00
Teknium
2844c888f1
fix(cli): clamp scrollback box widths + suppress status bar after resize (#25975)
When the terminal shrinks, already-printed box-drawing rules (response,
reasoning, streaming TTS, background-task Panels) reflow into multiple
narrower rows — visible as duplicated horizontal separators / ghost
lines in scrollback. Similarly, prompt_toolkit redraws a fresh status
bar on SIGWINCH on top of one the terminal just reflowed, producing
double-bar artifacts on column shrink.

Two surgical changes:

1. Decorative scrollback boxes now use a new
   `HermesCLI._scrollback_box_width()` helper that clamps to
   `max(32, min(width, 56))`. The live TUI footer is unaffected and still
   uses the full width. Covers: streaming response box (open + close),
   reasoning box (open + close, both streaming and post-stream paths),
   streaming-TTS box close, final-response Rich Panel, and the
   background-task Rich Panel.

2. `_recover_after_resize()` now also sets a new
   `_status_bar_suppressed_after_resize` flag so the dynamic status bar
   and both input separator rules stay hidden until the next user input.
   The flag is cleared in the process loop the moment the user submits
   their next prompt, restoring chrome cleanly.

Tests:
- New `test_input_rules_hide_after_resize_until_next_input` covers the
  flag's effect on rule heights.
- New `test_scrollback_box_width_caps_to_resize_safe_value` covers the
  helper at floor / cap / mid-range / overflow.
- Existing resize-recovery test extended to assert the flag flips.

Refs: #18449 #19280 #22976
Salvage of #24403.

Co-authored-by: Szymonclawd <szymonclawd@mac.home>
2026-05-14 15:22:44 -07:00
LeonSGP43
ac64d0c2ca fix: preserve ansi output history on resize replay 2026-05-14 15:14:29 -07:00
Xu Zhizhong
06c6c1f0f2 fix(cli): batch resize history replay 2026-05-14 15:11:51 -07:00
Teknium
fe83c4001b
fix(codex-app-server): attach redacted stderr tail to generic failures (#25929)
When codex app-server fails outside the OAuth-classified path
(non-auth turn/start errors, plain TimeoutErrors, generic turn-ended
status, subprocess silently exits, hard deadline timeout), the user
got a bare 'Internal error' / 'turn/start failed: ...' with no
context. Diagnosing config/provider/auth-bridge issues forced a
re-run with verbose codex flags.

Add a _format_error_with_stderr helper that appends the last few
stderr lines via agent.redact.redact_sensitive_text(force=True),
and use it at every catch-all error site:

- ensure_started() failures (codex init / thread/start) now return
  a TurnResult.error with should_retire=True instead of bubbling
- non-OAuth turn/start CodexAppServerError / TimeoutError
- subprocess-died branch (previously dumped raw stderr_blob[-300:]
  with no redaction — a leak risk)
- turn ended with non-completed status
- hard turn-timeout deadline

OAuth-classified failures and the post-tool quiet watchdog already
produce clean hints and stay unchanged. The redactor catches sk-*,
gh*_*, Authorization: Bearer, query-string tokens, JWTs, private
keys, etc., so provider error payloads can't leak into chat output
or trajectories.

Inspired by openclaw#80718, adapted for our app-server transport.
2026-05-14 14:55:23 -07:00
helix4u
a28add199d fix(agent): keep image tool results from poisoning text-only sessions 2026-05-14 14:52:15 -07:00
VTRiot
bc42e62b17 fix(gateway): prevent duplicate final send when only cosmetic edit failed
When the stream consumer's got_done handler successfully delivers the
final response content via _send_or_edit but the subsequent edit
(e.g. cursor removal) fails, final_response_sent remains False even
though the user has already received the final answer. The gateway's
fallback send path then re-delivers the same content, causing the
user to see the response twice on Telegram.

Introduce a new _final_content_delivered flag on the stream consumer,
set by the got_done handler when the final content has reached the
user. The _run_agent suppression logic now treats this flag as an
additional signal (alongside final_response_sent and
response_previewed) that final delivery is already complete.

This preserves the existing behavior for intermediate-text-only
streams (where already_sent=True but no final content has been
delivered) — those still receive the gateway's fallback send, matching
the test expectation in test_partial_stream_output_does_not_set_already_sent.

Adds TestFinalContentDeliveredSuppression with two cases covering
both the suppression (content delivered + edit failed) and the
non-suppression (intermediate text only) branches.
2026-05-14 14:51:07 -07:00
luyao618
b4b8509fe8 fix(gateway): load streaming config from nested gateway.streaming key
`hermes config set gateway.streaming.*` writes the streaming block
nested under a `gateway:` key in config.yaml, but the config loader
only checked for a top-level `streaming:` key — silently ignoring
the nested variant.

Fall back to `yaml_cfg['gateway']['streaming']` when the top-level
key is absent, matching the pattern already used for other nested
config sections.

Closes #25676
2026-05-14 14:51:07 -07:00
ethernet
cd64bed55e
Merge pull request #21012 from stephenschoettler/fix/ci-pr-check-unblock
fix(ci): unblock shared PR checks
2026-05-14 16:16:42 -04:00
Teknium
9ed751b967
fix(whatsapp): drop status broadcasts and channel newsletters before agent dispatch (#25845)
WhatsApp pseudo-chats (Status updates / Stories, Channels / Newsletters,
broadcast lists) were being routed through the full agent pipeline. A
user's gateway.log showed the agent replying to a contact's Story
('status@broadcast') with 345 chars plus title-generation cost, which
also shows up in the contact's status feed.

Drop these JIDs at _should_process_message() before the policy gate so
they're filtered regardless of dm_policy or allowlist state. Covers:
- status@broadcast (Stories)
- *@newsletter (Channels)
- *@broadcast (broadcast lists, future-proofing)

The bridge.js already filters these on the fromMe outbound path, but
inbound events on self-chat mode skipped that check.

Tests:
- status@broadcast dropped on open policy
- broadcast filter wins over allowlisted senders
- real DMs still pass through
- helper unit cases (case-insensitive, whitespace-tolerant)

26/26 tests/gateway/test_whatsapp_group_gating.py pass; 59/59 adjacent
WhatsApp test suites pass.
2026-05-14 09:59:03 -07:00
Teknium
78b842c995
fix(install): support non-sudo service-user installs on apt distros (#25814)
The Debian/Ubuntu branch of install_node_deps() ran 'npx playwright install
--with-deps chromium' unconditionally. Playwright invokes sudo interactively
to apt-install Chromium's system libraries, which blocks the installer for
non-sudo users (systemd service accounts, unprivileged operator users) on
an unsatisfiable password prompt.

Changes:
- install.sh: gate --with-deps behind a sudo capability check on the apt
  branch (matches the existing Arch/pacman branch pattern). Non-sudo users
  fall back to 'npx playwright install chromium' alone and the installer
  prints the exact 'sudo npx playwright install-deps chromium' command an
  administrator can run separately.
- install.sh: add --skip-browser (alias --no-playwright) to skip the
  Playwright step entirely for headless installs that don't need browser
  automation. Mirrors the existing --no-venv / --skip-setup shape.
- installation.md: add a 'Non-Sudo / System Service User Installs' section
  covering the admin/service-user split, the --skip-browser flag, and the
  ~/.local/bin PATH gotcha (the root cause of the 'No module named dotenv'
  error users hit when running the repo source 'hermes' script with system
  Python instead of the venv launcher).
- test_install_sh_browser_install.py: regression coverage for the
  --skip-browser flag and the sudo-gate on the apt branch.

Reported by @ssilver in Discord.
2026-05-14 09:05:31 -07:00
EthanGuo-coder
26933c2f59 fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks
_make_stream_chunk built delta_kwargs with only `role`, so a reasoning-only
chunk produced a SimpleNamespace without a `.content` attribute. Downstream
consumers that read `delta.content` then raised AttributeError on Gemini 2.5
Flash, where the thinking delta arrives before any content delta.

Seed `content`, `tool_calls`, `reasoning`, and `reasoning_content` as None
up front, matching the pattern already used in gemini_native_adapter.py.
Key-present arguments still override the defaults.

Fixes #24974
References: Related open PR #24984 (luyao618) applies the same 1-line fix; this PR adds a regression test that #24984 omits
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-14 08:03:56 -07:00
Teknium
72b5dd8658
fix(update): refresh lazy-installed backends on hermes update (#25766)
Pyproject's [all] extra was slimmed down in May 2026 — ~20 optional
backends moved to tools/lazy_deps.py and only install on first use.
hermes update runs uv pip install -e .[all] which doesn't touch any of
them, so pin bumps in LAZY_DEPS (CVE response, transitive fixes) were
silently ignored on already-activated backends.

Two changes:

1. _is_satisfied() now parses the spec and checks the installed version
   against the constraint via packaging.specifiers. Previously it
   returned True the moment the package name was importable, which made
   ensure() a name-presence gate rather than a version-pin gate.

2. New active_features() / refresh_active_features() pair: lists every
   feature with at least one of its packages currently installed, then
   re-runs ensure() on each. Refresh is invoked at the end of
   _cmd_update_impl, right after the [all] install completes. Cold
   backends (never activated) stay quiet — no churn for them.

Output during update is one summary block:
  → Refreshing 4 active lazy backend(s)...
    ↑ 1 refreshed: provider.anthropic
    ✓ 3 already current
or
    ⚠ memory.honcho failed to refresh: <pip stderr>

Failures never raise out of update — backends keep their previously-
installed version and we tell the user to rerun once upstream is fixed.
security.allow_lazy_installs=false is honored: features get marked
"skipped" with the reason shown.

Tests: 18 new unit tests covering version-aware satisfaction (exact pin,
range, extras blocks, missing package, malformed spec), active feature
discovery, and refresh status reporting. All 61 lazy_deps tests pass.
2026-05-14 08:03:40 -07:00
wesleysimplicio
436a0a271e test(toolsets): lock web search into default platform coverage
Adds regression tests pinning web search into the WhatsApp and api-server
default platform-coverage toolsets. Pure test additions, no runtime change.

Salvage of the test-addition commit from #25692 by @wesleysimplicio.
(The AUTHOR_MAP fixup commit from the same PR landed separately as
529ec85c7.)
2026-05-14 08:03:33 -07:00
zccyman
f26098e22f fix(gateway): enable text-intercept for multi-choice clarify fallback (#25567) 2026-05-14 07:59:12 -07:00
AsoTora
1247ff2dca fix: stop retrying initial MCP auth failures 2026-05-14 07:58:43 -07:00
Teknium
12f755c9eb
fix(codex-runtime): retire wedged sessions + post-tool watchdog + OAuth refresh classify (#25769)
Mirrors openclaw beta.8's app-server resilience fixes so a stuck codex
subprocess can't burn the full turn deadline and so users get a
`codex login` pointer instead of raw RPC errors when their token expires.

- TurnResult.should_retire signals the caller to drop+respawn codex.
- Deadline-hit path and dead-subprocess detection set should_retire so
  the next turn doesn't ride a CPU-spinning or auth-broken process.
- Post-tool watchdog (post_tool_quiet_timeout=90s): if a tool item
  completes and codex goes silent past the threshold without further
  output or turn/completed, fast-fail instead of waiting the full 600s.
  Resets on any non-tool activity so normal think-after-tool flows are
  not affected.
- <turn_aborted> and <turn_aborted/> in agent text are treated as
  terminal — some codex builds tear down a turn that way without
  emitting turn/completed.
- _classify_oauth_failure() inspects RPC error message + stderr tail
  for invalid_grant / token refresh / 401 / etc. and rewrites
  user-facing errors to 'run codex login'. Conservative: generic
  failures still surface verbatim. Fires at turn/start failure,
  turn/completed failure, and dead-subprocess paths.
- thread/start cross-fill: tolerate thread.id, thread.sessionId,
  top-level sessionId/threadId so future codex schema drift doesn't
  KeyError us at handshake.
- run_agent.py: when run_turn returns should_retire=True OR raises,
  close + null self._codex_session so the next turn respawns.

Tests: +30 cases across session + integration suites.
  tests/agent/transports/test_codex_app_server_session.py 50/50 pass
  tests/run_agent/test_codex_app_server_integration.py 27/27 pass
  Broader codex scope (transports + cli runtime/migration) 376/376 pass
2026-05-14 07:55:09 -07:00
binhnt92
63991bbd97 fix(memory): skip OpenViking upload symlinks 2026-05-14 07:48:03 -07:00
teknium1
26deeea830 fix(telegram): restore model-switch success path + author map
The cherry-picked PR over-indented the edit_message_text block for
the mm: (model selected → switch) success path so the confirmation
edit lived inside the preceding 'except Exception as exc' branch and
only fired when the callback raised. Dedent the try/except back to
12-space indent so it runs after the callback succeeds, restoring
the original flow that removes the inline buttons and shows the
'Switched to ...' confirmation.

Add a regression test (test_model_selected_edits_message_on_success)
that asserts edit_message_text is awaited and the result text is
routed through format_message (MARKDOWN_V2 + backtick survival).

Add phuongvm to scripts/release.py AUTHOR_MAP.
2026-05-14 07:47:52 -07:00
Phuong Lambert
a694040520 fix(telegram): escape dynamic markdown in callback flows
Use MarkdownV2 formatting for Telegram callback follow-ups and interactive prompts where dynamic names or user text can break legacy Markdown parsing. Add regression coverage for reload-mcp, model picker, approval callbacks, and update prompts.
2026-05-14 07:47:52 -07:00
teknium1
1dca6a6960 feat(discord): render clarify choices as buttons
Brings Discord to parity with Telegram on the clarify tool's interactive
UX. Overrides BasePlatformAdapter.send_clarify on DiscordAdapter to attach
a button view when choices are present.

  - ClarifyChoiceView: one discord.ui.Button per choice (max 24, Discord's
    25-component view cap leaves one slot for Other) plus a final
    'Other (type answer)' button.
  - Numeric click -> tools.clarify_gateway.resolve_gateway_clarify(
    clarify_id, choice_text) using the canonical choice text from the
    gateway entry (falls back to the button label if the entry vanished).
  - Other click -> tools.clarify_gateway.mark_awaiting_text(clarify_id) so
    the gateway's text-intercept captures the next user message in this
    session as the response.
  - Auth via the shared _component_check_auth helper (same OR-semantics as
    ExecApprovalView / SlashConfirmView / UpdatePromptView / ModelPickerView).
  - Open-ended (no choices) path renders the prompt as a plain embed and
    relies on the existing text-intercept resolution.
  - Single-use: first valid click disables every button and updates the
    embed footer with who answered and what they chose.

No changes to BasePlatformAdapter.send_clarify or the gateway's
clarify_callback wiring -- the existing scaffolding already drives all
adapters; Discord just inherits the default text fallback today and gains
buttons by virtue of this override.

Test conftest extended: _FakeEmbed gains add_field() / set_footer() stubs
so tests can construct embedded views without monkey-patching per-test.

Original PR: #19249 by @LeonSGP43. This is a reshape of the contributor's
work onto current main's clarify infrastructure (clarify_id + entry-based
resolution shared with Telegram, instead of a parallel on_answer-closure
mechanism). The button view structure and UX shape are preserved.

Tests: 14 new tests in tests/gateway/test_discord_clarify_buttons.py.
391/391 existing Discord gateway tests still pass.

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
2026-05-14 07:26:43 -07:00
Tranquil-Flow
c75e1a03f9 fix(install): preserve pip entry point when re-running on symlinked install
setup_path() writes the user-facing hermes shim with `cat >`, which
follows existing symlinks. Older installs created
`$command_link_dir/hermes` as a symlink to `$HERMES_BIN`
(`venv/bin/hermes`), so re-running install.sh stomped the pip entry
point with a bash shim that exec'd itself in an infinite loop.

`rm -f` the link target before writing so the shim lands at
`$command_link_dir/hermes` and the venv entry point is left intact.

Adds a regression test that reproduces the symlink-stomp end-to-end
(creates the symlink, drives the real shim-write block from setup_path,
asserts the venv pip script body survives and the shim is now a regular
file). Both new assertions fail on origin/main and pass with the fix.

Closes #21454.
2026-05-14 07:08:45 -07:00
kshitijk4poor
0f0e20ef81 test(novita): cache pricing, add provider test coverage, AUTHOR_MAP entry
Follow-up to Alex-wuhu's NovitaAI provider commit. Adds:

- _pricing_cache hit/write in _fetch_novita_pricing (was missing — every
  pricing fetch was re-hitting the network), mirroring the
  fetch_ai_gateway_pricing pattern. force_refresh now also propagates
  from get_pricing_for_provider.
- TestNovitaProvider in tests/hermes_cli/test_api_key_providers.py
  covering profile load, alias resolution, registry auto-registration,
  model list parity between main.py and models.py, _URL_TO_PROVIDER,
  _PROVIDER_PREFIXES, context_size in _CONTEXT_LENGTH_KEYS, pricing
  unit conversion, and pricing cache behavior.
- AUTHOR_MAP entry for yanglongwei06@gmail.com → @Alex-yang00.
2026-05-13 23:51:15 -07:00
magic524
8199ec3803 fix(gateway): keep QQBot reconnect loop alive 2026-05-13 23:13:25 -07:00
raymaylee
00ad3d3c9c fix: show context compaction status 2026-05-13 23:11:43 -07:00
kfa-ai
bd33a48a58 feat(whatsapp): surface quoted reply metadata 2026-05-13 23:11:20 -07:00
mr.Shu
31b4721791 fix: simplify ACP approval bridging
Previously ACP dangerous-command approvals mixed an invalid ACP
payload shape with partial Hermes option mapping, and the callback
plumbing was shared across worker threads. This commit uses ACP
tool-call updates, preserves Hermes once/session/always semantics,
and scopes approval callbacks to the current worker thread.

- Build permission requests with `update_tool_call` and unique
  `perm-check-*` ids in `acp_adapter/permissions.py`
- Keep ACP option mapping explicit and fail closed on unknown outcomes
  or request failures
- Set approval callbacks inside the ACP executor worker and read them
  from thread-local state in `tools/terminal_tool.py`
- Replace duplicated ACP bridge coverage with focused tests in
  `tests/acp/test_permissions.py` and add a thread-local callback test
2026-05-13 22:59:39 -07:00
teknium1
35ce94a2f8 fix(tests): correct skin engine test API call
The salvaged regression test called skin.get_spinner_list() which
doesn't exist on SkinConfig. Replace with direct dict access on
skin.spinner — same intent (verify default empty spinner is preserved
when user override is invalid).
2026-05-13 22:55:52 -07:00
Dusk1e
5f234d4057 fix(cli): harden skin yaml parsing for invalid section types 2026-05-13 22:55:52 -07:00
Teknium
8f19078c6a
feat(goals): /subgoal — user-added criteria appended to active /goal (#25449)
* feat(goals): /subgoal — user-added criteria appended to active /goal

Layers a /subgoal command on top of the existing freeform Ralph judge
loop. The user can append extra criteria mid-loop; the judge factors
them into its done/continue verdict and the continuation prompt
surfaces them to the agent. No new tool, no agent self-judging — the
existing judge model just sees a richer prompt.

Forms:
  /subgoal                  show current subgoals
  /subgoal <text>           append a criterion
  /subgoal remove <n>       drop subgoal n (1-based)
  /subgoal clear            wipe all subgoals

How it integrates:

- GoalState gains `subgoals: List[str]` (default []), backwards-compat
  for existing state_meta rows.
- judge_goal accepts an optional subgoals kwarg; non-empty switches to
  JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as
  numbered criteria and asks 'is the goal AND every additional
  criterion satisfied?'
- next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE
  when non-empty so the agent sees what to target.
- /subgoal is allowed mid-run on the gateway since it only touches the
  state the judge reads at turn boundary — no race with the running
  turn.
- Status line shows '... , N subgoals' when present.

Surface:
- hermes_cli/goals.py — field, prompt blocks, manager methods, judge weave
- hermes_cli/commands.py — /subgoal CommandDef
- cli.py — _handle_subgoal_command
- gateway/run.py — _handle_subgoal_command + mid-run dispatch
- tests/hermes_cli/test_goals.py — 15 new tests (backcompat, mutation,
  persistence, prompt template selection, judge-prompt content via mock,
  status-line rendering)

77 goal-related tests passing across goals + cli + gateway + tui.

* fix(goals): slash commands don't preempt the goal-continuation hook

Two findings from live-testing /subgoal:

1. Slash commands queued while the agent is running landed in
   _pending_input (same queue as real user messages). The goal hook's
   'is a real user message pending?' check returned True and silently
   skipped — but the slash command consumes its queue slot via
   process_command() which never re-fires the goal hook, so the loop
   stalls indefinitely. Now the hook peeks the queue and only defers
   when a non-slash payload is present.

2. The with-subgoals judge prompt was too soft — opus 4.7 said 'done,
   implying all requirements met' without verifying. Tightened to
   demand specific per-criterion evidence (file contents, output line,
   command result) and explicitly reject phrases like 'implying it was
   done.'

Live verified: /subgoal injected mid-loop now correctly forces the
judge to refuse done until the new criterion is met. Agent gets the
continuation prompt with subgoals listed, updates the script, judge
confirms done with specific evidence cited.
2026-05-13 22:55:09 -07:00
Dusk1e
8db544b4d0 fix(clipboard): reject non-png clipboard images when png normalization fails 2026-05-13 22:54:21 -07:00
teknium1
c872f07c47 fix(tests): exercise profile-mode HERMES_HOME for honcho fallback
The cherry-picked tests from #6173 set HERMES_HOME outside Path.home()/.hermes,
which forces get_default_hermes_root() down its Docker branch and returns
HERMES_HOME directly — so _get_default_hermes_home() never resolves to the
~/.hermes directory the tests were trying to assert about.

Rewire both tests to use the real profile layout (HERMES_HOME pointing at
~/.hermes/profiles/<name>) so _get_default_hermes_home() resolves back to
~/.hermes and the default-profile fallback is actually exercised.
2026-05-13 22:53:01 -07:00
Billard
d18618f48f fix(honcho): respect HOME-anchored default profile fallback 2026-05-13 22:53:01 -07:00
kshitijk4poor
4ca5e72444 fix(web): preserve top-level error envelope on unconfigured systems
Surfaced by local E2E behavior-parity testing of PR vs origin/main: the
plugin-migrated dispatchers were quietly changing the error envelope
shape returned to function-calling models on unconfigured systems.

Two findings, both from per-result error wrapping bleeding into the
pre-flight configuration error path:

1. **search**: ``firecrawl.search()`` caught the
   ``ValueError("Web tools are not configured...")`` from
   ``_get_firecrawl_client()`` and returned it as
   ``{"success": False, "error": ...}``, losing the legacy
   ``{"error": "Error searching web: ..."}`` envelope that
   ``tool_error()`` emits on main. Models that special-case the
   ``error`` key still detect the failure, but the prefix is part of
   the legacy contract some users rely on.

2. **crawl**: ``firecrawl.crawl()`` caught the same pre-flight
   ``ValueError`` and wrapped it as a per-page error inside
   ``results[0]``. Main short-circuits on ``check_firecrawl_api_key()``
   BEFORE dispatching, so its unconfigured response is
   ``{"success": False, "error": "web_crawl requires Firecrawl..."}``
   at the top level. The PR's per-page burying hid the failure inside
   ``results[]`` where models that check ``result.get("error")`` would
   miss it.

Fix:
- ``plugins/web/firecrawl/provider.py``: pull
  ``_get_firecrawl_client()`` outside the broad ``try`` in
  ``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate
  to the dispatcher's top-level exception handler. In-flight SDK
  errors still get wrapped as ``{"success": False, ...}``.
- ``tools/web_tools.py``: mirror main's upstream availability gate in
  ``web_crawl_tool``. When the resolved crawl provider is
  ``is_available()==False``, short-circuit BEFORE dispatching with the
  same top-level error shape main emits.
- ``tests/tools/test_web_providers.py``: 2 regression tests
  (``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so
  future plugin work can't undo this.

Verified via local subprocess-based parity test (14/14 scenarios match
origin/main shape exactly) and full 210/210 web test suite green.
2026-05-13 22:31:28 -07:00
kshitijk4poor
21e3a863bb feat(web): firecrawl plugin natively supports crawl; delete legacy inline path
The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.

What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
   - Accepts the same kwargs as the dispatcher passes to any crawl
     provider (``instructions``, ``depth``, ``limit``); Firecrawl's
     /crawl endpoint ignores ``instructions`` and ``depth`` so we log
     and drop with a clear info message.
   - Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
     gateway event loop isn't blocked on a multi-page crawl.
   - Preserves the response-shape normalization across pydantic /
     typed-object / dict variants that the legacy inline code did.
   - Preserves per-page website-policy re-check (catches blocked
     redirects after the SDK returns).
   - Returns the same {"results": [...]} shape so the dispatcher's
     shared LLM-summarization post-processing path works unchanged.
   - Sets supports_crawl() to True so the dispatcher routes through
     the plugin instead of the legacy fallthrough.

2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
   that used to run after "No registered provider supports crawl" —
   ~270 lines including:
   - check_firecrawl_api_key gate + typed error
   - inline SSRF + website-policy seed-URL gate (dispatcher already
     does this)
   - Firecrawl client setup with crawl_params
   - 100+ lines of pydantic/dict/typed-object normalization
   - Per-page LLM-processing loop (kept in the dispatcher's shared
     post-processing path; that's where it always belonged)
   - trimming + base64 image cleanup (still done in the dispatcher's
     shared path)

   Replaced with a single typed-error branch when no crawl-capable
   provider is available: "web_crawl has no available backend. Set
   FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
   TAVILY_API_KEY for Tavily."

Test updates
------------
- tests/tools/test_website_policy.py:
  - test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
    gate still runs on web_tools.check_website_access (no change to
    that patch), but the firecrawl client lockdown moved to the
    plugin module — patch firecrawl_provider._get_firecrawl_client
    instead of web_tools._get_firecrawl_client. The dispatcher
    short-circuits before the plugin runs, so the test still passes.
  - test_web_crawl_blocks_redirected_final_url: patch the per-page
    policy gate at plugins.web.firecrawl.provider.check_website_access
    (where it now runs) AND on web_tools (where the seed-URL gate
    still runs). Patch firecrawl_provider._get_firecrawl_client for
    the FakeCrawlClient injection. Both checks flow through the same
    fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
  - Update parametrized capability-flag spec: firecrawl supports_crawl
    is now True.
  - Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
    verifies inspect.iscoroutinefunction(p.crawl) is True and that
    the async crawl returns a per-page error dict (not a raise) when
    FIRECRAWL_API_KEY is missing.

Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
  crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
    name        search  extract  crawl   async-extract?  async-crawl?
    firecrawl   True    True     True    True            True
    tavily      True    True     True    False           False
  Both crawl-capable providers exercise the dispatcher's
  inspect.iscoroutinefunction async-or-sync detection.

Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
  + new firecrawl crawl test)
- Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763
  before this commit, 2227 before the migration started).
2026-05-13 22:31:28 -07:00
kshitijk4poor
e8cee87e85 test(plugins): tests/plugins/web/ — coverage for the 7-plugin migration
Adds 44 focused tests under tests/plugins/web/ covering the surface that
the PR #25182 web-provider migration introduced. Complements the
existing tests/tools/ coverage which is dispatcher-centric; this file is
plugin-centric and tests each plugin + the registry directly.

Test classes (44 tests, ~1.1s on 4 workers)
-------------------------------------------

TestBundledPluginsRegister (16 tests)
  - All seven plugins present in the registry after
    _ensure_plugins_discovered()
  - Per-plugin parametrized capability-flag assertions
    (brave-free / ddgs / searxng: search-only;
     exa / parallel / firecrawl: search + extract;
     tavily: search + extract + crawl)
  - Every plugin exposes name + display_name properties
  - Every plugin returns a picker-compatible get_setup_schema() dict

TestIsAvailable (7 tests)
  - Each premium plugin reports is_available()==False when its env var is
    absent and True once set (brave-free / searxng / tavily / exa /
    parallel)
  - firecrawl recognizes either FIRECRAWL_API_KEY or FIRECRAWL_API_URL
    as a "configured" signal
  - ddgs is the always-on fallback and must not raise from is_available()

TestRegistryResolution (4 tests)
  - Option B semantics validated end-to-end:
    1. Explicit configured provider wins even when is_available()==False
       (dispatcher surfaces typed credential errors, no silent switch)
    2. Unknown/typo name falls back to first available legacy-preference
       provider
    3. Asking for extract via a search-only backend falls back to an
       extract-capable available provider (capability-incompatible
       branch in _resolve())
    4. No config + no credentials → None (or ddgs if installed)

TestAsyncExtractDispatch (4 tests)
  - parallel + firecrawl extract() are coroutine functions (async path
    in dispatcher uses await)
  - exa + tavily extract() are sync (dispatcher wraps in
    asyncio.to_thread)

TestErrorResponseShapes (7 tests)
  - Plugins return typed error dicts (success=False + "error" key) when
    credentials are missing, never raise
  - async extract() returns list of per-URL error dicts
  - tavily crawl() returns {"results": [{"error": ...}]} on missing
    credentials

Design notes
------------
- All tests use real imports of plugin modules — no mocking of provider
  classes themselves — so they catch drift in the ABC, registry, and
  glue layer simultaneously. Per the hermes-agent-dev skill's E2E
  testing guidance.
- The autouse _isolate_env fixture clears every web-provider env var
  before each test so is_available() reflects the test's setup.
- Resolution tests use the lower-level _resolve() directly rather than
  rebuilding the HERMES_HOME config dance — same observable behavior,
  no sys.modules.pop side-effects that would break the ABC isinstance
  check inside ctx.register_web_search_provider().
2026-05-13 22:31:28 -07:00
kshitijk4poor
39b4ebfcea refactor(web): delete legacy tools/web_providers/ directory + migrate ABC tests
Removes the legacy in-tree provider scaffolding that PR #25182 fully
replaced with the plugin architecture:

  tools/web_providers/__init__.py        (6 lines)
  tools/web_providers/base.py            (89 lines — old ABCs)
  tools/web_providers/ARCHITECTURE.md    (73 lines — old design doc)

These were the staging-ground ABCs and provider modules that the
plugin migration absorbed. All seven web providers now implement the
single :class:`agent.web_search_provider.WebSearchProvider` ABC and
live under ``plugins/web/<vendor>/``. Nothing else in the tree imports
``tools.web_providers`` — verified via grep before deletion.

Test migration (tests/tools/test_web_providers.py)
--------------------------------------------------
Rewrote ``TestWebProviderABCs`` to test the new unified ABC at
:mod:`agent.web_search_provider`:

  - test_cannot_instantiate_abc_directly — abstract ``name`` + ``is_available``
  - test_concrete_search_only_provider_works — exercise default
    ``supports_extract=False`` / ``supports_crawl=False`` flags
  - test_concrete_multi_capability_provider_works — exercise all three
    capabilities, async extract supported (declared sync here for
    simplicity; real plugins like parallel + firecrawl use async)
  - test_search_only_provider_skips_extract_and_crawl — verify
    ``supports_*()`` flags default to False so search-only providers
    don't have to implement extract() or crawl()

The 9 other tests in the file (per-capability backend selection,
DEFAULT_CONFIG merge, dispatcher routing) test public helpers in
``tools.web_tools`` that still exist and pass unchanged.

agent/web_search_provider.py docstring updated to reflect that the
legacy ABCs no longer exist; the response-shape contract is preserved
bit-for-bit so external consumers see no behavioral change.

Net diff
--------
- tools/web_providers/ removed (-168 lines)
- tests/tools/test_web_providers.py rewritten ABC section (+78/-30 net,
  same coverage, new API)
- agent/web_search_provider.py docstring (-3/+5 lines)

Verified
--------
- 173/173 targeted web tests pass
- 12/12 ABC contract tests pass with the new interface
- No remaining grep hits for ``tools.web_providers`` outside of
  intentional historical references in plugin docstrings.
2026-05-13 22:31:28 -07:00
kshitijk4poor
5e54330e27 fix(web): preserve firecrawl crawl + website-policy gate after migration
Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:

1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool — it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).

Fix: only emit the search-only error when the provider supports
NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the
provider supports extract but not crawl (firecrawl), fall through to
the legacy firecrawl-via-extract path below.

2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect — the inner import re-bound
the name on every call.

Fix: hoist the import to module level. Cheap (website_policy itself
has no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.

Test updates (tests/tools/test_website_policy.py — 4 tests):
  - test_web_extract_short_circuits_blocked_url
  - test_web_extract_blocks_redirected_final_url
    Both: patch the gate at plugins.web.firecrawl.provider (where it
    runs after migration) and force the firecrawl plugin to be the
    active extract provider via FIRECRAWL_API_KEY.
  - test_web_crawl_short_circuits_blocked_url
  - test_web_crawl_blocks_redirected_final_url
    Both: unchanged — the dispatcher-level gate at tools.web_tools.py
    line 1651 still uses the imported `check_website_access` name and
    the firecrawl-fallthrough path is exercised as before.

Verified: 22/22 tests/tools/test_website_policy.py pass.
2026-05-13 22:31:28 -07:00
kshitijk4poor
b05253ceed refactor(web): dispatch all three tools through web_search_registry
Cuts over web_search_tool, web_extract_tool, and web_crawl_tool in
tools/web_tools.py to dispatch through agent.web_search_registry
instead of the legacy hardcoded if-elif backend chains.

Per-tool changes:

  web_search_tool (sync)
    Replace 5 backend branches (parallel, exa, registry-3-providers,
    tavily, firecrawl-fallthrough) with a single registry path:
      1. _get_search_backend() resolves the configured name
      2. _wsp_get_provider(name) for explicit-config-wins semantics
      3. get_active_search_provider() fallback for typo / unknown name
      4. provider.search(query, limit) — sync for all 7 providers

  web_extract_tool (async)
    Replace 4 backend branches (parallel-async, exa-sync, tavily-sync,
    search-only-error, firecrawl-perurl-loop) with:
      1. Same provider resolution as search.
      2. When configured backend IS registered but doesn't support
         extract (search-only providers like brave-free), surface a
         typed "search-only" error matching the legacy text — tests
         assert that wording.
      3. inspect.iscoroutinefunction(provider.extract) detects sync vs
         async: parallel + firecrawl are async; exa + tavily are sync.
         Sync extracts run in asyncio.to_thread() so we don't block.

  web_crawl_tool (async)
    Replace tavily-specific branch + search-only-error block with:
      1. _wsp_get_provider(backend) — explicit config first
      2. Search-only typed error when the configured name doesn't
         support crawl (matches legacy phrasing)
      3. get_active_crawl_provider() fallback otherwise
      4. provider.crawl(url, **kwargs) — async-or-sync dispatch as above
      5. Response post-processing (LLM summarization, trimming) stays
         unchanged — it's not provider-specific.
    When no plugin advertises supports_crawl, falls through to the
    existing Firecrawl-via-web-summarize path below (unchanged).

Test updates (2 tests in tests/tools/test_web_tools_config.py):
  - test_web_search_clamps_limit_before_backend_call:
      patch("tools.web_tools._parallel_search") -> patch the registry
      provider returned by agent.web_search_registry.get_provider
  - test_search_error_response_does_not_expose_diagnostics:
      patch("tools.web_tools._get_firecrawl_client") -> same pattern

Tests unchanged (still pass):
  - All TestXBackendWiring classes (test _get_backend / _is_backend_available
    config-resolution, independent of dispatch)
  - All TestXSearchOnlyErrors classes (test the search-only error path
    via web_extract_tool / web_crawl_tool — error text preserved)
  - 141 passing web tests total, 0 regressions.

Dead-code cleanup deferred to a follow-up commit so this diff stays
focused on the cutover. After this commit:
  - tools.web_tools._exa_search / _exa_extract / _parallel_search /
    _parallel_extract / _tavily_request / _normalize_tavily_* /
    _get_firecrawl_client / _extract_web_search_results /
    _extract_scrape_payload / _to_plain_object / _normalize_result_list
    are no longer called by the dispatchers, but still exist.
  - The config-resolution layer (_get_backend, _is_backend_available,
    _is_tool_gateway_ready, _has_direct_firecrawl_config) IS still in
    use and must stay.
  - The Firecrawl proxy and check_firecrawl_api_key are still imported
    by integration tests and patched by unit tests — must stay (or be
    re-exported from the plugin).
2026-05-13 22:31:28 -07:00
kshitijk4poor
6b219f5af6 refactor(web): remove legacy in-tree provider modules
Deletes tools/web_providers/{brave_free,ddgs,searxng}.py — the three
providers that moved to plugins/web/ in prior commits. tools/web_tools.py
no longer imports them (registry dispatch as of d8735963f), so removing
them is purely a cleanup pass.

Also migrates the existing tests to the new import paths:
  tests/tools/test_web_providers_brave_free.py
  tests/tools/test_web_providers_ddgs.py
  tests/tools/test_web_providers_searxng.py

Mechanical rewrites:
  - `from tools.web_providers.X import YSearchProvider`
      -> `from plugins.web.X.provider import YWebSearchProvider`
  - `.is_configured()` -> `.is_available()`        (legacy method  -> new method)
  - `.provider_name()` -> `.name`                  (legacy method  -> new property)
  - `from tools.web_providers.base import WebSearchProvider`
      -> `from agent.web_search_provider import WebSearchProvider`
      (the subclass-check asserts membership in the new plugin-facing ABC)
  - `sys.modules.delitem("tools.web_providers.ddgs")` updated to point at
    `plugins.web.ddgs.provider` (cache-busting for lazy ddgs imports)

The TestXBackendWiring / TestXSearchOnlyErrors classes (covering
_is_backend_available, _get_backend, check_web_api_key, and the
"search-only" error paths in web_extract/web_crawl) are untouched —
those still test web_tools.py's backend-selection logic, which continues
to recognize the names "brave-free" / "ddgs" / "searxng" even after the
modules behind them moved to plugins.

tools/web_providers/base.py is intentionally NOT deleted by this commit
— it's the parent ABC of the legacy modules and shares its name with
agent/web_search_provider.py::WebSearchProvider. Removing it surfaces the
naming collision (see PR description Finding 0); the real migration PR
deletes it in the same commit that drops the _WEB_PLUGIN_SKIPLIST
guards in hermes_cli/tools_config.py.

Test results:
  bash scripts/run_tests.sh tests/tools/test_web_providers_*.py
  -> 65 passed in 3.41s (all rewritten unit tests + unchanged integration tests)
  bash scripts/run_tests.sh tests/tools/test_web_*.py
  -> 141 passed in 4.70s (full web test set, post-deletion)
2026-05-13 22:31:28 -07:00
kshitijk4poor
efc32ab639 refactor(inventory): extract shared ConfigContext + build_models_payload
Three call-sites in the codebase each duplicated the same config-slice
+ list_authenticated_providers + post-processing pattern:

- hermes_cli/web_server.py /api/model/options
- tui_gateway/server.py model.options JSON-RPC
- tui_gateway/server.py model.save_key JSON-RPC

This consolidates them onto hermes_cli/inventory.py:

  load_picker_context() -> ConfigContext
      Replaces the 17-LOC config-slice (model.{default,name,provider,
      base_url}, providers:, custom_providers:) every consumer did
      inline.

  ConfigContext.with_overrides(*, current_provider=, current_model=,
                               current_base_url=) -> ConfigContext
      Truthy-only overlay for TUI agent-session state on top of disk
      config. Empty getattr(agent, ...) attrs MUST NOT clobber disk.

  build_models_payload(ctx, *, include_unconfigured, picker_hints,
                       canonical_order, max_models) -> dict
      Single payload builder. Delegates curation to
      list_authenticated_providers (does not call provider_model_ids
      per row \u2014 that pulls non-agentic models). picker_hints +
      canonical_order produce the TUI ModelPickerDialog shape;
      defaults match the dashboard's existing /api/model/options
      contract.

Two latent bugs fixed by consolidation:

1. The dashboard read cfg.get('custom_providers') directly, missing
   the v12+ keyed providers: form. Now both surfaces go through
   get_compatible_custom_providers().

2. The TUI's canonical-merge keyed on is_user_defined to decide order.
   Section 3 of list_authenticated_providers sets is_user_defined=True
   on rows from the providers: config dict even when the slug is
   canonical \u2014 that silently demoted them to the picker tail.
   _reorder_canonical now keys on slug membership instead.

Stats: +666 / -145 (net +521). Module 240 LOC; 18 behavior tests.

This PR replaces the rejected #23369 (which bundled the consolidation
with new scriptable CLI surfaces \u2014 hermes models list/status, hermes
providers list \u2014 and a JSON contract that have no external user
demand). Just the refactor; the CLI surface is deferred to a separate
PR gated on actual demand.

Refs #23359.
2026-05-13 22:31:11 -07:00
teknium1
4ceab16893 fix(compression): keep default protect_first_n at 3 + align ABC
Follow-up on the salvaged feat commit:

- Keep the constructor / config / yaml-example default at 3 so existing
  gateway and CLI users see no behavioural change. PR #13754 (which this
  builds on) had lowered the default to 2 to chase pre-feature parity in
  the system-prompt-present case, at the cost of quietly halving the
  protected head for the gateway path (which strips the system prompt
  before calling compress()). With the new "system prompt is implicit"
  semantics, default 3 gives every caller a stable head shape.
- agent/context_engine.py: bring the ABC's protect_first_n docstring in
  line with the new semantics so plugin context engines interpret the
  config key the same way the built-in compressor does.
- tests: adjust the default-value test (3, not 2) and a stale comment;
  per-test protect_first_n=2/3/1 values added in PR #13754 stay as-is
  since those tests fix concrete head shapes.
2026-05-13 22:25:16 -07:00
snav
dee71a31e5 feat(compression): make protect_first_n configurable
The number of head messages preserved verbatim across context compactions
was previously hardcoded to 3 in AIAgent.__init__. Expose it as
`compression.protect_first_n` in config, matching the existing
`protect_last_n` pattern.

Motivation: users who rely on rolling compaction for long-running sessions
had the opening user/assistant exchange pinned as head forever, which
doesn't always match how they want the session framed after many
compactions. Lowering to 1 preserves the system prompt + first non-system
message; lowering to 0 preserves only the system prompt and lets the
entire first exchange age out naturally through the summary.

Semantics: `protect_first_n` counts non-system head messages protected
**in addition to** the system prompt, which is always implicitly protected
when present. Same meaning across both code paths:

  protect_first_n=0 → system prompt only (or nothing if no system message)
  protect_first_n=2 → system prompt + first 2 non-system messages (default)

This unifies the CLI path (which reads messages with the system prompt at
position 0) and the gateway path (where the gateway /compress handler
strips the system prompt before calling compress() — see
gateway/run.py L9150-9154 on the parent fork). Previously these two paths
disagreed:

  CLI path:     protect_first_n=1 → protect system prompt only
  Gateway path: protect_first_n=1 → protect first USER turn forever

In practice on long-running gateway sessions the old semantics pinned
whatever stale aside happened to be the first user message, reinserting
it into every compaction summary indefinitely.

Default chosen as 2 (not 3) so that the effective protected head count
remains 3 messages in the common case — assuming a system prompt is
present, default protection becomes system + 2 non-system = 3 total,
matching the pre-feature behaviour where `protect_first_n` was hardcoded
to protect 3 messages total. Sessions without a system prompt will see a
small behaviour change (2 protected head messages instead of 3), but this
is the rare path and the new semantics make the system-prompt-present
case the well-defined one.

Changes:

- agent/context_compressor.py: redefine protect_first_n as the count of
  non-system head messages protected beyond the implicit system-prompt
  guarantee; both paths converge. Constructor default updated to 2.
- hermes_cli/config.py: add `compression.protect_first_n` default (2),
  matching the new semantics. `show_config` label tweaked to
  'Protect first: N non-system head messages' for clarity.
- run_agent.py: read protect_first_n from config; 0 is now valid (system
  prompt is always implicitly protected).
- cli-config.yaml.example: document the new key and rationale.
- tests/agent/test_context_compressor.py: cover default, override, the
  end-to-end `protect_first_n=0` and `protect_first_n=1` behaviour,
  the no-system-prompt (gateway) path, and the new shared-semantics
  regression test.

Fixes #13751
Tested on Ubuntu 24.04.
2026-05-13 22:25:16 -07:00
snav
d863773c81 feat(discord): add thread_require_mention for multi-bot threads
By default, once Hermes participates in a Discord thread (auto-created on
@mention or replied in once) it auto-responds to every subsequent message
in that thread without requiring further @mentions. That's the right default
for one-on-one conversations and isolated channel threads.

But it's a confirmed footgun in multi-bot threads. When a user invokes one
bot per turn — addressing Codex first, then Hermes — every other bot in the
thread also fires on every message, burning credits and spamming the channel.
Author has hit this personally in active multi-bot research-team threads.

Add a new `discord.thread_require_mention` config key (env:
`DISCORD_THREAD_REQUIRE_MENTION`), default `false` to preserve existing
behavior. When `true`, the in-thread mention shortcut is disabled and
threads are gated the same way channels are. Explicit @mentions still pass
through as expected.

Mirrors the existing helper shape (config.extra > env > default) and the
existing yaml→env bridge pattern used by `require_mention`.

Changes:

- gateway/platforms/discord.py: new `_discord_thread_require_mention()`
  helper; in_bot_thread shortcut now AND's with `not _discord_thread_require_mention()`
- gateway/config.py: bridge `discord.thread_require_mention` from config.yaml
  to `DISCORD_THREAD_REQUIRE_MENTION` env var (mirrors the existing
  `require_mention` bridge two lines above)
- hermes_cli/config.py: add `thread_require_mention: False` default to
  DEFAULT_CONFIG['discord']
- tests/gateway/test_discord_free_response.py: 4 new tests covering default
  behaviour (in-thread shortcut still works), enabled behaviour (mention
  required in threads), enabled+mentioned (mention still passes through),
  and yaml-via-config.extra path. Also clears DISCORD_* env vars in the
  `adapter` fixture so process-env state from the contributor's shell
  doesn't leak into per-test behaviour.
- tests/gateway/test_config.py: 2 new tests covering the yaml→env bridge
  (both the apply-from-yaml and env-precedence-over-yaml paths)
- website/docs/user-guide/messaging/discord.md: document the new env var
  + config key with multi-bot rationale; cross-link from `auto_thread`
  section

Tested on Ubuntu 24.04.
2026-05-13 22:21:43 -07:00
simpolism
d557544560 fix(discord): keep free-response channels inline
Free-response channels are intended as lightweight chat surfaces — the bot
responds to every message without requiring an @mention. But the auto-thread
gate only checked DISCORD_NO_THREAD_CHANNELS, not DISCORD_FREE_RESPONSE_CHANNELS,
so every message in a free-response channel still spawned a brand-new thread.
That turns a chat channel into a thread-spawning machine: 1 thread per message.

The user-facing docs at website/docs/user-guide/messaging/discord.md already
describe the intended behavior ("Free-response channels also skip auto-threading
— the bot replies inline rather than spinning off a new thread per message"),
so this is a code-vs-docs gap, not a design change.

Fix: OR is_free_channel into skip_thread alongside the existing no_thread_channels
check. One-line production change.

Regression test added at tests/gateway/test_discord_free_response.py:
test_discord_free_response_channel_skips_auto_thread asserts that a message
in a free-response channel never calls _auto_create_thread.  Reverting the
one-line fix causes the test to fail with 'Expected mock to not have been
awaited. Awaited 1 times.' — i.e. the test demonstrates the bug concretely.
2026-05-13 22:21:18 -07:00
kshitijk4poor
3633c8690b refactor(plugins): add apply_yaml_config_fn registry hook
Lets platform plugins own their YAML→env config bridge instead of forcing
core gateway/config.py to know every platform's schema.

The hook receives the full parsed config.yaml and the platform's own
sub-dict, may mutate os.environ (env > YAML precedence preserved via the
standard `not os.getenv(...)` guards), and may return a dict to merge
into PlatformConfig.extra. It runs during load_gateway_config() after
the existing generic shared-key loop and before _apply_env_overrides(),
mirroring the env_enablement_fn dispatch pattern (#21306, #21331).

Pure addition — no behavior change for existing platforms. Each of the
eight platforms with hardcoded YAML→env blocks today (discord, telegram,
whatsapp, slack, dingtalk, mattermost, matrix, feishu, ~252 LOC in
gateway/config.py) can migrate in independent follow-up PRs; the
hardcoded blocks remain functional in the meantime, and their
`not os.getenv(...)` guards make them no-ops for any env var the hook
already set.

Test coverage: 10 new tests in tests/gateway/test_platform_registry.py
covering field default, callable acceptance, env mutation, extras
merge, both signature args, exception swallowing, missing/non-dict
sections, and env > YAML precedence.

Refs #3823, #24356.
Closes #24836.
2026-05-13 22:20:30 -07:00
Teknium
d5775fe988
feat(codex-runtime): skip unavailable plugins during migration (#25437)
Followup to PR #24182 — caught when scanning OpenClaw for recent codex
fixes we hadn't considered. OpenClaw learned the hard way (#80815) that
migrating plugins which codex itself reports as unavailable produces
config that fails at activation time.

Our /codex-runtime codex_app_server enable path queries codex's
plugin/list and migrates everything where installed=true. We were
trusting codex's installation state and ignoring its availability
field. So a plugin that's installed=true but availability=UNAVAILABLE
(broken local install) or REQUIRES_AUTH (OAuth expired or never
completed) would get an [plugins."<n>@openai-curated"] entry in
~/.codex/config.toml — and the user's first codex turn after enabling
the runtime would fail because codex refuses to activate it.

Fix: filter on availability in _query_codex_plugins(). Only emit
plugins where availability is empty (older codex versions without the
field — preserve backward compat) or explicitly AVAILABLE.

Tests:
  test_plugin_discovery_skips_unavailable_plugins — verifies 4 cases:
    - good-plugin (installed=True, availability=AVAILABLE) → migrated
    - broken-plugin (installed=True, availability=UNAVAILABLE) → skipped
    - auth-pending (installed=True, availability=REQUIRES_AUTH) → skipped
    - legacy-plugin (installed=True, no availability field) → migrated
      (older codex versions; preserve backward compat)

Docs:
  Added bullet to 'What's NOT migrated' list in the docs page calling
  out the availability filter and why.

Other OpenClaw codex PRs I reviewed but did NOT apply (with reasoning):
  - #81591 (load Codex for selectable models): we resolve runtime
    per-call already, no startup-time gating to fix
  - #81510 (cron compatibility): we documented cron as untested; their
    fix is for OpenClaw-specific cron orchestration shape
  - #81223 (rotate incompatible context-engine threads): we don't
    have a Lossless context engine equivalent
  - #80688 (constrain sandbox): we don't have an outer-sandbox concept
  - #80616 (release on turn_aborted): we already handle status=
    interrupted in turn/completed correctly
  - #80278 (expose activeModel in plugin SDK): not our surface
  - #80792 (default destructive_actions on): we don't expose that knob

56 codex-runtime migration tests still green (+1 new).
2026-05-13 22:20:27 -07:00