Commit graph

4678 commits

Author SHA1 Message Date
kshitijk4poor
7a315bd702 fix(tools): preserve live session cwd in terminal_tool, and keep ACP update_cwd authoritative
terminal_tool re-sent the init-time/config cwd on every command, clobbering
session-local `cd` state: the environment tracked the new directory in
`env.cwd`, but foreground/background calls forced the old cwd back. A small
`_resolve_command_cwd` resolver now applies the precedence
`workdir > live env.cwd > config/override cwd` to:
  - foreground `env.execute(...)`
  - background `process_registry.spawn_local(...)`
  - background `process_registry.spawn_via_env(...)`

Additionally, syncing the cwd onto the live cached env when a `cwd` override is
(re-)registered. Preferring live `env.cwd` would otherwise demote the ACP
`update_cwd` override (registered via `register_task_env_overrides` on
`session/load` / `session/resume`) below an already-set `env.cwd`, silently
ignoring an editor's mid-session project-root change once any command had run.
`register_task_env_overrides` now pushes a new cwd onto the cached env so an
explicit ACP cwd change wins, while ordinary in-session `cd` tracking is
preserved.

Regression coverage:
  - foreground/background commands follow live `env.cwd`
  - explicit `workdir` still overrides everything
  - registering a cwd override updates the live env cwd (ACP authority)
  - no-op when no live env exists; non-cwd overrides leave env.cwd untouched

Based on #35510 by @Dusk1e.

Co-authored-by: Dusk1e <yusufalweshdemir@gmail.com>
2026-05-31 23:50:40 +05:30
Teknium
1044d9f25d
fix(gateway): /stop can interrupt a sibling participant's run in a per-user thread (#35959)
In a per-user thread (thread_sessions_per_user=True), each participant
gets an isolated session key (...:{thread_id}:{user_id}). A run another
user started lives under a different key, so the caller's own /stop found
nothing and replied 'no active task to stop'.

When /stop finds no run under the caller's own key, fall back to
interrupting any running agent(s) sharing the caller's thread prefix
({chat_id}:{thread_id}), gated on _is_user_authorized. Thread-only — the
fallback returns [] for non-thread channels, and a prefix-collision guard
prevents thr1 from matching thr11.
2026-05-31 09:29:03 -07:00
Teknium
de4f40ed02
feat(setup): thin out setup — Quick Setup via Nous Portal + Full Setup defaults (#35723)
* feat(setup): Quick Setup routes through Nous Portal (OAuth + model + messaging)

First-time quick setup now goes straight to the Nous Portal provider
instead of showing the full provider picker. Runs the device-code OAuth
login, selects a Nous model, configures the terminal backend, and offers
messaging setup — applying recommended defaults for everything else.

- Rename menu entry to 'Quick Setup (Nous Portal)'.
- _run_first_time_quick_setup now calls _model_flow_nous (handles both the
  logged-out OAuth+model-select path and the logged-in curated picker),
  then re-syncs config from disk to avoid the #4172 stale-overwrite.
- Terminal / defaults / messaging steps unchanged.

* feat(setup): thin out Full Setup with happy defaults

Full Setup no longer asks for every config knob — anything with an
obvious default is applied silently and stays tunable via the per-section
commands (hermes setup agent|terminal|tts, hermes auth add).

- Model section: drop the same-provider rotation pool, vision-backend
  picker, and TTS provider sub-flows. Vision auto-detects from the main
  provider; TTS defaults to Edge; rotation lives in hermes auth add.
- Terminal section: keep the backend picker (Local default) and any
  required credentials (Modal token, SSH host/user/key, Daytona key),
  but stop prompting for container image, CPU/mem/disk resources, gateway
  cwd, and sudo password — all use defaults.
- Agent Settings: removed from the wizard. First installs get recommended
  defaults silently; existing installs keep their tuned values.
- New defaults: max_turns 90 -> 150, session_reset both -> none.
- Tests: reconfigure tests assert agent settings are no longer prompted
  on existing installs; drop 3 tests covering the deleted in-setup
  rotation flow.
2026-05-31 09:13:06 -07:00
fesalfayed
64628ea89b fix(anthropic): demote dead thinking signature when orphan-strip mutates the latest turn
Extended-thinking Claude models (4.6+, e.g. Opus 4.8) emit a signed `thinking`
block on assistant turns that also carry parallel `tool_use` blocks. Anthropic
signs that block against the full, original turn content.

When a parallel tool batch is interrupted before every `tool_result` returns,
`_strip_orphaned_tool_blocks` removes the unanswered `tool_use` on replay — which
mutates the turn. The latest-assistant branch of `_manage_thinking_signatures`
then replays the now-stale signed thinking block verbatim, and Anthropic rejects
the request with a non-retryable HTTP 400:

    messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
    assistant message cannot be modified. These blocks must remain as they were
    in the original response.

Because the poisoned turn is rebuilt from the persisted store every turn, the
gateway crash-loops with no self-recovery (a soft session reset does not clear
it). The drifting content index in the error is the changing count of stripped
`tool_use` blocks across rebuilds.

Fix: when orphan-stripping removes a `tool_use` from a turn that also holds a
thinking/redacted_thinking block, flag the turn. `_manage_thinking_signatures`
then demotes every thinking block on that latest turn to a plain text block
(preserving the reasoning text) instead of replaying a signature that can no
longer validate. An intact turn is unaffected — its signed thinking is still
replayed verbatim. The internal flag is stripped before the payload is sent.

Adds two regression tests:
- demotion when an orphaned parallel tool_use is stripped
- control: signed thinking preserved verbatim when nothing is stripped
2026-05-31 06:14:34 -07:00
Teknium
2b5268f716
revert: drop cumulative-resend tool-arg heuristic from shared streaming path (#35718) (#35860)
PR #35718 added a per-slot "cumulative-resend" latch to the universal
streaming tool-call accumulator to fix DeepSeek / Baidu Qianfan (#35592).
The latch fires when a delta is a strict superset of the accumulated
buffer (len(_new) > len(_prev) and _new.startswith(_prev)) and then
REPLACES the buffer instead of appending.

That superset test is not an unambiguous cumulative signature. A normal
incremental stream can emit a single fragment that restates an already-
accumulated prefix — trivially common in large code-patch arguments with
repeated lines / indentation — which trips the latch and clobbers the
accumulated buffer, corrupting the tool call. Observed in the wild on
Anthropic Opus (the primary model) building a large patch: corrupted /
short arguments → finish_reason='length' dead-end → session killed.

A guessing heuristic that can silently clobber a tool-call buffer has no
place on the path every provider and model shares. Reverting restores the
known-good plain `+=` accumulator. The #35592 narrow provider bug should
be re-addressed provider-gated so it is structurally impossible to touch
Anthropic / OpenAI incremental streams, rather than via a heuristic on the
shared path.

Reverts ca03486b6.
2026-05-31 06:14:32 -07:00
Teknium
f2d4cf4f76
fix(cli): clamp post-compression token sentinel in status bar (#35858)
The status bar read context_compressor.last_prompt_tokens directly with
an 'or 0' guard that only catches 0/None. Right after a compression the
compressor parks last_prompt_tokens at the -1 sentinel
(awaiting_real_usage_after_compression) until the next API call reports
real usage. -1 is truthy, so it sailed through and rendered as '-1/200K'
and '-1%' for that one transitional turn.

Clamp negative token/context-length values to 0 in the status-bar
snapshot so the gap reads as empty context until real usage arrives.
2026-05-31 06:03:01 -07:00
Teknium
1fc7bdc5e6
feat(tools): always show Nous Tool Gateway backends, login on select (#35792)
* feat(tools): always show Nous Tool Gateway backends, login on select

The Nous-managed Tool Gateway rows in `hermes tools` (Firecrawl, OpenAI
TTS, Browser Use, FAL image/video) were hidden unless the user was already
logged into Nous Portal with paid access. Now they are always listed.
Selecting one runs an inline Nous Portal device-code OAuth + entitlement
check — auth only, no inference-provider switch and no bulk 'enable all
tools' prompt (that stays in `hermes model`). The row only activates the
gateway once paid access is confirmed.

- _visible_providers: stop hiding managed_nous_feature rows (incl. those
  also flagged requires_nous_auth); pure pre-auth UX rows still gate on login
- nous_subscription.ensure_nous_portal_access(): auth + entitlement gate
  that preserves the user's active inference provider
- _configure_provider / _reconfigure_provider: run the inline gate for
  managed backends; write config only when entitled
- picker marker: 'via Nous Portal (login on select)' for logged-out users
- _hidden_nous_gateway_message: now a no-op (rows are never hidden)

* docs: hermes tools is a first-class Tool Gateway entry point

The Tool Gateway docs framed `hermes setup --portal` / `hermes model` as
the activation path and only mentioned `hermes tools` for mixing in your
own keys. With the inline-login change, picking a Nous-managed backend in
`hermes tools` is a complete path on its own — it logs you into Nous
Portal on select if needed, without switching your inference provider or
prompting to enable every other tool.

- tool-gateway.md: Get started now lists three peer entry points; new
  paragraph explaining login-on-select and the no-prompt fast path when
  OAuth is already active
- nous-portal.md + run-hermes-with-nous-portal.md: note that managed rows
  appear logged-out and trigger inline login on select
2026-05-31 03:39:17 -07:00
kshitijk4poor
087be00733 fix(cli): migrate setup model/provider pickers off simple_term_menu to curses
The setup provider->model sub-menu (and three sibling pickers) used
simple_term_menu.TerminalMenu, whose ESC and arrow-key handling was
unreliable across terminals — notably ESC failed to back out of the
model selection list on terminals that emit raw escape sequences (e.g.
Ghostty). The codebase already notes simple_term_menu 'conflicts with
/dev/tty' and causes 'ghost-duplication rendering', and a prior attempt
to migrate these (closed PR) confirmed the same root cause.

Route all four single-select pickers through the shared, already-hardened
curses_radiolist (which decodes raw CSI/SS3 escape sequences and handles
ESC consistently, fixed in #35776):

- auth.py _prompt_model_selection — model picker; the pricing column
  header and the unavailable-models block are passed as the radiolist
  description so they survive the curses screen clear. ESC now cancels.
- main.py _prompt_reasoning_effort_selection — reasoning-effort picker.
- main.py _model_flow_named_custom — named custom-provider model picker.
- main.py _remove_custom_provider — provider-removal picker.

simple_term_menu is no longer imported anywhere (only stale comments
referenced it; one in setup.py is corrected). The numbered-input
fallbacks are unchanged and still trigger on curses errors / non-TTY.

Tests: updated test_terminal_menu_fallbacks / test_reasoning_effort_menu
/ test_custom_provider_model_switch / test_model_provider_persistence to
drive the fallback via curses_radiolist errors instead of breaking
simple_term_menu. New test_setup_menu_curses_migration.py asserts each
picker routes through curses_radiolist, ESC cancels, and the pricing
header is preserved. Net -147/+183 (mostly the new test file; production
code shrinks by removing TerminalMenu boilerplate).
2026-05-31 03:19:37 -07:00
kshitijk4poor
3463c97a36 fix(cli): decode raw arrow-key escape sequences in curses menus
The setup wizard's provider/model pickers (curses_radiolist via
prompt_choice) bailed to the numbered "Select [1-N]" fallback the moment
a user pressed up or down. Root cause: even with keypad(True) — which
curses.wrapper sets — many terminals/terminfo entries deliver cursor keys
to getch() as raw CSI/SS3 byte sequences (e.g. 27, 91, 66 for arrow-down)
rather than the translated curses.KEY_DOWN. The menus matched only
curses.KEY_UP/KEY_DOWN and treated the leading 27 (ESC) as cancel, so
navigation dropped into the text fallback and the trailing bytes leaked
into the next input().

Add a shared read_menu_key() helper that decodes CSI/SS3 escape sequences
into normalized NAV_* actions (only a lone ESC, with no continuation byte
within a short timeout, still cancels) and consumes the tail of unhandled
sequences so stray bytes can't corrupt later input(). Route all three
curses menus (checklist, radiolist, single_select) through it.

Add regression tests covering raw CSI/SS3 arrows, translated KEY_*
constants, vim keys, lone-ESC cancel, and full consumption of unhandled
sequences (Delete/Home/End).
2026-05-31 13:59:56 +05:30
Teknium
0cd7d54b00
feat(kanban): goal_mode cards run workers in a /goal loop (#35710)
* feat(kanban): goal_mode cards run workers in a /goal loop

A goal_mode card wraps its dispatched worker in the Ralph-style goal
loop behind /goal: after each turn an auxiliary judge checks the
worker's response against the card title+body, and if not done the
worker keeps going in the SAME session until the judge agrees, the
worker terminates the task itself, or the turn budget runs out (which
blocks the card for human review — never a silent exit).

- kanban_db: goal_mode + goal_max_turns columns (additive migration),
  Task fields, create_task params, INSERT wiring, created-event payload.
- kanban_tools: goal_mode/goal_max_turns on the kanban_create tool so
  orchestrators can opt cards in when fanning out.
- kanban CLI: --goal / --goal-max-turns on 'kanban create'.
- dashboard API: goal_mode/goal_max_turns on the create endpoint
  (auto-surfaced back via asdict).
- _default_spawn: sets HERMES_KANBAN_GOAL_MODE / _GOAL_MAX_TURNS only
  when the card opts in.
- goals.run_kanban_goal_loop: standalone, callback-injected loop engine
  (no SessionDB persistence; ephemeral worker). cli.py quiet path calls
  it after the worker's first turn when the env vars are set.
- Docs: orchestrator skill + kanban feature page.

Tests: DB roundtrip + legacy migration, spawn env gating, and the loop's
continuation/completion/budget-block/finalize-nudge branches. E2E run
against a real kanban DB confirms a budget-exhausted goal worker lands
in a sticky blocked state.

* feat(kanban/dashboard): goal-mode toggle in the create form

Wires the goal_mode card setting into the dashboard UI (the plugin's
hand-written IIFE bundle, no build step):

- InlineCreate: 'goal mode' checkbox after the skills field; checking it
  reveals an optional 'max turns' number input. Both reset on submit and
  only post goal_mode/goal_max_turns when enabled.
- TaskDrawer: a 'Goal mode: on (max N turns)' MetaRow so a card's
  goal-mode setting is visible after creation (auto-fed by asdict via the
  existing _task_dict).

Live-tested through the running dashboard with a browser: created a
goal-mode card with max-turns=8, confirmed it persisted to the kanban DB
(goal_mode=1, goal_max_turns=8) and rendered back in the drawer as
'on (max 8 turns)'. No JS console errors.
2026-05-31 01:16:33 -07:00
kshitijk4poor
32899279a7 fix(gateway): detach pending_watchers batch + normalize LRU caches + align test fixtures + AUTHOR_MAP
Self-review follow-up on top of the salvaged perf fixes:

- gateway/run.py (both watcher-drain sites): the salvaged O(n^2) fix
  (#32708) replaced `while pending_watchers: pop(0)` with iterate-then-
  `watchers.clear()`, but `watchers` aliased the registry's live list.
  A watcher appended by a concurrent session during the `await
  asyncio.sleep(0)` yield would be cleared without ever being scheduled.
  Detach the batch atomically (`pending_watchers = []`) before iterating.

- gateway/platforms/bluebubbles.py: normalize the salvaged _guid_cache
  LRU (#30523) to match feishu/codebase precedent — module-level
  `_GUID_CACHE_SIZE` constant, `while len > cap`, and drop the redundant
  post-insert `move_to_end` (a fresh insert is already most-recent).

- gateway/platforms/feishu.py: drop the same redundant post-insert
  `move_to_end` from the salvaged _message_text_cache LRU (#23706).

- scripts/release.py: add AUTHOR_MAP entries for the salvaged commits'
  authors (amathxbt #22155, ErnestHysa #32636/#32708) so the contributor
  audit passes when these commits land on main.

- tests/tools/test_tool_output_limits.py: autouse fixture resets the new
  module-level limits cache between tests.

- tests/gateway/test_feishu.py: hand-built adapter fixture seeded
  _message_text_cache as a plain dict; it's now an OrderedDict, so the
  fixture type had to match.
2026-05-31 00:50:19 -07:00
Teknium
ca03486b6a
fix(streaming): stop duplicating tool-call args from cumulative-resend providers (#35718)
DeepSeek / Baidu Qianfan stream tool-call arguments in cumulative mode:
each chunk resends the full arguments-so-far instead of the new fragment.
The stream accumulator blindly concatenated arg deltas with +=, turning
that into '{...}{...}{...}', which failed json.loads and got nuked to '{}'
— a silently corrupted tool call (#35592). Worse on multi-param tools
(search_files, session_search, memory replace) because longer args take
more chunks, giving more resend opportunities.

- Per-slot cumulative latch in the stream accumulator: a delta that is a
  strict superset of the accumulated buffer marks the slot cumulative and
  replaces (not appends); exact duplicates are dropped only after latching.
  Incremental fragments are untouched (default += path).
- Backstop _collapse_repeated_json_arguments() in the repair pipeline
  collapses pure identical-resend buffers (K exact repeats of a valid-JSON
  unit) for providers that resend the complete object from chunk 1. Only
  reached after json.loads already failed, so compliant single objects are
  never touched.

Not a gateway or DeepSeek-model bug — any OpenAI-wire provider in
cumulative streaming mode is affected.
2026-05-31 00:19:39 -07:00
Teknium
0ffbcbbe7d
fix(vision): cap embedded image size before it wedges a session (#35732)
Resize vision tool-result images down to a 4 MB embed cap at load time,
not just at the 20 MB hard ceiling. A 5-20 MB image previously sailed
through the native fast path and got baked into conversation history,
where Anthropic's 5 MB per-image base64 limit rejected every subsequent
turn with a 400 — and because history is immutable, retries could never
clear it, permanently wedging the session.

Also harden the reactive shrink-recovery: it now returns False (don't
retry) when any oversized image part can't be brought under target, so
the single retry isn't burned re-sending a payload that will fail
identically. Previously it returned True after shrinking *any* part,
even when the actual oversized culprit survived.
2026-05-31 00:12:09 -07:00
Teknium
d4e7b2fc19
fix(voice): allow /voice over SSH when a sound server is reachable (#35719)
SSH sessions hard-failed voice mode on the presence of SSH_* env vars
alone, even when a PulseAudio/PipeWire server is running on the host and
audio works (ffplay/aplay/pw-play -> pulseaudio). Probe the default
sound-server sockets (PULSE_SERVER unix path, PULSE_RUNTIME_PATH/native,
$XDG_RUNTIME_DIR/{pulse/native,pipewire-0}) and actually connect() so a
stale socket doesn't count; downgrade the SSH branch to a notice when
audio is reachable. Mirrors the existing Docker/WSL forwarding handling.

Fixes #35622
2026-05-31 00:11:52 -07:00
teknium1
bd72d333dc fix(gateway,cron): reuse existing _HERMES_GATEWAY marker; tighten cron regex
Some checks are pending
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Docker Build and Publish / build-amd64 (push) Waiting to run
Docker Build and Publish / build-arm64 (push) Waiting to run
Docker Build and Publish / merge (push) Blocked by required conditions
Lint (ruff + ty) / ruff + ty diff (push) Waiting to run
Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run
Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run
Nix / nix (macos-latest) (push) Waiting to run
Nix / nix (ubuntu-latest) (push) Waiting to run
OSV-Scanner / Scan lockfiles (push) Waiting to run
Tests / test (1) (push) Waiting to run
Tests / test (2) (push) Waiting to run
Tests / test (3) (push) Waiting to run
Tests / test (4) (push) Waiting to run
Tests / test (5) (push) Waiting to run
Tests / test (6) (push) Waiting to run
Tests / save-durations (push) Blocked by required conditions
Tests / e2e (push) Waiting to run
uv.lock check / uv lock --check (push) Waiting to run
Follow-up to the salvaged #30728:
- Gateway already exports _HERMES_GATEWAY=1 at startup (gateway/run.py) and
  cli.py already keys off it. Drop the redundant new HERMES_IN_GATEWAY var;
  guard stop/restart on _HERMES_GATEWAY instead. One marker for one fact.
- Drop the greedy \bgateway.*restart alternation from the cron lifecycle
  filter — it false-positived on legit prompts that merely mention an
  unrelated gateway + a restart (API/payment gateway monitoring). The
  specific 'hermes gateway (restart|stop|start)' pattern already covers the
  real command.
- Rework the two negative guard tests to sentinel the first downstream call
  so they don't drive real signal delivery (tripped the live-system guard).
- Add false-positive regression cases to test_safe_commands.
2026-05-30 23:05:56 -07:00
simokiihamaki
5cd6c1717d fix(gateway,cron): prevent agent restart loops via self-targeting gateway commands (#30719)
Three defenses against SIGTERM-respawn loops when agent schedules its
own gateway restart under launchd/systemd KeepAlive:

1. HERMES_IN_GATEWAY env var: gateway sets it at startup; stop/restart
   subcommands refuse to run when set (exit 1 with clear message).

2. Cron create payload filter: regex pre-flight rejects prompts/scripts
   containing hermes gateway restart/stop, launchctl kickstart/unload,
   systemctl restart/stop, and pkill patterns.

3. 30 new tests: pattern matching (14), cron block (5), gateway guard (4),
   safe command negatives (7).
2026-05-30 23:05:56 -07:00
Teknium
9b78f411c8
fix(security): neutralize file paths in mutation-verifier footer (#35584) (#35684)
The per-turn file-mutation verifier footer rendered failed-write paths as
bare absolute paths in the user-facing response. The gateway's
extract_local_files() scans response text for bare paths ending in a
deliverable extension (.yaml/.json/etc.), validates os.path.isfile(), and
auto-attaches matches as native uploads — so a denied write to
~/.hermes/config.yaml surfaced the path in the footer and got the
credential file silently uploaded to the messaging channel.

The gateway denylist (validate_media_delivery_path) already blocks the
config.yaml case after #35634. This is defense-in-depth at the source:
backtick-wrap every path the footer emits — both the bullet path and any
path echoed inside the tool's error preview (the protected-file denial
message embeds the path in single quotes, which do NOT block the
extractor regex). extract_local_files skips paths inside inline-code
spans, so wrapping defeats auto-attachment for ANY protected file while
keeping the path human-readable.

- run_agent.py: _format_file_mutation_failure_footer wraps bullet paths;
  new _neutralize_footer_paths backticks any remaining bare path (covers
  the preview echo). staticmethod -> classmethod (caller unaffected).
- tests: backtick-wrap assertion + end-to-end extract_local_files leak test.
2026-05-30 23:05:23 -07:00
Teknium
dc4de14377
fix(telegram): retry on httpx pool timeout instead of dropping the send (#35664)
When PTB's general httpx pool is exhausted, it converts httpx.PoolTimeout
into telegram.error.TimedOut whose message states the request was *not*
sent to Telegram. The send retry loop treated all non-connect TimedOut as
non-retryable, so a pool timeout raised immediately, skipped all 3 retry
attempts, and was returned as retryable=False -- silently dropping the
message (agent responses, cron reports, etc.).

A pool timeout means the request never left the process, making it the
safest case to retry. Add _looks_like_pool_timeout() and treat it like a
connect timeout in both the in-loop retry decision and the outer retryable
determination, so pool timeouts flow through the existing backoff loop and
stay retryable on exhaustion.

Reported-by: q3874758 (#35610)
2026-05-30 22:58:16 -07:00
LeonSGP43
02d1da49de Block Hermes root config in media delivery 2026-05-30 21:02:36 -07:00
Teknium
50db2d9c12
feat(models): add deepseek-v4-flash, trim variants, group curated lists by maker (#35659)
* feat(models): add deepseek-v4-flash to OpenRouter + Nous curated lists

deepseek/deepseek-v4-flash was already in the native deepseek provider
catalog but missing from the curated OpenRouter and Nous Portal picker
lists. Added it to both and regenerated the model-catalog.json manifest
(drift guard requires same-PR regeneration).

* refactor(models): trim redundant variants, group curated lists by maker

Remove claude-opus-4.7/4.6, gpt-5.4-nano, gpt-5.3-codex,
gemini-3-pro-image-preview, gemini-3.1-flash-lite-preview, grok-4.20,
and the older gemini-3-pro-preview (Nous). Reorder both OPENROUTER_MODELS
and _PROVIDER_MODELS[nous] into contiguous per-maker blocks with comment
headers. Regenerated model-catalog.json (openrouter 27, nous 20).

* feat(models): add gemini-3-pro-preview to OpenRouter + Nous curated lists

Adds google/gemini-3-pro-preview to both curated pickers (new on
OpenRouter, restored on Nous). Regenerated model-catalog.json
(openrouter 28, nous 21).

* test(models): use claude-opus-4.8 in OpenRouter fetch fixtures

The two TestFetchOpenRouterModels tests mocked a live OpenRouter
response with claude-opus-4.6 and relied on it surviving the curated-list
filter. Since 4.6 was removed from OPENROUTER_MODELS, those models got
filtered out and the recommended tag shifted. Swap the fixture to
claude-opus-4.8 (still curated, still first in the Anthropic block).
2026-05-30 20:57:01 -07:00
teknium1
fe62424ac4 test(redact): assert Discord mentions pass through unchanged
Rewrite TestDiscordMentions as negative assertions (mentions survive the
redactor) and clean up the orphaned comment + dangling whitespace left by
removing _DISCORD_MENTION_RE. Follow-up to the salvaged #32259 fix for #35611.
2026-05-30 20:48:41 -07:00
Teknium
9ed9af2f7d
fix(update): name new config options in migration prompt; skip prompt for pure version bumps (#35658)
The 'hermes update' config-migration prompt printed only counts ('1 new
config option available') then asked 'configure them now?' without ever
saying what the options were. Users said no because they couldn't tell what
they were agreeing to. For pure config-format version bumps (no new
env/config keys) it still asked the question, where saying yes just bumped
the version and looked like a no-op.

- List each new env var / config key by name + description before prompting
  (cap at 8, then '… and N more'). The data was already available; we just
  threw it away and printed a count.
- Pure version bump (no new options): apply the format migration
  non-interactively and print what happened, instead of asking a misleading
  yes/no.

Reported by ScottFive and Tt2021.
2026-05-30 20:42:37 -07:00
helix4u
355af2c20f fix(session): survive missing FTS5 runtimes 2026-05-30 18:59:08 -07:00
helix4u
bdfba45247 fix(gateway): stop system tips from auto-uploading local files 2026-05-30 18:58:46 -07:00
Teknium
b1a25404b6
perf(read_file): make compact gutter the only format; drop HERMES_READ_GUTTER (#35532)
The compact "<n>|content" gutter from #35368 is now the sole behavior.
Removes the HERMES_READ_GUTTER=padded escape hatch and its env lookup —
no legacy fixed-width path to maintain. Padding was pure token overhead
(~48% more tokens than bare content, ~16% more than compact) with no
measured accuracy gain in the original A/B.

- file_operations.py: drop env lookup + os import; gutter always f"{i}|{line}"
- tests: drop the padded env-override test; compact assertions retained
2026-05-30 14:38:30 -07:00
brooklyn!
5921d66785
fix(cli): stop OSC 11 bg probe from trapping users in a stray editor (#35441)
Over SSH the OSC 11 background-color query round-trip routinely exceeds
the 100ms read budget, so _query_osc11_background() gives up and the late
reply lands after prompt_toolkit has grabbed the tty. prompt_toolkit then
injects the OSC payload as typed text and reads its BEL terminator
(\x07 = Ctrl+G) as a keystroke — Ctrl+G is the open-external-editor
binding, dropping the user into vi with garbage and no obvious way out.

- Skip the OSC 11 probe on remote sessions (SSH_CONNECTION/CLIENT/TTY);
  fall back to COLORFGBG / env hints / the dark default.
- Restore the tty with TCSAFLUSH instead of TCSANOW so any partial/late
  reply is scrubbed from the input buffer before pt reads it.
2026-05-30 11:55:12 -05:00
Sylw3ster
6a72af044c fix(managed-gateway): keep tool availability scans off the Nous token-refresh path 2026-05-30 07:58:08 -07:00
Teknium
96643b4a52
fix(file-tools): anchor relative-path resolution to absolute base; report resolved path (#35399)
Relative paths in write_file/patch could resolve against the agent PROCESS cwd
instead of the terminal's working directory. In a git-worktree session with a
stale TERMINAL_CWD='.' (a relative base), early edits silently landed in the
MAIN checkout, verified there, and reported success — while the agent inspected
the worktree and saw nothing, misreading it as the patch tool no-op'ing.

- _resolve_base_dir(): resolution base is now ALWAYS absolute. A relative
  TERMINAL_CWD is anchored to the process cwd once, deterministically, instead
  of being left to resolve()-time cwd. Live terminal cwd stays authoritative.
- write_file/patch pass the resolved absolute path to the shell FileOps layer
  so the tool layer and shell layer can't disagree about which file is edited.
- Responses now report the absolute resolved_path and files_modified, so a
  wrong-cwd mismatch is visible on the first call.
- _path_resolution_warning(): emits a _warning when a relative path resolves
  OUTSIDE the live terminal cwd (e.g. a worktree session writing into main).

Validation: 11 new unit tests + 43 live E2E assertions (worktree routing,
mid-session cd, V4A patches, divergence warning, absolute paths, consecutive
patches); 466 existing file/path/terminal tests green.
2026-05-30 07:55:36 -07:00
Sylw3ster
0c6e133c04 perf(cli): stop eager MCP discovery from blocking agent-capable startup 2026-05-30 07:45:26 -07:00
Teknium
b47cb1bbf2
feat(kanban): file attachments on tasks (#35395)
Tasks can now carry file attachments (PDFs, images, source docs) that
workers read directly — closes the gap where source material had to be
pasted as a path into the task body.

- kanban_db: task_attachments table (additive), Attachment dataclass,
  add/list/get/delete accessors, attachments_root/task_attachments_dir
  path helpers (per-board, HERMES_KANBAN_ATTACHMENTS_ROOT override)
- build_worker_context: surfaces each attachment's absolute path so the
  worker (full file/terminal tool access) reads it via read_file/pdftotext
- dashboard API: POST/GET/DELETE attachment routes (multipart upload,
  25MB cap, traversal-safe filenames, root-containment check on download)
- dashboard UI: Attachments section in the task drawer — upload button,
  list with download, per-row remove
- docs + tests (13 cases: DB accessors, REST round-trip, traversal
  rejection, collision suffixing, worker-context surfacing)

Closes #35338
2026-05-30 07:41:04 -07:00
teknium1
20d073fd0b test: update extract_local_files Windows-path test for new matching behavior
test_windows_path_not_matched asserted the pre-fix POSIX-only behavior.
The Windows drive-letter support now intentionally matches these paths,
so replace it with parametrized positive cases plus a relative-path
negative guard, mirroring tests/gateway/test_platform_base.py.
2026-05-30 07:38:03 -07:00
teknium1
1b955450e3 test: use raw docstring in test_run_tool_media_re to silence escape warning 2026-05-30 07:38:03 -07:00
Tranquil-Flow
51d165a8e7 fix(gateway): support Windows absolute paths in MEDIA tag regex and extract_local_files (#34632)
The MEDIA_TAG_CLEANUP_RE and extract_local_files path regex both used
(?:~/|/) to anchor paths, which only matches Unix-style absolute and
home-relative paths. Two additional _TOOL_MEDIA_RE patterns in run.py
had the same limitation. Windows absolute paths (C:\Users\..., D:/...)
were silently ignored, causing MEDIA directive delivery to fail.

Add [A-Za-z]:[/\\] as a third anchor alternative in all four regex
locations (base.py x2, run.py x2). Also update path separators in
extract_local_files from / to [/\\] so it can traverse Windows
directory trees.

Revert accidental + quantifier in MEDIA_TAG_CLEANUP_RE lookahead
that changed match-one to match-one-or-more (unrelated to fix).

Fixes: #34632
2026-05-30 07:38:03 -07:00
Teknium
45465b0d5d
fix(gateway): never auto-pause platforms on transient network/DNS failures (#35387)
The per-platform reconnect watcher auto-paused a platform after 10
consecutive reconnect failures, setting next_retry=inf and requiring a
manual /platform resume to recover. But both pause sites only ever fire
on *retryable* failures — non-retryable errors (bad auth) already drop
out of the retry queue earlier. So a transient DNS outage that spanned
the watcher's backoff window would silently park the bot forever, even
after connectivity returned.

The watcher's own docstring already promised 'retryable failures keep
retrying at the backoff cap indefinitely' — the code contradicted it.

Remove the auto-pause from both reconnect-failure branches. Retryable
failures now retry at the 5-min backoff cap forever and self-heal once
the network recovers. The circuit breaker (_pause_failed_platform /
_resume_paused_platform) stays for manual /platform pause|resume.

Fixes #35284.
2026-05-30 07:33:34 -07:00
teknium1
cddb7283d9 fix(gateway): config.yaml path for WhatsApp/Weixin text-batch delays
Convert the salvaged text-debounce delays from HERMES_* env vars to
config.yaml (gateway.platforms.<name>.extra.text_batch_delay_seconds /
text_batch_split_delay_seconds), per the '.env is for secrets only'
policy. Adds a finite/non-negative guard so bad YAML values fall back to
the defaults instead of crashing asyncio.sleep().

- whatsapp.py / weixin.py: read delays via _coerce_float_extra(config.extra)
- update Weixin content-dedup regression test for the deferred dispatch path
- add text-debounce coverage (whatsapp + weixin): defaults, config override,
  bad-value fallback, env-var-ignored, burst-collapse, lone-message
- docs: WhatsApp + Weixin config keys
2026-05-30 07:33:15 -07:00
Teknium
234ac00937
fix(dashboard): allow insecure WS peers on explicit non-loopback binds (#35386)
The merged 0.0.0.0/:: insecure-bind fix (#35141) did not cover binding
directly to a specific non-loopback address (e.g. a Tailscale/LAN IP via
--host 100.64.0.10 --insecure). In that mode the dashboard HTML loaded but
every WebSocket upgrade was rejected by the loopback-only peer guard, so
/chat connected then silently received no data.

Generalize _ws_client_is_allowed to lift the loopback-only peer gate for
any explicit non-loopback bound host, not just the 0.0.0.0/:: wildcard.
DNS-rebinding stays blocked: _ws_host_origin_is_allowed already requires
the Host header to exactly match the bound interface for explicit binds,
mirroring _is_accepted_host on the HTTP layer.

Co-authored-by: pxdsgnco <14163800+pxdsgnco@users.noreply.github.com>
2026-05-30 07:33:02 -07:00
teknium
433bffff51 fix(cli): surface oneshot agent exceptions to stderr with rc=1
Layer an exception guard on top of the empty-response fix so a crash
inside the agent (e.g. OSError from prompt_toolkit/Vt100 when stdout is a
non-TTY pipe, per #30623) is surfaced on the real stderr with rc=1 instead
of crashing past the redirect_stderr block. KeyboardInterrupt/SystemExit
are re-raised so Ctrl-C and explicit exits still propagate.

Also map briancl2 in scripts/release.py AUTHOR_MAP for the cherry-picked
empty-response commit.

Adapts the exception-guard approach from sweetcornna's PR #33818.

Co-authored-by: sweetcornna <96944678+ymylive@users.noreply.github.com>
2026-05-30 07:31:48 -07:00
Brian LaFlamme
9fbde54b51 fix(cli): fail closed on empty oneshot responses 2026-05-30 07:31:48 -07:00
Teknium
92ad7cc62c
fix(browser): recover from CDP DOM-node serialization crash in browser_console (#35385)
browser_console(expression="document.body") returned the cryptic CDP error
"Object reference chain is too long" instead of a usable result.

With returnByValue=true, Chrome deep-serializes the eval result; for a live
DOM Node/NodeList/Window that serialization overruns CDP's recursion guard
and fails the whole call with a protocol-level error (not a JS exception),
which _browser_eval surfaced raw.

- browser_supervisor.evaluate_runtime: on that specific error, retry once
  with returnByValue=false so Chrome returns the node's description string —
  the same graceful path already used for document.querySelector() results.
- browser_tool._browser_eval (CLI subprocess fallback): the subprocess can't
  retry, so convert the reference-chain error into actionable guidance
  (extract a primitive / use JSON.stringify) instead of leaking it raw.

No expression rewriting — normal evals (1+41 -> 42) are untouched.
2026-05-30 07:31:25 -07:00
Teknium
42bbd221e8 fix(compressor): strip stale handoff prefix on resume; reconcile #26290+#32787 (#35344)
A handoff persisted under an older SUMMARY_PREFIX can be inherited into a
resumed lineage. _strip_summary_prefix only matched the current/legacy
literal, so on re-compaction the old 'resume exactly from Active Task'
directive stayed embedded in the body and kept hijacking replies to new,
unrelated user messages.

- Add _HISTORICAL_SUMMARY_PREFIXES (pre-#35344 prefix) and strip/recognize
  them in _strip_summary_prefix + _is_context_summary_content so resumed
  stale handoffs are re-normalized to the current latest-message-wins prefix.
- Reconcile the overlapping Active Task template edits from the salvaged
  #26290 (reverse-signal cancellation) and #32787 (capture open questions /
  decisions, don't write None too eagerly) — both intents kept.
- Regression coverage in tests/agent/test_resume_stale_active_task.py.
- AUTHOR_MAP entries for both salvaged contributors.
2026-05-30 07:29:21 -07:00
Zhipeng Li
020601d41e fix(compression): drop conflicting 'resume Active Task' directive in summary prefix
SUMMARY_PREFIX previously contained two contradictory directives:

1. "treat it as background reference, NOT as active instructions"
   "Do NOT answer questions or fulfill requests mentioned in this summary"
   "Respond ONLY to the latest user message that appears AFTER this summary"

2. "Your current task is identified in the '## Active Task' section of the
    summary — resume exactly from there."

When the latest user message contradicted Active Task (e.g. 'stop the
i18n refactor', 'never mind, look at grafana instead'), models tended to
follow (2) anyway because 'resume exactly' is a strong, unambiguous
directive — leading to repeated re-surfacing of already-cancelled work
across turns, even after explicit 'stop'/'don't keep bringing that up'
messages from the user.

This change:
- Removes the conflicting 'resume exactly from Active Task' clause.
- Makes the precedence explicit: latest user message is the single source
  of truth; it WINS on conflict; cancelled Active Task / In Progress /
  Pending User Asks / Remaining Work must be discarded entirely (no
  'wrap up the old task first').
- Names canonical reverse signals (stop, undo, roll back, never mind,
  just verify, topic change) so the model recognizes them as cancellation
  triggers, not background context.
- Updates the summarizer template instruction so the LLM doesn't
  mechanically copy a cancelled task into Active Task on the next
  compaction (it's instructed to copy the reverse signal verbatim).
- Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and
  the 'don't repeat work already reflected in session state' clause.

Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so
the conflict can't regress.

Tested:
- All compaction tests pass: tests/agent/test_context_compressor.py,
  tests/agent/test_context_compressor_summary_continuity.py,
  tests/run_agent/test_413_compression.py,
  tests/run_agent/test_compression_persistence.py,
  tests/run_agent/test_compression_boundary_hook.py,
  tests/cli/test_manual_compress.py — 117/117 passing.
- Tested on macOS.
2026-05-30 07:29:21 -07:00
teknium1
182739fcda test(interrupt): assert no leaked tid instead of no-op block
Follow-up on the #35309 regression test: the trailing `with _lock: pass`
asserted nothing. Replace it with a concrete assertion that
_interrupted_threads is empty after the worker exits, directly verifying
the leak the fix prevents.
2026-05-30 07:28:11 -07:00
liuhao1024
bede3cf12d fix(tools): wrap _run_tool cleanup in finally to prevent interrupt state leak
When _invoke_tool raises a BaseException (CancelledError, KeyboardInterrupt),
the cleanup code at the end of _run_tool was bypassed because it sat outside
the except block (which only catches Exception).  ThreadPoolExecutor recycles
thread IDs, so the leaked tid in _interrupted_threads poisons the next tool
scheduled on that thread — it instantly aborts with 'Interrupted'.

Move the discard + _set_interrupt(False) into a finally block so cleanup
runs regardless of how the worker exits.

Fixes #35309
2026-05-30 07:28:11 -07:00
Teknium
2b16b756a7
fix(gateway): recover model on post-interrupt turn; gate fallback status (#35381)
Empty model could reach the API on a recovery turn after stream_interrupt_abort,
failing HTTP 400 "No models provided" with no recovery — the session went
silent until the user manually re-sent (#35314).

- gateway/run.py: cache last-successfully-resolved model per session (+ a
  process-wide slot); when a fresh config read returns an empty model on a
  recovery turn, reuse the last-known-good instead of building model="".
- run_agent.py + agent/conversation_loop.py: only emit "trying fallback..."
  status when a fallback chain actually exists, so the UI stops announcing a
  fallback that will never run (also #17446).
- tests: empty-model recovery + _has_pending_fallback gate.
2026-05-30 07:28:06 -07:00
Teknium
ea6eaabd8f
perf(read_file): compact line-number gutter — ~14% fewer tokens per read (#35368)
read_file's gutter used a fixed-width zero/space-padded prefix
("     1|content"). The padding is pure token overhead: measured with
cl100k on real Hermes source, the padded gutter costs ~48% more tokens
than bare content and ~16% more than a compact "<n>|content" gutter,
because the leading spaces tokenize into extra tokens on every line.

Switched the default to the compact "<n>|content" form. An A/B
(Sonnet 4.6 via OpenRouter, 2 passes, 4-task battery, every claim
verified against ground truth) showed:
  - padded  : 4/4 PASS both passes
  - compact : 4/4 PASS both passes  ← keeps line-referencing + patch
  - none    : 3/4 PASS both passes  ← dropping numbers entirely made
              the model hand-count lines and answer off-by-one (33 vs 34)

So we keep the line numbers (the model genuinely uses them to reference
lines) but drop the wasteful padding — capturing ~14% of the read-token
cost with zero measured accuracy change. Dropping numbers entirely
(the larger 33% saving) is rejected: it regresses line-referencing.

patch/fuzzy_match never consumed the gutter (they match old_string text
and compute char offsets internally), so editing is unaffected. No
downstream parser keys on the fixed-width columns. HERMES_READ_GUTTER=
padded restores the legacy format for anyone relying on alignment.

Tests: updated the 3 format assertions to the compact gutter; added an
env-override test for the legacy padded format. 209 file-tool tests green.
2026-05-30 07:01:22 -07:00
Teknium
5f84c9144a
fix(file-tools): handle UTF-8 BOM in read_file / write_file / patch (#35278)
Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text
files. We had no awareness of it, so: read_file surfaced a phantom
U+FEFF as the first character; patch matches against the true first
line could miss; and a write/patch round-trip silently stripped the
marker, changing the file's byte signature.

Now:
- read_file / read_file_raw strip a single leading BOM so the model
  never sees it (only on the first chunk — the marker lives at byte 0).
- patch_replace strips the BOM before fuzzy-matching (so an exact
  first-line match works) and its post-write verification compares
  BOM-stripped content.
- write_file restores the BOM when the original file had one and the
  new content doesn't, mirroring the existing line-ending preservation
  (detect on disk via a cheap `head -c 3` probe or reuse pre_content,
  re-prepend across the edit). Guards against double-BOM.

Mid-content U+FEFF is left alone (it's data there, not a file marker).

Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read
strips, write preserves, no-BOM-when-original-had-none, no-double-BOM,
patch round-trip preserves, patch matches first line through a BOM,
plus helper unit tests. 208 file-tool tests green.
2026-05-30 06:25:50 -07:00
quen0xi
6d2727ef1c fix(discord): bridge explicit allow_from configuration to env var mapping 2026-05-30 05:23:55 -07:00
quen0xi
0bfe19ba17 fix(gateway): merge nested gateway.platforms configuration block 2026-05-30 05:23:55 -07:00
Teknium
61268ff7a9
feat(cli): add hermes prompt-size diagnostic (#35276)
Adds a 'hermes prompt-size' command that reports the fixed prompt budget
for a fresh session: system prompt total, skills index, memory, user
profile, prompt tiers, and tool-schema JSON bytes. Runs offline (dummy
credentials force the direct-construction path, no network call).

Lets users see which block dominates their per-call payload — the skills
index is often the largest single block when many skills are installed
(issue #34667). Zero model-tool footprint: it's a top-level CLI
subcommand, not an agent tool.

--platform <name> simulates a channel's platform hint; --json emits a
machine-readable breakdown.

Closes #34667
2026-05-30 02:53:42 -07:00
kshitijk4poor
cbf851ae1d perf(tui): stop slow/dead MCP servers from freezing TUI startup
The 'summoning hermes…' phase blocked on gateway.ready, which ran MCP
tool discovery inline. Any configured-but-unreachable MCP server burned
its full connect-retry backoff (1+2+4s ≈ 7s) before the composer
appeared — startup went from instant to ~7.5s of dead air for anyone
with a down stdio/http server in mcp_servers.

Move discovery into a background daemon thread so gateway.ready fires
immediately; tools register into the shared registry as servers connect,
and the agent isn't built until the first prompt. Measured spawn→ready:
~7500ms → ~115ms (dead twozero_td server in config).

Also drop rich.console + prompt_toolkit off banner.py's import path
(lazy-imported inside cprint/build_welcome_banner). tui_gateway.server
imports banner only to reach the lightweight prefetch_update_check
helper; the eager rich/pt imports added ~45ms before gateway.ready for
no benefit. tui_gateway.server import: ~115ms → ~69ms.
2026-05-30 02:53:37 -07:00