Commit graph

1856 commits

Author SHA1 Message Date
Dale Nguyen
dbbf102b8e fix(terminal): strip VIRTUAL_ENV/CONDA_PREFIX from terminal subprocess env
The Hermes gateway runs inside its own venv, so its process environment
carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned
subprocesses inheriting those markers. When the agent ran `uv sync`,
`uv pip install`, `poetry install`, etc. in ANY other project directory,
those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that
project's dependencies into the Hermes venv path — wiping Hermes' own runtime
deps (and, when the other project pinned a different Python, replacing the
interpreter), bricking the gateway on the next restart (#23473).

Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in
tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env`
— via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays
reachable because its bin dir is already first on PATH, so removing the
active-environment markers is safe and only prevents the cross-project clobber.

Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't
reach the spawned subprocess) and unit coverage for both functions, plus a
guard on the marker constant.

Also adds the AUTHOR_MAP entry for the salvaged contributor.

Closes #23473
2026-06-28 01:04:20 +05:30
Brandon Zarnitz
9c81c938d3 fix(approval): honour tirith_fail_open=false on Tirith ImportError (#20733)
check_all_command_guards() swallowed ImportError from tools.tirith_security
with an unconditional pass, leaving tirith_result["action"] as "allow"
regardless of security.tirith_fail_open.  When an operator sets
tirith_fail_open: false they have explicitly opted into fail-closed
behaviour; a missing or broken Tirith module must not silently permit
command execution.

Inside the except ImportError handler, read the live security config.
When tirith_enabled is true and tirith_fail_open is false, synthesise a
"warn"-action Tirith result so the command flows through the normal
approval path (prompt the user, or block in cron/gateway contexts)
instead of bypassing it.  The default tirith_fail_open: true behaviour
is unchanged.

Adds three regression tests to tests/tools/test_approval.py:
- fail_open=true  + ImportError → silently allowed (no regression)
- fail_open=false + ImportError → approval callback invoked, command denied
- tirith_enabled=false           → always allowed regardless of fail_open

Fixes #20733

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Conflicts:
#	tests/tools/test_approval.py
2026-06-27 04:41:24 -07:00
Teknium
fe1c1c1121
fix(session_search): demote cron below interactive sessions in discover ranking (#53597)
Cron jobs accumulate large volumes of repetitive vocabulary (recurring
project names, dates, summaries) and out-number a user's interactive
sessions. Under bare BM25 they dominate the top FTS rows, so discover's
early-exit-at-N dedup collects only cron sessions and the user's own
conversations never surface — "recall blindness" (#19434).

- _order_for_recall() stable-sorts FTS rows so interactive sources rank
  above cron before lineage dedup; within each class BM25/recency order
  is preserved. Cron is demoted, not excluded, so it still surfaces when
  it is the only match.
- raise discover scan limit 50 -> 300 so buried interactive matches are
  in hand for the demotion pass.

Fixes the cron-flooding sub-bug of #19434. The split-brain sub-bug is
covered by #52798; the child-session sub-bug is superseded by in-place
compaction.
2026-06-27 04:41:22 -07:00
Teknium
cd592c105c
feat(send_message): native WhatsApp media delivery via Baileys bridge (#53598)
send_message with MEDIA:/path to a WhatsApp target previously dropped the
attachment: the WhatsApp branch never passed media_files, the plugin's
_standalone_send accepted the param but only POSTed text, and WhatsApp was
absent from the media-supported platform list.

- send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that
  routes media_files through the whatsapp plugin's standalone_sender_fn, and
  add whatsapp to the supported-media list strings.
- whatsapp adapter: _standalone_send now sends text first (skipped when the
  chunk is media-only), then uploads each file via the bridge /send-media
  endpoint with a mediaType derived from extension/is_voice/force_document, so
  images/videos/voice arrive as native bubbles instead of documents.
- _bridge_media_type classifier maps ext -> image|video|audio|document.

Closes #19105 (remaining send_message gap). Other items in the report
(inbound video paths, image_generate auto-deliver, history dedup, native
gateway bubbles) already landed on main.
2026-06-27 04:40:05 -07:00
Teknium
88c02469cc
fix(mcp): never permanently wedge the circuit breaker on a dead transport (#53599)
A long-running gateway session could permanently lose an MCP server: once a
stdio subprocess died (or transient drops accumulated over the session), the
run loop exhausted its reconnect budget and returned, orphaning the task. With
no listener for _reconnect_event, the circuit breaker's half-open probe could
never revive the server — every probe hit a dead/absent session, re-armed the
60s cooldown, and looped forever until a full gateway restart (#16788).

Root cause was split ownership of transport liveness between the run loop and
the tool handler, plus a permanent give-up path. Fixed by one invariant: a
non-shutdown server task is always reconnectable.

- run loop parks (deregisters phantom tools, then awaits _reconnect_event)
  instead of returning when the reconnect budget is exhausted, so the task
  stays alive as a dormant listener
- retry budget resets on every successful (re)connect, so a healthy
  long-lived server can't accumulate lifetime drops into a death sentence
- half-open probe with no live session signals a reconnect (reviving a
  parked/dead task and respawning a dead stdio subprocess) and returns a
  clean 'reconnecting' error instead of writing into a dead pipe
- breaker resets on successful session init across all transports
  (stdio/HTTP/SSE) — fully transport-agnostic, no PID/pipe polling

Builds on the closed-PR cluster for this issue: keeps #49255's deregister-on-
exhaustion insight and #21006's signal-don't-probe insight, discards the racy
os.kill PID machinery.

Co-authored-by: LeonSGP43 <LeonSGP43@users.noreply.github.com>
Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
2026-06-27 04:39:54 -07:00
teknium1
ab1f9b94c5 fix(telegram): accept @username chat_id in delivery paths (#13206)
TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed
all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid
literal for int()'. The Telegram Bot API accepts both a numeric chat_id and
an @username string; Hermes was force-coercing every chat_id with int().

Add normalize_telegram_chat_id() (returns int for numeric values, passes
@username strings through) and apply it at the Bot API send/edit sites in
the Telegram adapter and the send_message tool. Username targets are now
recognized as explicit targets in _parse_target_ref.

Reapplies the approach from #13274 (season179), whose branch predated the
gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py
relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah).

Co-authored-by: season179 <season.saw@gmail.com>
2026-06-27 04:01:58 -07:00
zapabob
e55ddc3e33 fix(mcp): suppress interactive OAuth stdin prompts during background discovery (#35927)
When an MCP server requires OAuth, the interactive `hermes` TUI froze on
startup: background MCP discovery hit the OAuth flow, which on an interactive
TTY spawns a daemon thread doing a blocking `sys.stdin.readline()` (the
"paste the redirect URL" fallback in mcp_oauth._wait_for_callback). That
thread competes with the TUI's own stdin reader for the same terminal, so
keystrokes get swallowed and the TUI appears frozen (up to the 300s OAuth
timeout). Reported symptom: "MCP OAuth: authorization required / Open this URL
... the tui is freezing, not respond to typing."

Add a thread-local `suppress_interactive_oauth()` context manager in
tools/mcp_oauth.py; `_is_interactive()` returns False while it's active, so the
stdin paste-thread and prompt are never created. Background discovery
(hermes_cli/mcp_startup.py, tui_gateway/entry.py) now runs discovery inside
that context, so OAuth-requiring servers soft-skip (raise
OAuthNonInteractiveError, already handled) instead of stealing the TUI's stdin.
A real `hermes mcp login` on the main thread is unaffected (thread-local).

Salvaged from #35945 by @zapabob (authorship preserved via cherry-pick;
resolved a conflict against main's new mcp_discovery_timeout / wait_for_mcp_
discovery refactor, keeping both). Verified E2E: with suppression the paste
prompt is NOT printed and no stdin thread spawns (raises OAuthNonInteractive
soft-skip); without it the prompt shows (the freeze). Mutation-verified
(removing the suppress check in _is_interactive fails the regression test).
76 tests pass, ruff clean.

Closes #35927.

SELF-REVIEW FIX: the original #35945 used threading.local(), which does NOT
propagate to the dedicated mcp-event-loop thread where OAuth actually runs
(discover_mcp_tools dispatches the connect via run_coroutine_threadsafe), so
the suppression was a NO-OP in production (the tests passed only by stubbing
out the cross-thread dispatch). Converted to a contextvars.ContextVar, which
asyncio copies onto the scheduled coroutine — empirically verified suppression
now holds on the mcp-event-loop thread through the real _run_on_mcp_loop path.
Added a cross-thread regression test (fails on threading.local, passes on the
ContextVar) so the no-op can't regress.
2026-06-27 04:59:23 +05:30
kshitijk4poor
a67ddf5983 fix: drop isinstance(str) guard so client.base_url fallback works with httpx.URL
The OpenAI SDK exposes client.base_url as an httpx.URL object, not str.
The isinstance(live_raw, str) guard made this branch dead code in
production. Use _normalized_runtime_url (which coerces via str()) so
the fallback actually fires.
2026-06-27 03:59:36 +05:30
xxxigm
25b7348457 fix(delegate): inherit subagent endpoint from parent active client
When parent_agent.base_url still carries a stale OpenRouter URL but the
live OpenAI client already points at local Ollama, subagents were routing
API calls to OpenRouter and failing with HTTP 401. Prefer _client_kwargs
and the mounted client base_url when they disagree with the surface field.
2026-06-27 03:59:36 +05:30
liuhao1024
515192c4b9 fix(tools): use start_new_session instead of preexec_fn to prevent SIGSEGV in multi-threaded processes
preexec_fn=os.setsid runs Python code in the forked child before exec,
which is unsafe in multi-threaded processes (CPython docs). When the
Desktop gateway loads native libraries (onnxruntime, BLAS, provider SDKs)
with active thread pools, the fork can SIGSEGV before the child execs.

Replace all preexec_fn usage with start_new_session=True, which provides
the same setsid/process-group semantics without running Python in the
fork. This is already the pattern used throughout hermes_cli/gateway.py
and hermes_cli/_subprocess_compat.py.

Fixes #46789
2026-06-27 03:08:41 +05:30
Teknium
525e1e775d
fix(skills): background review fork respects pinned skills (#53226)
The autonomous self-improvement review fork could still write to a pinned
skill — only external/bundled/hub-installed/protected-builtin skills were
guarded. The curator skips pinned skills from every auto-transition; the
review fork is the same kind of no-user-present actor and must too.

Adds a pin check to _background_review_write_guard so background-origin
edit/patch/delete/write_file/remove_file on a pinned skill are refused.
Stricter than the foreground _pinned_guard (delete-only) by design: with
no user in the loop there is no one to consent to an edit.

Fixes #25839
2026-06-26 12:49:33 -07:00
briandevans
3c8d3ecfa0 fix(approval): extend gateway-lifecycle guard to launchctl and pidof-based kills
The dangerous-command approval layer already blocks `hermes gateway
(stop|restart)`, `pkill/killall hermes|gateway`, and `kill ... $(pgrep ...)`.
A reporter noted on #33071 that the agent can still achieve the same
effect by driving launchd directly against the gateway's service label
(`launchctl stop ai.hermes.gateway`, `launchctl kickstart -k
system/ai.hermes.gateway`, etc.) or by substituting `pidof` for `pgrep`
in the kill-expansion form.

This widens the "Gateway lifecycle protection" block in
`tools/approval.py` to cover both vectors:

- `launchctl (stop|kickstart|bootout|unload|kill|disable|remove)`
  scoped to commands that target a Hermes label (`hermes`,
  `ai.hermes`). Read-only inspection (`launchctl print …`,
  `launchctl list`) and operations against unrelated labels remain
  unflagged.
- `kill ... $(pidof …)` and the backtick form, alongside the existing
  `pgrep` expansion. `pidof` is the BSD/Linux equivalent and is
  equally opaque to the `(pkill|killall) … hermes` name pattern.

Intentionally left out of scope: plain `kill -TERM <numeric_pid>` with
a PID looked up out-of-band. Catching that would require runtime PID
state and would break the existing
`TestPgrepKillExpansion::test_safe_kill_pid_not_flagged` contract,
which guarantees that a plain literal-PID `kill 12345` stays safe.
2026-06-26 11:38:28 -07:00
Teknium
3d735fe156
fix(skills-hub): surface per-tap providers (NVIDIA/OpenAI/...) in runtime search (#53191)
Natural-language skill search returned a short, arbitrary list and never
surfaced NVIDIA (or OpenAI/Anthropic/HuggingFace) skills. Two causes:

1. The runtime index collapses every GitHub tap into source="github", so
   there was no way to find or filter by provider at the CLI — the per-tap
   identity only existed in the docs-site catalog.
2. HermesIndexSource.search matched only name/description/tags (not the
   identifier or provider) and broke at the first `limit` hits in raw index
   order, burying the most relevant skills. `search` also defaulted to
   --limit 10 against an 86k-entry catalog.

Changes:
- GitHubSource stamps a per-tap provider label (extra.provider) on each
  skill via github_provider_for(); source stays "github" so dedup/floor/
  index-skip logic is untouched. Flows into the built index.
- HermesIndexSource.search now matches identifier + provider too, and
  collect-then-ranks (exact > prefix > whole-word > substring) instead of
  break-at-limit.
- --source nvidia|openai|anthropic|huggingface|voltagent|gstack|minimax
  provider filters for browse/search (narrows merged results by provider).
- search --limit default 10 -> 25; table Source column shows the provider
  label for github skills.

Tested: 181 unit tests pass; E2E against the live runtime index confirms
'nvidia'/'cuda' searches now surface NVIDIA-provider skills and
--source nvidia narrows to exactly the NVIDIA catalog.
2026-06-26 11:04:41 -07:00
liuhao1024
d9f1f1a1de fix(terminal): prefer $SHELL over bash for background process spawning (#42203)
On macOS, terminal(background=true) silently failed: the process returned a
session_id and exit_code=0 but the command never ran (empty stdout, no side
effects). Root cause is two interacting issues:

1. _find_shell was aliased to _find_bash, which prefers `shutil.which("bash")`
   → /bin/bash (GNU bash 3.2, still shipped on macOS) over $SHELL (/bin/zsh).
2. process_registry.spawn_local runs [shell, "-lic", "set +m; <cmd>"] with
   stdin=/dev/null. bash 3.2 as a login shell sources ~/.bash_profile, which on
   many macOS setups contains `exec /bin/zsh -l`; that exec replaces bash but
   drops the -c argument, so the command is swallowed (exit 0, no output).

Decouple _find_shell from _find_bash: _find_shell now prefers the user's
configured $SHELL on POSIX (the shell they actually log in with), falling back
to _find_bash when $SHELL is unset/missing. _find_bash is unchanged, so callers
that genuinely need bash (e.g. the _run_bash login-shell snapshot) keep bash
semantics. zsh handles -lic correctly even with redirected stdin.

Salvaged from #42219 by @liuhao1024 (authorship preserved via cherry-pick).
On top of the original (8 unit tests covering $SHELL-set/unset/missing/empty,
Windows-ignores-$SHELL, _find_bash-unchanged), added an E2E regression test
that reproduces the real bash-3.2 login-shell swallow (exit 0 / no file) and
asserts the shell _find_shell selects actually executes a -lic background
command. Mutation-verified: reverting _find_shell to the bash alias fails the
$SHELL-preference test. Bug reproduced directly: /bin/bash 3.2 -lic with a
.bash_profile->exec-zsh creates no file; zsh -lic does.

Closes #42203. Supersedes #42290.
2026-06-26 20:45:32 +05:30
kyssta-exe
07cc567dfa fix(security): add circuit breaker for tirith crashes to prevent agent hangs (#41400) 2026-06-26 15:26:08 +05:30
teknium1
fbfccbb3ee fix(security): align cron invisible-unicode set with install-time scanner
The cron runtime tripwire (_scan_cron_prompt) used a 10-char invisible-unicode
set while the install-time scanner (threat_patterns.INVISIBLE_CHARS) flags 17.
The cron-local set was missing U+2062-U+2064 (invisible math operators) and
U+2066-U+2069 (directional isolates), so a directive obfuscated with one of
those codepoints (e.g. "ig<U+2063>nore all previous instructions") slipped past
the runtime cron gate while being caught at install time.

Import the canonical set so the cron tripwire and install scanner can't drift
apart again. Emoji-ZWJ protection (_zwj_has_emoji_neighbour) is unchanged.

Fixes #35075

Co-authored-by: rlaope <piyrw9754@gmail.com>
2026-06-26 01:11:11 -07:00
Teknium
099df3cd89
fix(security): stop blocking AGENTS.md/SOUL.md that name an agent 'Praxis' (#52925)
The known_c2_framework threat pattern included 'praxis' in its
alternation alongside genuine offensive-security tool brands (Cobalt
Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those
distinctive brand names, 'praxis' is a common English word (Greek for
practice/action) and a legitimate agent name, so any context file that
mentioned an agent named Praxis matched at 'context' scope and the whole
AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it
reached the system prompt.

Remove 'praxis' from the alternation and add a guard comment: every
token in this list must be a distinctive tool brand, not a common word.
Real C2 brands still fire.
2026-06-26 00:36:01 -07:00
Max Hsu
075f93ad78 fix(mcp): auto-recover from invalid_client on stale OAuth client registration
Fixes #36767.

Two complementary recoveries for the recurring "delete three cache files and
re-auth by hand" ritual when an MCP server's dynamically-registered OAuth
client goes dead server-side (IdP redeploy / DB wipe / rebrand):

- Auto-heal (token-endpoint subset): HermesMCPOAuthProvider now sniffs
  auth-flow responses and, on a 400/401 `invalid_client` from the discovered
  token endpoint, backs up + deletes `<server>.client.json` and `.meta.json`
  and clears the in-memory client so the SDK re-runs RFC 7591 dynamic client
  registration on the next flow. Conservative by construction: only
  dynamically-registered (non config-supplied) clients, only the token
  endpoint, only on a word-boundary `invalid_client` match (so RFC 7591's
  `invalid_client_metadata` does not trip it); best-effort so a miss never
  breaks the live flow. Covers both code-exchange and refresh when the token
  endpoint was discovered. Tokens are preserved.

- `hermes mcp reauth [<name>|--all]`: the reporter's primary symptom — the
  IdP's in-browser "Redirect URI Mismatch" — produces no HTTP signal (the SDK
  only sees a callback timeout), so it cannot be auto-detected. The new
  command re-auths one or ALL `auth: oauth` servers, serially: one browser
  flow at a time, which also fixes the startup popup storm when several
  servers are stale at once. Single-server reauth is factored out of
  `mcp login` and shared.

Tests: +14 (poison helper x2; token-endpoint detection x5 incl. wrong-endpoint,
success-response, pre-registered, and invalid_client_metadata negative guards;
a bridge integration test driving the real async_auth_flow generator to prove
the detection hook preserves the bidirectional asend() forwarding contract;
reauth CLI x6). Verified against the pinned mcp==1.26.0: scripts/run_tests.sh
122/122 green for the touched suites; check-windows-footguns.py and ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 00:35:27 -07:00
kshitij
a28b939092
Merge pull request #52678 from kshitijk4poor/salvage/52502-fuzzy-boundary
fix(fuzzy-match): preserve boundary space after whitespace-normalized match (#52491)
2026-06-26 10:59:14 +05:30
yu-xin-c
96bc524a71 fix(curator): protect external skills from background curation 2026-06-25 22:03:02 -07:00
teknium1
6c58878e7d fix(browser): force secret-pattern redaction on browser_type display
Force redact_sensitive_text(force=True) on the browser_type text arg so
recognized credentials (API keys, tokens, JWTs) are masked in tool
progress, previews, callbacks, and return payloads even when the global
security.redact_secrets opt-out is set — a typed credential reaching chat
history is a security boundary, not log hygiene. Normal typed text matches
no pattern and stays fully readable for debuggability.

Tests assert the API-key-shaped secret is masked across every surface and
that normal text passes through unchanged.
2026-06-25 22:02:22 -07:00
rebel
8ff426e53b fix: redact browser typed text surfaces 2026-06-25 22:02:22 -07:00
Teknium
5b5c79a8ef
feat(kanban): typed block reasons + unblock-loop breaker (#52848)
* feat(kanban): typed block reasons + unblock-loop breaker

Stops the kanban blocked-task loop: a worker blocks a task, a cron
unblocks it, the worker re-blocks for the same reason, repeat forever.

block_task now takes a typed kind and a persistent block_recurrences
counter on the tasks table:

- kind=dependency routes to todo (parent-gated, auto-resumed), never
  the human 'blocked' bucket a cron would keep unblocking.
- needs_input/capability/transient/untyped land in blocked; each
  same-cause re-block after an unblock increments block_recurrences,
  and at BLOCK_RECURRENCE_LIMIT (default 2) the task routes to triage
  for a human instead of blocked.
- unblock_task no longer resets block_recurrences (the amnesia that
  let the loop run unbounded); complete_task clears it on success.

Wired through the worker kanban_block tool (new kind arg) and the
hermes kanban block --kind CLI flag, both reporting where the task
actually landed. Docs + 11 new tests; 536 existing kanban tests green.

* test(kanban): make second-block notify test use a distinct block cause

test_notifier_second_blocked_delivers blocked the same task twice with
the same (untyped) reason, which now trips the new unblock-loop breaker
and routes the second block to triage instead of blocked — so only one
'blocked' notification fired. The test's actual intent is that TWO
distinct block cycles each notify; give the two cycles different kinds
(needs_input then capability) so they're genuinely separate blocks. The
same-cause loop→triage path is covered by test_kanban_block_kinds.py.
2026-06-25 21:46:58 -07:00
Que0x
b8fc8c908b fix(approval): fold Windows absolute home paths in dangerous-command detection
The detector folds absolute home / Hermes-home prefixes into their canonical
~/ and ~/.hermes/ forms so static patterns catch /home/alice/.bashrc the same
way they catch ~/.bashrc (abd69b81). On native Windows this fold never fired,
so terminal commands writing to shell startup files, ~/.ssh/authorized_keys,
or ~/.hermes/config.yaml / .env returned "safe" and skipped the approval
prompt — and config.yaml carries the approval policy itself.

Two compounding causes:

1. The fold ran after the backslash-escape strip (r\m -> rm), which dissolves
   the backslash separators in a Windows path (C:\Users\alice\.bashrc ->
   C:Usersalice...) before the fold could match. It now runs before the strip.
2. The fold only recognized POSIX absolute paths and only the home prefix,
   leaving multi-segment backslash suffixes (\.ssh\authorized_keys) to be
   mangled by the strip.

Consolidated into _home_prefix_fold_regex / _fold_home_prefixes: match a home
prefix with either separator, capture the rest of the path token, and
normalize its separators to / so multi-segment patterns match. The
degenerate-path guard generalizes count("/") >= 2 to "at least two components
below the root" (also rejecting a bare drive root C:\). HOME is consulted
directly because Windows' expanduser ignores it; the more specific Hermes home
is folded first, longest candidate first, so neither fold clobbers the other.

POSIX behavior unchanged; the r\m -> rm anti-obfuscation strip still runs.
Adds TestWindowsAbsolutePathFolding, which monkeypatches a Windows-style
HOME/HERMES_HOME so the behavior is also exercised on the CI runner.
2026-06-25 17:49:39 -07:00
brooklyn!
ffa3d3c811
Merge pull request #49037 from NousResearch/bb/projects-paradigm
feat(desktop): first-class projects — sidebar, coding rail, review pane, and agent project tools
2026-06-25 17:49:05 -05:00
Gille
e7d2f0b93c fix(windows): suppress console flashes and harden gateway restarts 2026-06-25 14:42:38 -07:00
Brooklyn Nicholson
cb3f8ec03d fix(tools): isolate per-session worktree cwd 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
4ffdedd369 feat(tools): add project workspace tools 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
e7811345c1 feat(kanban): link tasks to project worktrees 2026-06-25 16:40:26 -05:00
Teknium
c6575df927
feat(moa): expose MoA presets as selectable virtual models (#46081)
* feat(moa): expose MoA presets as selectable virtual models

Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.

Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.

- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
  / one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.

* fix(moa): thread moa_config through _run_agent to _run_agent_inner

The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).

* fix(moa): classify moa as a virtual provider in the catalog

The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
  - test_provider_catalog: api_key providers must expose a credential env var
  - test_provider_parity: every hermes-model provider must be desktop-configurable

moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
2026-06-25 13:52:06 -07:00
kshitij
42bea9e298
Merge pull request #52618 from NousResearch/salvage/14185-todo-coercion
fix(tools): defensive type coercion in todo_tool for malformed LLM input (#14185)
2026-06-26 02:02:18 +05:30
liuhao1024
f23d077b5f fix(fuzzy-match): preserve boundary space after whitespace-normalized match
The trailing-whitespace expansion in _map_normalized_positions
unconditionally consumed whitespace after the matched region — including
the word-boundary space that separates the match from the next token.
This caused silent file corruption when the fuzzy matcher fell back to
the whitespace_normalized strategy.

Guard the expansion on the normalized match actually ending with
whitespace (i.e. the original had a run of spaces that were collapsed).
When the match ends with a non-space character, the first whitespace in
the original is a boundary and must not be consumed.

Fixes #52491
2026-06-26 01:55:27 +05:30
helix4u
4efec63a34 fix(tools): let session_search match session titles 2026-06-26 01:12:26 +05:30
rob-maron
525ee58b43 krea 2026-06-25 12:38:33 -07:00
Tranquil-Flow
0be10607d9 fix(tools): defensive type coercion in todo_tool for malformed LLM input (#14185)
todo_tool crashed with `AttributeError: 'str' object has no attribute 'get'`
when the LLM emitted the `todos` param as a JSON-encoded string instead of an
array, or as a list containing non-dict items (observed intermittently on
Claude 4.5/4.6/4.7, and after a prior tool-call rejection where the model
"self-corrects" by wrapping the list in json.dumps).

Three additive guards, no behavior change for well-formed input:
- todo_tool(): if `todos` is a str, json.loads it; reject unparseable strings
  and non-list values with a clear tool_error instead of crashing downstream.
- _validate(): non-dict items return a {id:"?", content:"(invalid item)"}
  placeholder rather than calling .get() on a str/int/None.
- _dedupe_by_id(): non-dict items get a synthetic key so _validate handles them.

Salvaged from #14785 by @Tranquil-Flow (authorship preserved via cherry-pick).
Comprehensive tests: JSON-string coercion (parse / unparseable / non-list /
non-string), non-dict list items (str/None/int/mixed), and a well-formed-
unchanged regression class — both guards mutation-verified to fail without them.

Closes #14185. Supersedes #14187, #22505, #14350 (same fix, less/no test
coverage) and #16952 (bundled unrelated scope-creep).
2026-06-25 23:42:42 +05:30
kshitij
d682f320b3
Merge pull request #52147 from NousResearch/salvage/29184-mcp-osv-nonblocking
fix(mcp): run OSV malware preflight off the event loop with a bounded timeout (#29184)
2026-06-25 23:39:44 +05:30
qdaszx
6305ac0e4b fix(mcp): run OSV malware preflight off the event loop with a bounded timeout (#29184)
During stdio MCP server startup, _run_stdio (an async method) called the
synchronous check_package_for_malware() inline. That makes a blocking
urllib HTTPS POST to api.osv.dev whose own timeout doesn't reliably cover a
stalled SSL handshake, so an intermittent network issue froze the entire
asyncio event loop for up to ~120s — blowing past the TUI/gateway's 15s
startup budget and showing "gateway startup timeout".

Run the check via asyncio.to_thread (off the loop) AND bound it with
asyncio.wait_for(timeout=_OSV_MALWARE_CHECK_TIMEOUT_S=12s). The malware check
is fail-open, so on timeout we log and proceed rather than blocking startup.

Salvaged from #29190 by @qdaszx (re-applied on current main — the call site
moved since the PR was opened), combining the to_thread approach also proposed
in #29192 by @ygd58. Two load-bearing tests: event-loop-not-blocked-during-
check and timeout-fails-open — both mutation-verified to fail against the old
inline blocking call.

Closes #29184.

Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-06-25 23:30:41 +05:30
kshitijk4poor
15ee2d6f04 refactor: lightweight sudo count + drop chatty multi-sudo tip
Replace _count_real_sudo_invocations (which called
_rewrite_real_sudo_invocations and discarded the rewritten string) with
a lightweight token scan that reuses the same tokeniser but skips string
building. Remove the agent-facing tip about nested sudo in heredocs —
the cache-cleared warning is enough.
2026-06-25 23:08:48 +05:30
xxxigm
8278d82e17 fix(terminal): improve sudo -S password delivery and cache invalidation
Pipe one password line per sudo invocation in compound commands so a correct
password is not rejected on the second `sudo` in `sudo a && sudo b`. Drop the
session cache when sudo returns Authentication failed, surface sudo_auth_failed
in the tool result, and add hints for interactive sessions.
2026-06-25 23:08:48 +05:30
brooklyn!
da0320bf40
Merge pull request #52285 from NousResearch/bb/verify-ledger
feat(agent): record coding verification evidence
2026-06-24 23:07:10 -05:00
Brooklyn Nicholson
fcbdf3c356 feat(agent): record coding verification evidence
Record foreground verification commands in a bounded, profile-scoped ledger and mark evidence stale when code edits change the workspace.
2026-06-24 22:35:27 -05:00
Victor Kyriazakos
b693bee100 feat(cron): thread-preferred continuable delivery (open a thread, mirror DM fallback)
Continuable cron jobs (attach_to_session / cron.mirror_delivery, default
OFF) now prefer a dedicated thread on thread-capable platforms, falling
back to origin-DM mirroring where threads don't exist.

- Thread-capable (Telegram topics, Discord/Slack threads): open a fresh
  thread for the job via the shipped adapter.create_handoff_thread,
  route the brief into it, and seed the thread-keyed session so the
  user's in-thread reply continues with full context. This is the
  'continuable cron opens its own thread' interface.
- DM-only (WhatsApp/Signal/SMS): create_handoff_thread returns None ->
  fall back to mirroring into the origin DM session (existing behaviour).

Reuses existing infrastructure end-to-end — no new adapter surface, no
provider-chain signature change:
- adapter.create_handoff_thread (already implemented per-platform,
  returns None on unsupported platforms = the fallback signal)
- the live SessionStore via adapter._session_store (already set on every
  adapter), reached without threading a new param through the frozen
  CronScheduler.start() contract
- gateway.mirror.mirror_to_session for the seed/append
- existing per-target delivery routing carries the new thread_id for free

Mirrors GatewayRunner._process_handoff's open-thread-or-fallback +
seed pattern, standalone for the cron delivery path. thread_seeded
guards against a double-mirror after seeding. Scoped to the origin
target only; fan-out/broadcast targets are never threaded or mirrored.

Config docs updated (cron.mirror_delivery) + cronjob tool
attach_to_session description reframed around continuable/thread-preferred.

Tests: +5 (thread id returned on thread platform; None on DM platform;
None without capability/loop; seed creates thread session + mirrors;
seed no-op on empty). 22/22 in TestCronDeliveryMirror; 532 cron tests
pass (4 failures pre-existing: croniter-not-installed + TZ).
2026-06-24 20:27:05 -07:00
Victor Kyriazakos
98f3c19282 feat(cron): pass origin user_id to delivery mirror (send_message parity)
Multi-participant parity with interactive send_message, which passes
HERMES_SESSION_USER_ID to gateway.mirror.mirror_to_session so the mirror
lands in the exact participant's session.

- cronjob_tools._origin_from_env now captures user_id from the session
  context at job-create time (alongside platform/chat_id/thread_id).
- _maybe_mirror_cron_delivery forwards user_id to mirror_to_session.
- _deliver_result threads origin.user_id through for the origin target.

Effect: in a per-user-isolated group chat (group_sessions_per_user=True,
the default), the mirror resolves to the member who scheduled the job
instead of conservatively no-op'ing on ambiguous candidates. DMs and
shared group/thread sessions are unaffected (single candidate). Default
still OFF.

Tests: helper forwards user_id; E2E _deliver_result forwards origin
user_id. 17/17 in TestCronDeliveryMirror; 527 cron tests pass (4 failures
pre-existing: croniter-not-installed + TZ, identical on baseline).
2026-06-24 20:27:05 -07:00
Victor Kyriazakos
1b181724fa feat(cron): optional mirror of cron delivery into target chat session
Adds an opt-in path so a cron job's delivered output is also appended to
the TARGET chat's gateway session transcript (as an assistant turn), so a
user reply to a recurring delivery (daily brief, reminder) is answered with
the delivery in context instead of 'what is that?' amnesia.

- Reuses the shipped gateway.mirror.mirror_to_session — the same primitive
  interactive send_message mirroring already uses. No messaging-toolset
  change (cron still can't call send_message; this rides delivery).
- Gated: per-job attach_to_session overrides global cron.mirror_delivery
  (config.yaml). Default OFF — historical isolation preserved byte-for-byte.
- Mirrors the CLEAN agent output, not the cron header/footer wrapper.
- Alternation/cache-safe: append lands at a turn boundary, never mid-loop,
  never mutates the cached system prompt. Cold-start (no target session)
  is a silent no-op; mirror errors never fail a successful delivery.
- Surfaced on the cronjob tool (attach_to_session) + config schema.

Driven by enterprise cron-as-control-plane use case. 10 new tests; full
cron + cronjob-tool suites pass (600).
2026-06-24 20:27:05 -07:00
Ben Barclay
c15945655f
fix(terminal): sanitize host/relative cwd OVERRIDE before it reaches docker run -w (#50636)
terminal_tool() resolves a per-task cwd override that WINS over config["cwd"]:

    cwd = overrides.get("cwd") or config["cwd"]

config["cwd"] is sanitized for container backends in _get_env_config() (host
prefixes /Users//home//C:\\/C:/ and relative paths are replaced with the
backend default /root). But the override was applied RAW — it was never run
through that guard. The gateway/TUI registers the host launch dir as a cwd
override for workspace tracking (tui_gateway/server.py _register_session_cwd
-> _terminal_task_cwd -> _session_cwd -> os.getcwd()), so on a container
backend a host path leaked straight to `docker run -w <host-path>`:

  - Windows desktop: -w C:\Users\<user>  -> container fails to start (exit 125)
  - POSIX:           -w /home/<user>      -> same

The ACP adapter translates its override cwd (acp_adapter/session.py
_translate_acp_cwd), but the gateway path did neither translation nor
sanitization, so the override bypassed the one guard that would have caught it.

Fix: extract the host/relative-path predicate into a shared
_is_unusable_container_cwd() helper (so the existing _get_env_config()
sanitizer and the new guard can't drift), and re-apply it to the *resolved*
cwd at the override-resolution site. Valid in-container override paths
(RL/benchmark sandboxes that set cwd to /workspace, /root, ...) are absolute
non-host paths and pass through untouched.

Tests: unit-pin the predicate (Windows backslash/forwardslash, POSIX home,
macOS /Users, relative, valid container paths) AND an E2E call-site pin that
drives terminal_tool() with a host-path override registered and asserts the
cwd reaching _create_environment is sanitized. Mutation-verified: reverting
the call-site guard makes the two host-path E2E tests fail (showing the raw
host path leaking) while the valid-/workspace-override test stays green.
2026-06-25 02:33:40 +00:00
Ben
d1cac0e5ef feat(gateway): scale-to-zero idle detection + dormant-quiesce (Phase 0)
The gateway-side BEHAVIOUR layer that consumes the relay scale-to-zero
primitives (gateway-gateway Phase 5): the gateway decides it is idle and
drives the relay transport dormant so the platform (Fly autostop:"suspend")
can suspend the now-traffic-idle machine, which wakes on the connector's
wakeUrl poke (decisions.md Q3=C', D1-D13).

- gateway/scale_to_zero.py: pure helpers — scale_to_zero_enabled (the NAS
  Labs HERMES_SCALE_TO_ZERO stamp, D11/Q8=A), parse_idle_timeout_seconds
  (config.yaml gateway.scale_to_zero.idle_timeout_minutes, D2),
  messaging_is_relay_only_or_absent (F6/D1), should_arm (D1/D11/§3.4(1)),
  is_idle (D2/D3/F7).
- gateway/run.py: _last_inbound_at clock stamped on user inbound in
  _handle_message (F13); the arm-gate + idle predicate + the
  _scale_to_zero_watcher dormant sequence (mark draining -> adapter
  go_dormant() -> cooldown), started only when armed. Deliberately NOT the
  stop path and NOT mark_resume_pending (F12/D13).
- tools/process_registry.py: has_any_active() for the bg-work guard (D3/F7).
- hermes_cli/config.py: gateway.scale_to_zero.idle_timeout_minutes default 5.

Tests: 38 pure-logic + 6 watcher (incl. bg-work regression guard proven RED).
Full relay + scale-to-zero suites: 184 passed. The 20 unrelated failures in
the broader run are PRE-EXISTING on origin/main (custom-provider/tools tests),
confirmed via a pristine baseline worktree.
2026-06-24 18:47:18 -07:00
Ben
cbd6ba1bdd fix(docker): redirect lazy installs to a durable target so opt-in backends work in the immutable image (#51136)
The published Docker image seals the agent venv (root-owned, read-only
/opt/hermes) and sets HERMES_DISABLE_LAZY_INSTALLS=1 so a runtime install
can't mutate and brick the core. But opt-in backends (Firecrawl web search,
Exa, Feishu, ...) deliberately keep their SDKs in tools/lazy_deps.py and out
of [all] (pyproject policy 2026-05-12: one quarantined release must not break
every install). The two policies collided: the SDK isn't baked in AND can't
lazy-install, so the default Firecrawl web_search/web_extract fail out of the
box in Docker (#51136), as do Exa (#49445) and Feishu (#50205).

Fix the whole class instead of baking in one backend: when
HERMES_LAZY_INSTALL_TARGET is set, lazy installs are redirected to a writable
dir on the durable /opt/data volume via `pip/uv install --target`, and that
dir is APPENDED to the end of sys.path. Because the core venv always wins
name collisions, a package installed this way can only ADD new modules — it
can never shadow, downgrade, or break a module the core ships. The worst a
bad/incompatible backend package can do is fail to import and report itself
unavailable; the agent core stays healthy. That structural guarantee is what
made it safe to seal the venv, and it is preserved here even with installs
re-enabled.

- tools/lazy_deps.py: durable-target mode — `--target` install + core-pinned
  `--constraint` file (shared deps resolve to core's versions, conflicts fail
  loudly at install time), append-only sys.path activation, ABI/Python-version
  stamp that wipes the store if an image rebuild bumps the interpreter, and a
  reworked gate so HERMES_DISABLE_LAZY_INSTALLS=1 redirects (rather than hard-
  blocks) when a target is set. security.allow_lazy_installs=false still
  disables installs in every mode.
- hermes_bootstrap.py: activate the durable target on sys.path at first import
  (before any backend imports its SDK) so packages installed on a previous run
  are importable on this run.
- Dockerfile: set HERMES_LAZY_INSTALL_TARGET=/opt/data/lazy-packages.
- docker/stage2-hook.sh: seed + chown the dir on the data volume.
- tests: real-install E2E proving installs land in the target, import cleanly,
  don't leak into the sealed venv, and that a core package is never shadowed;
  ABI-stamp wipe/preserve; gate matrix; Dockerfile/stage2 contract test.

Fixes #51136
2026-06-25 09:20:13 +10:00
liuhao1024
dbf0797335 fix(tools): catch mkdtemp OSError in tirith install to prevent unbounded retry and temp-dir leak (#51826)
When tempfile.mkdtemp() raises OSError (e.g. disk full), the exception
propagated past the try/finally block, so _mark_install_failed() was
never called. The 24h backoff marker never engaged, causing unbounded
retry on every command -- each attempt leaked a tirith-install-* temp
directory, eventually filling /tmp completely.

Fix: wrap mkdtemp in its own try/except OSError, returning
(None, "no_space") so the caller's normal failure path (including
_mark_install_failed) executes.

Salvaged from #51831 by @liuhao1024.

Closes #51826
2026-06-25 02:13:56 +05:30
Riyasudeen Farook
1e4df599ec fix(delegate): strip cronjob toolset from delegated children (#43466)
_strip_blocked_tools used a hardcoded set missing 'cronjob'. Children
on gateway platforms could inherit the cronjob toolset, scheduling
persistent jobs that outlive the delegation despite DELEGATE_BLOCKED_TOOLS.

Fix: derive the strip set from DELEGATE_BLOCKED_TOOLS at runtime so the
two lists can never drift. Add 'cronjob' to DELEGATE_BLOCKED_TOOLS for
documentation consistency. Two regression tests lock the invariant.

Salvaged from #43687 by @riyas22. Adapted test to current main (no
'messaging' toolset exists -- send_message is intentionally not
registered as an agent tool).

Closes #43466
2026-06-25 01:37:25 +05:30
liuhao1024
25e2312230 fix(memory): skip drift guard for add (append-only) action (#42874)
The drift guard (introduced for #26045) correctly protects replace/remove
from clobbering un-roundtrippable content, but it also fires on the add
path. Since add only appends and never overwrites, the guard is
unnecessary and causes false positives when prior add() calls in the same
session shift the byte count of the on-disk file.

Add skip_drift parameter to _reload_target() and pass True from add().
Replace/remove continue to use the drift guard unchanged.

Salvaged from #42880 by @liuhao1024.

Closes #42874
2026-06-25 00:51:12 +05:30