Commit graph

6374 commits

Author SHA1 Message Date
Dale Nguyen
dbbf102b8e fix(terminal): strip VIRTUAL_ENV/CONDA_PREFIX from terminal subprocess env
The Hermes gateway runs inside its own venv, so its process environment
carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned
subprocesses inheriting those markers. When the agent ran `uv sync`,
`uv pip install`, `poetry install`, etc. in ANY other project directory,
those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that
project's dependencies into the Hermes venv path — wiping Hermes' own runtime
deps (and, when the other project pinned a different Python, replacing the
interpreter), bricking the gateway on the next restart (#23473).

Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in
tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env`
— via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays
reachable because its bin dir is already first on PATH, so removing the
active-environment markers is safe and only prevents the cross-project clobber.

Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't
reach the spawned subprocess) and unit coverage for both functions, plus a
guard on the marker constant.

Also adds the AUTHOR_MAP entry for the salvaged contributor.

Closes #23473
2026-06-28 01:04:20 +05:30
Teknium
d470ed0c4c
fix(cli): commit tool scrollback lines in verbose mode (non-streaming/MoA) (#53785)
In the interactive CLI, the aggregator's tool calls under a MoA preset (or
any non-streaming model call, e.g. copilot-acp) appeared to overwrite each
other instead of building scrollable history. Each tool only updated the
transient spinner line; no committed scrollback line was printed.

Root cause: persistent tool lines in _on_tool_progress's tool.completed
branch were gated on tool_progress_mode in {all, new}, omitting 'verbose'.
Streaming models hid the bug because _on_tool_gen_start commits a 'preparing'
line per tool during streaming; non-streaming calls (MoA forces
_use_streaming=False) never emit that, so under 'verbose' there was no
committed line at all — only the self-overwriting spinner.

'verbose' is strictly more than 'all', so it now commits the same scrollback
line. Verified live via interactive PTY on the MoA opus-gpt preset: three
terminal calls in turn 1 and two in turn 2 each render as separate persistent
lines.
2026-06-27 12:29:55 -07:00
Teknium
227e6c0143
fix(moa): resolve context window from the aggregator, not the 256K default (#53780)
A MoA session's model is the preset name (e.g. 'opus-gpt') and its base_url is
the virtual local endpoint, so get_model_context_length() missed every probe
and fell through to the 256K fallback — even when the aggregator is a 1M-context
model. The acting model in MoA IS the aggregator, so resolve the context window
from the aggregator slot's real provider+model.

- model_metadata.get_model_context_length: when provider=='moa', resolve the
  preset's aggregator slot through resolve_runtime_provider and recurse with the
  aggregator's real provider/model/base_url. Explicit model.context_length still
  wins (checked first); falls through to the generic default if resolution fails.

Tests: opus-gpt preset now reports 1M (the aggregator window), config override
still honored.
2026-06-27 12:08:09 -07:00
ailthrim
25ec01f79f fix(desktop): don't purge Electron cache / mirror-retry after a late build failure
`hermes desktop` / `hermes update` recover from a corrupt Electron download by
purging the cached zip + re-downloading and retrying the pack, and then by
falling back to a public mirror. That recovery is only meaningful when the
packaged executable is MISSING — the signature of a partial/corrupt unpack.

A LATE failure such as macOS code signing (#40187) leaves
`Hermes.app/Contents/MacOS/Hermes` (or the platform equivalent) in place.
Re-downloading Electron can't repair a signing failure, so the purge +
slow mirror retry just grind through another identical failure before the
build finally errors out.

Gate both recovery blocks on `_desktop_packaged_executable(desktop_dir) is None`
so a build that already produced the executable fails fast instead of
triggering the destructive download recovery. The corrupt-download path
(executable missing) is unchanged.

Salvage of #42782, re-applied onto current main (the surrounding recovery was
refactored to `_electron_dist_ok` / `_redownload_electron_dist` since the PR
was opened). Adds a regression test asserting no purge / mirror retry runs when
the executable exists, and updates the existing retry/mirror tests to model the
corrupt-download case (executable absent) the recovery is actually for.

Related to #40187 (the residual cache-purge sub-issue; the signing failure
itself is fixed by #52591).
2026-06-28 00:29:34 +05:30
teknium1
1ef19bad90 fix(model): show MoA preset picker on selection and label MoA in the banner
Selecting 'Mixture of Agents' in the `hermes model` provider picker fell
through silently — select_provider_and_model had no moa branch, so it just
reprinted the current model/provider summary and exited. And the CLI session
banner rendered the bare preset name (e.g. 'opus-gpt · Nous Research'),
which is meaningless out of context.

- Add _model_flow_moa: always lists the available presets (even one), then
  prints the full reference-models + aggregator breakdown for the selection
  and persists model.provider=moa / model.default=<preset> (dropping stale
  base_url + endpoint creds, since moa is a virtual local provider).
- Wire the branch into select_provider_and_model.
- build_welcome_banner takes provider; when 'moa' it renders
  'MoA: <preset> · agg <aggregator>' instead of a bare slug. Both CLI call
  sites pass self.provider.

Tests: 2 new banner tests (moa + non-moa unchanged); E2E verified the picker
persists the preset and clears stale base_url/api_key.
2026-06-27 11:45:07 -07:00
konsisumer
1b6ebb24c0 fix(agent): validate OpenRouter provider sort before request dispatch 2026-06-27 11:43:08 -07:00
Teknium
27322612b4
fix(update): route loud build/installer output to update.log instead of the terminal (#53616)
* fix(update): route loud build/installer output to update.log instead of the terminal

hermes update flooded the terminal with the full vite asset dump,
electron-builder logs, npm deprecation warnings from the desktop build,
and the cua-driver installer's 'Next steps' wall. All of that is
low-signal noise the user doesn't need on a successful update.

- Capture the desktop --build-only subprocess (vite + electron-builder)
  into ~/.hermes/logs/update.log; print a one-line status, and on
  failure surface the last 15 lines + a pointer to the full log.
- Capture the cua-driver installer's output when verbose=False (the
  hermes update refresh path); concise upgrade line is unchanged.
- Add _log_only_write() / _run_logged_subprocess() helpers that write to
  the update.log handle without echoing to the terminal.

The repo-root npm install keeps streaming (capture_output=False) — that
is the deliberate #18840 guard so a slow postinstall download doesn't
look hung. The desktop npm install is a separate Electron process with
no such progress concern and is captured.

* fix(update): persist full cua-driver installer output to update.log

The captured cua-driver installer output was only sent to logger.debug
(agent.log) on failure, so the 'Next steps' wall was lost from
update.log entirely on success. Write the full captured output straight
to the update.log handle (sys.stdout._log) on both success and failure,
matching the desktop-build capture, so update.log keeps the complete
record of everything an update did.
2026-06-27 11:43:01 -07:00
Teknium
190e1ffac9
fix(redact): mask passwords in lowercase/dotted config keys (#53590)
The secret redactor only matched uppercase env-style keys ([A-Z0-9_]),
so config-file assignments like spring.datasource.password=secret,
app.api.key=xyz, and YAML password: secret leaked verbatim when the
agent ran cat/grep on application.properties or .env files (issue #16413).

Adds three case-insensitive config-key matchers that run only in a
config-file context, preserving the existing #4367 (lowercase code/prose)
and web-URL-passthrough carve-outs:
  - _CFG_DOTTED_RE: namespaced keys (contain a dot) — unambiguously config
  - _CFG_ANCHORED_RE: bare secret-word keys at line start (incl. export)
  - _YAML_ASSIGN_RE: unquoted colon config (password: value)
Value capture stops at whitespace and '&' so form bodies stay pair-wise;
the '://' guard keeps intentional web-URL query-param passthrough intact.

Reported-by: Murtaza1211
2026-06-27 04:43:28 -07:00
Teknium
917f6bdb00
fix(tools): let vision pick any provider+model, not just OpenRouter (#53606)
* fix(tools): let vision pick any provider+model, not just OpenRouter

hermes tools → configure → vision no longer forces an OPENROUTER_API_KEY.
It now offers the same any-provider surface as the model command: Auto
(use main model / aggregator fallback), pick any authenticated provider +
model, or a custom OpenAI-compatible endpoint. Selections persist to
auxiliary.vision.{provider,model,base_url} — the keys the vision resolver
already reads. Custom endpoint pins provider=custom so base_url routes
correctly. Reconfigure path uses the same picker instead of re-prompting
for OPENROUTER_API_KEY.

* docs: add PR infographic for vision any-provider picker
2026-06-27 04:41:42 -07:00
Brandon Zarnitz
9c81c938d3 fix(approval): honour tirith_fail_open=false on Tirith ImportError (#20733)
check_all_command_guards() swallowed ImportError from tools.tirith_security
with an unconditional pass, leaving tirith_result["action"] as "allow"
regardless of security.tirith_fail_open.  When an operator sets
tirith_fail_open: false they have explicitly opted into fail-closed
behaviour; a missing or broken Tirith module must not silently permit
command execution.

Inside the except ImportError handler, read the live security config.
When tirith_enabled is true and tirith_fail_open is false, synthesise a
"warn"-action Tirith result so the command flows through the normal
approval path (prompt the user, or block in cron/gateway contexts)
instead of bypassing it.  The default tirith_fail_open: true behaviour
is unchanged.

Adds three regression tests to tests/tools/test_approval.py:
- fail_open=true  + ImportError → silently allowed (no regression)
- fail_open=false + ImportError → approval callback invoked, command denied
- tirith_enabled=false           → always allowed regardless of fail_open

Fixes #20733

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Conflicts:
#	tests/tools/test_approval.py
2026-06-27 04:41:24 -07:00
Teknium
fe1c1c1121
fix(session_search): demote cron below interactive sessions in discover ranking (#53597)
Cron jobs accumulate large volumes of repetitive vocabulary (recurring
project names, dates, summaries) and out-number a user's interactive
sessions. Under bare BM25 they dominate the top FTS rows, so discover's
early-exit-at-N dedup collects only cron sessions and the user's own
conversations never surface — "recall blindness" (#19434).

- _order_for_recall() stable-sorts FTS rows so interactive sources rank
  above cron before lineage dedup; within each class BM25/recency order
  is preserved. Cron is demoted, not excluded, so it still surfaces when
  it is the only match.
- raise discover scan limit 50 -> 300 so buried interactive matches are
  in hand for the demotion pass.

Fixes the cron-flooding sub-bug of #19434. The split-brain sub-bug is
covered by #52798; the child-session sub-bug is superseded by in-place
compaction.
2026-06-27 04:41:22 -07:00
Teknium
cd592c105c
feat(send_message): native WhatsApp media delivery via Baileys bridge (#53598)
send_message with MEDIA:/path to a WhatsApp target previously dropped the
attachment: the WhatsApp branch never passed media_files, the plugin's
_standalone_send accepted the param but only POSTed text, and WhatsApp was
absent from the media-supported platform list.

- send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that
  routes media_files through the whatsapp plugin's standalone_sender_fn, and
  add whatsapp to the supported-media list strings.
- whatsapp adapter: _standalone_send now sends text first (skipped when the
  chunk is media-only), then uploads each file via the bridge /send-media
  endpoint with a mediaType derived from extension/is_voice/force_document, so
  images/videos/voice arrive as native bubbles instead of documents.
- _bridge_media_type classifier maps ext -> image|video|audio|document.

Closes #19105 (remaining send_message gap). Other items in the report
(inbound video paths, image_generate auto-deliver, history dedup, native
gateway bubbles) already landed on main.
2026-06-27 04:40:05 -07:00
Teknium
88c02469cc
fix(mcp): never permanently wedge the circuit breaker on a dead transport (#53599)
A long-running gateway session could permanently lose an MCP server: once a
stdio subprocess died (or transient drops accumulated over the session), the
run loop exhausted its reconnect budget and returned, orphaning the task. With
no listener for _reconnect_event, the circuit breaker's half-open probe could
never revive the server — every probe hit a dead/absent session, re-armed the
60s cooldown, and looped forever until a full gateway restart (#16788).

Root cause was split ownership of transport liveness between the run loop and
the tool handler, plus a permanent give-up path. Fixed by one invariant: a
non-shutdown server task is always reconnectable.

- run loop parks (deregisters phantom tools, then awaits _reconnect_event)
  instead of returning when the reconnect budget is exhausted, so the task
  stays alive as a dormant listener
- retry budget resets on every successful (re)connect, so a healthy
  long-lived server can't accumulate lifetime drops into a death sentence
- half-open probe with no live session signals a reconnect (reviving a
  parked/dead task and respawning a dead stdio subprocess) and returns a
  clean 'reconnecting' error instead of writing into a dead pipe
- breaker resets on successful session init across all transports
  (stdio/HTTP/SSE) — fully transport-agnostic, no PID/pipe polling

Builds on the closed-PR cluster for this issue: keeps #49255's deregister-on-
exhaustion insight and #21006's signal-don't-probe insight, discards the racy
os.kill PID machinery.

Co-authored-by: LeonSGP43 <LeonSGP43@users.noreply.github.com>
Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
2026-06-27 04:39:54 -07:00
r266-tech
dbc925b755 Guard oversized Telegram video downloads 2026-06-27 04:39:48 -07:00
Teknium
02b32e2d7c
fix(moa): call reference + aggregator models through their provider's real route (#53580)
MoA was calling reference and aggregator models through a bare
call_llm(provider=slot["provider"], model=slot["model"]) with a forced
temperature and a forced max_tokens (the preset's hardcoded 4096). That left
base_url/api_key/api_mode unresolved — so the auxiliary auto-detector guessed
the API surface instead of using the provider's real runtime, and the 4096 cap
truncated long aggregator syntheses.

A MoA slot is just a model selection and must be called the same way any model
is called elsewhere. Each slot is now resolved through resolve_runtime_provider
(the canonical provider→api_mode/base_url/api_key resolver the CLI, gateway, and
delegate_task all use) via a new _slot_runtime() helper, and the resolved
endpoint is passed into call_llm. So a reference/aggregator gets its provider's
actual API surface — MiniMax → anthropic_messages, GPT-5/o-series →
max_completion_tokens, custom endpoints → their base_url — identical to how that
model is handled as the acting model.

MoA also no longer imposes its own output cap: max_tokens defaults to None
(omitted → the model's real maximum) for references and is passed through from
the caller for the aggregator. The preset's hardcoded 4096 is gone. The
max_tokens preset config field is left in place (config/web/desktop unchanged);
it is simply no longer applied as a forced cap.

Tests: slots route through resolve_runtime_provider with resolved base_url/
api_key; resolution errors fall back to bare provider/model; neither call
carries an output cap even when the preset config still contains max_tokens.
2026-06-27 04:39:42 -07:00
herbalizer404
3fe16e3cd5 fix(fallback): attach credential pool after provider switch
When automatic fallback activates a provider that differs from the
primary, try_activate_fallback() cleared the primary's pool (to avoid
cross-provider base_url contamination, #33163) but never loaded the
fallback provider's own pool. The fallback then ran with no pool, so
rate_limit/billing/auth recovery couldn't rotate its credentials.

After clearing a mismatched pool, load_pool(fb_provider) and attach it
when it has credentials, so provider-specific rotation continues to
work on the fallback target.
2026-06-27 04:39:26 -07:00
Tranquil-Flow
635841d210 fix(agent): reload credential pool on switch_model provider change (#52727)
switch_model() swapped model/provider/base_url/api_key but never
refreshed agent._credential_pool, which stays bound to the original
provider. recover_with_credential_pool() then sees a pool.provider !=
agent.provider mismatch and short-circuits — so a 429/401 on the new
provider gets no rotation and falls through to fallback instead.

Reload load_pool(new_provider) inside switch_model when the provider
changes (or the pool is missing). The reload is inside the protected
swap block and the pool is added to the rollback snapshot, so a failed
client rebuild restores the original pool.

Fixes #16678, #52727.
2026-06-27 04:39:26 -07:00
Teknium
2002bb49a7
test(telegram): make config-bridge tests immune to ambient .env pollution (#53594)
test_config_bridges_telegram_group_settings and
test_config_bridges_telegram_user_allowlists asserted the YAML→env bridge
via os.environ. A developer's real ~/.hermes/.env can repopulate TELEGRAM_*
vars during load_gateway_config(): the microsoft_teams plugin runs
load_dotenv(find_dotenv(usecwd=True)) at import time, which walks up from the
cwd (under ~/.hermes/ in worktrees) and reloads the user's .env, defeating the
env-over-YAML bridge for any key present there (e.g. TELEGRAM_GROUP_ALLOWED_CHATS).

Assert the returned PlatformConfig.extra instead — it is parsed straight from
the test's config.yaml and is immune to that ambient leak. free_response_chats
is bridged to the env var only (not extra), and TELEGRAM_FREE_RESPONSE_CHATS
doesn't appear in developer .env files, so it stays a deterministic os.environ
assertion.
2026-06-27 04:36:45 -07:00
Teknium
d4c2217e87
fix(gateway): offload /model switch off the event loop (#53603)
The Telegram/Discord /model command's actual switch calls switch_model()
directly on the asyncio event loop. switch_model() can fall through to a
synchronous models.dev HTTP fetch (requests.get, 15s timeout) on a cold or
expired cache, freezing the gateway for up to 15s and dropping the Telegram
connection while a user switches models.

The picker provider-list and fallback text-list sites were already offloaded
(#41289), but the two _switch_model() calls — the picker callback and the
direct /model <name> path — were not. Wrap both in asyncio.to_thread.

Closes #20525.
2026-06-27 04:36:22 -07:00
Teknium
caf4dcc7ad
fix(whatsapp): resolve phone↔LID aliases in adapter DM/group allowlist (#53588)
Some checks failed
CI / Detect affected areas (push) Waiting to run
CI / Python tests (push) Blocked by required conditions
CI / Python lints (push) Blocked by required conditions
CI / TypeScript (push) Blocked by required conditions
CI / Docs Site (push) Blocked by required conditions
CI / Deny unrelated histories (push) Blocked by required conditions
CI / Check contributors (push) Blocked by required conditions
CI / Check uv.lock (push) Blocked by required conditions
CI / Lint Docker scripts (push) Blocked by required conditions
CI / Build&Test Docker image (push) Blocked by required conditions
CI / Supply-chain scan (push) Blocked by required conditions
CI / OSV scan (push) Waiting to run
CI / All required checks pass (push) Blocked by required conditions
Deploy Site / deploy-vercel (push) Waiting to run
Deploy Site / deploy-docs (push) Waiting to run
Build Skills Index / build-index (push) Has been cancelled
Build Skills Index / trigger-deploy (push) Has been cancelled
The adapter-level intake gate (_is_dm_allowed / _is_group_allowed, reached
via _should_process_message) did a raw set-membership check against the
configured allowlist. WhatsApp now delivers inbound DM senders in LID form
(<id>@lid) while operators configure allowlists with phone numbers, so the
check never matched and every DM from an allowed contact was silently
dropped before the gateway authz layer ran.

Route both gates through the existing gateway.whatsapp_identity.
expand_whatsapp_aliases helper (already used by gateway authz and session
keys), which walks the bridge's lid-mapping-*.json session files. Phone and
LID forms now resolve to each other in both directions; exact JID matches,
wildcard, disabled/open policies, and empty-allowlist fail-closed behavior
are all preserved.

Fixes #14486
2026-06-27 04:17:12 -07:00
teknium1
38e7bd8a08 fix(agent): classify 429 'overloaded' bodies as overloaded, not rate_limit
Z.AI / Zhipu reuse HTTP 429 for server-wide overload. The 429 status
path classified these unconditionally as rate_limit with
should_rotate_credential=True, so an overloaded provider exhausted the
credential pool after two errors — fatal for a single-key user, who has
nothing to rotate to.

The credential is valid; the server is just busy. Disambiguate the 429
body against a shared _OVERLOADED_PATTERNS list and route overload
language to FailoverReason.overloaded (retryable, no rotation), matching
the existing 503/529 path and the message-only path (#52890). Genuine
rate limits (no overload language) still rotate.

Extracted the inline overloaded tuple #52890 added into the shared
_OVERLOADED_PATTERNS constant so the status-code and message paths use
one list.

Closes #14038.
2026-06-27 04:16:54 -07:00
ms-alan
16192103f4 fix(config): accept placeholder base_url in custom provider validation
_normalize_custom_provider_entry() ran urlparse() on base_url and dropped
any entry whose value was an un-expanded placeholder, so a caller reaching
the normalizer with raw config (e.g. the Dockerized gateway path) silently
skipped the provider with a 'not a valid URL' warning. Skip URL validation
when the candidate contains a placeholder token — both ${ENV_VAR} env-refs
and bare {region}-style templates — since those are expanded at runtime.

Closes #14457
2026-06-27 04:15:27 -07:00
HiddenPuppy
b34771fc06 fix(cli): disable prompt_toolkit CPR queries to stop escape-sequence leak (#13870)
prompt_toolkit's renderer sends ESC[6n cursor-position queries before
painting in non-fullscreen mode; the terminal replies ESC[<row>;<col>R.
Over SSH/cloudflared tunnels and slow PTYs these replies race past the
input parser and land in the display as raw '20;1R21;1R' text, and the
pending-CPR future can stall the renderer so the prompt freezes after the
agent's final answer.

Build the prompt_toolkit output with enable_cpr=False so CPR is marked
NOT_SUPPORTED up front and ESC[6n is never sent. This is the root-cause
counterpart to the existing input-side _strip_leaked_terminal_responses
scrubbing. Vt100_Output.from_pty() does not expose enable_cpr in
prompt_toolkit 3.x, so _build_cpr_disabled_output() reproduces its
get_size setup and calls the constructor directly; it returns None on any
failure so startup falls back to the default output.

Verified in a real PTY: baseline emits 1 ESC[6n query, the fix emits 0,
banner/UI render identically. Layout is unaffected — with CPR off the
renderer sizes the prompt to its preferred height (the same fallback
prompt_toolkit uses on any terminal that doesn't answer CPR).

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-27 04:15:20 -07:00
LeonSGP43
e7c013494d fix(agent): preserve nested API error bodies 2026-06-27 04:13:53 -07:00
Teknium
5ab4136631
fix(webui): switch provider when Config-page model field changes (#53583)
The dashboard Config tab's Model field is a flat string with no provider
info. _denormalize_config_from_web only updated model.default and kept the
stale provider, so picking an OpenRouter model while the default provider was
ollama-local left provider=ollama-local and every call 404'd.

When the model string actually changes, infer the serving provider — curated
catalog first, then a vendor/model-slug heuristic for non-aggregator providers
— and route the switch through the existing _normalize_main_model_assignment /
_apply_main_model_assignment chokepoints so stale base_url/api_mode/api_key are
cleared on a provider change and preserved on a same-provider re-pick. Saving
an unchanged model never re-detects, so unrelated config saves keep an explicit
provider.

Closes #14058
2026-06-27 04:13:44 -07:00
teknium1
7ee0b68973 fix(gateway,feishu): refuse executor resurrection during real shutdown
Add an explicit _closing guard to both owned executors so the
recreate-on-shutdown path only recovers from an *external* teardown of
the loop default — never resurrects a pool the gateway/adapter itself
stopped. _shutdown_*executor() sets the flag; _get_*executor() raises if
closing; feishu connect() re-arms on reconnect. Updates the gateway
recreate test to assert the refusal contract and adds feishu coverage.
2026-06-27 04:13:09 -07:00
teknium1
b296915c82 fix(feishu): route blocking SDK calls through an adapter-owned executor
Feishu SDK calls ran on asyncio's shared default executor, so a torn-down
default executor wedged every send with 'Executor shutdown has been called'
and left the gateway a zombie (#10849). The adapter now owns a
ThreadPoolExecutor recreated on demand if shut down, mirroring the
gateway-owned executor change. Routes all 17 self._client SDK calls through
_run_blocking; shuts the pool down on disconnect.
2026-06-27 04:13:09 -07:00
konsisumer
1011c07966 fix(gateway): use owned executor for agent work 2026-06-27 04:13:09 -07:00
LeonSGP43
52a09d8faf fix(byterover): honor auto extract config 2026-06-27 04:04:15 -07:00
teknium1
f062cf076b fix(agent): also treat provider=ollama as an Ollama GLM backend
Follow-up to the #13971 fix: a genuine native Ollama provider reached
through a reverse proxy carries no ollama/:11434 URL signature, so the
restricted detection would miss it. Add provider=="ollama" as an
explicit True case (idea from #14789, @Tranquil-Flow) and cover both it
and the #13971 LiteLLM-proxy-to-zai false-positive with E2E tests.
2026-06-27 04:03:07 -07:00
YuShu
00a8252b7d fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only
The _is_ollama_glm_backend() function was too broad: any local endpoint
running a GLM model was treated as Ollama, triggering the stop->length
misreport heuristic introduced in 8011aa3. This caused false truncation
detection on sglang, vLLM, LM Studio, and other non-Ollama servers that
correctly report finish_reason.

When a GLM model on sglang/vLLM returned finish_reason='stop', the agent
mistakenly reclassified it as 'length' if the response didn't end with
a whitelisted punctuation character (ASCII or CJK). This particularly
affected Chinese-language responses and Markdown-formatted text.

Root cause: the is_local_endpoint() fallback assumed any local GLM
endpoint = Ollama. But many non-Ollama servers also run on localhost.

Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via
its distinctive signatures (port 11434, 'ollama' in URL). All other
local servers are assumed to report finish_reason correctly.

This is the correct tradeoff because:
- False negatives (Ollama at custom port, heuristic not triggered) only
  mean the user sees a truncated response — same as having no heuristic
- False positives (non-Ollama server, heuristic wrongly triggered) inject
  spurious continuation messages into the conversation — strictly worse

Adds two tests:
- sglang GLM response is NOT reclassified as truncated
- Ollama GLM on port 11434 still triggers the heuristic as before

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 04:03:07 -07:00
teknium1
ab1f9b94c5 fix(telegram): accept @username chat_id in delivery paths (#13206)
TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed
all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid
literal for int()'. The Telegram Bot API accepts both a numeric chat_id and
an @username string; Hermes was force-coercing every chat_id with int().

Add normalize_telegram_chat_id() (returns int for numeric values, passes
@username strings through) and apply it at the Bot API send/edit sites in
the Telegram adapter and the send_message tool. Username targets are now
recognized as explicit targets in _parse_target_ref.

Reapplies the approach from #13274 (season179), whose branch predated the
gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py
relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah).

Co-authored-by: season179 <season.saw@gmail.com>
2026-06-27 04:01:58 -07:00
teknium1
f2ca3e3d84 fix(gateway): hold _run_restart on _restart_task + explicit cancel-loop skip
Follow-up on the cherry-picked #13173 fix. Holds the _run_restart task in
self._restart_task (a bare asyncio.create_task keeps only a weak reference,
so a still-pending task can be GC'd mid-flight) and explicitly skips it in
the _stop_impl cancel loop alongside _stop_task. Adds AUTHOR_MAP entry for
the contributor and a regression test that fails when the task is cancellable.

Refs #12875
2026-06-27 03:57:31 -07:00
zeapsu
1ce5d6d974 fix(gateway): exclude _run_restart from _background_tasks to prevent zombie on /restart
When request_restart() adds _run_restart to _background_tasks, _stop_impl
later cancels all entries in that set.  Since _run_restart is awaiting
_stop_task at that point, the CancelledError propagates into _stop_impl,
interrupting cleanup before _shutdown_event.set() and _exit_code = 75
execute.  This leaves the gateway as a zombie (alive but disconnected) or
exiting with code 0 instead of 75, preventing systemd Restart=on-failure
from restarting the service.

Fix: don't add _run_restart to _background_tasks — it self-terminates in
~50ms and needs no lifecycle management.

Fixes #12875
2026-06-27 03:57:31 -07:00
teknium1
08e131f77c test(telegram): cover bot self-message ingestion guard (#11905)
Regression tests for the self-author guard added in the salvaged fix:
- bot-authored DM-topic watcher echo is dropped (the exact #11905 symptom)
- bot self-messages dropped in groups/supergroups too
- other bots in the same chat are still processed (self-id, not is_bot)
- observe-unmentioned sibling path also rejects self-messages
- missing from_user does not crash

Test scaffolding ported from @cola-runner's PR #12817 and adapted to the
current plugins/platforms/telegram/adapter.py and _is_own_message().
2026-06-27 03:56:52 -07:00
Teknium
d73078e7b0
fix(cron): make per-profile cron isolation intentional and tested (#4707) (#53570)
A profile's cron jobs now provably live in AND execute under that profile's
HERMES_HOME. A job authored under profile `coder` is stored at
`~/.hermes/profiles/coder/cron/jobs.json` and runs with coder's .env,
config.yaml, scripts and skills — never the default root's.

This was the de-facto behavior on main but only by accident: PR #50112 had
re-anchored cron storage at the shared default root, and a later stale-branch
squash merge (#52147) silently reverted it back to the profile home. Neither
direction was guarded by a test, so it could flip again on the next stale merge.

Changes:
- cron/jobs.py: document the per-profile storage anchor (get_hermes_home, NOT
  get_default_hermes_root) and why anchoring at the root leaks
  config/credentials/skills across profiles — the #4707 security boundary.
- cron/scheduler.py, cron/suggestions.py: same intent documented at the
  dynamic resolution helper and the suggestions store.
- tests/cron/test_cron_profile_isolation.py: pin storage, lock-path, and
  execution-home resolution to the active profile so a re-anchor can't regress.

Verified E2E: jobs created under two profiles land in separate per-profile
stores with zero cross-profile leakage and no shared-root store; scheduler
execution-home follows the active profile. Full cron suite: 576/576.
2026-06-27 03:55:01 -07:00
Bartok
864d5521ad test(curator): join straggler curator-review thread on fixture teardown
The curator_env fixture left async review threads (synchronous=False spawns
a daemon 'curator-review' thread that calls save_state() on completion)
running past test teardown. save_state() resolves the state path from
HERMES_HOME at write time, so a straggler could write into the next test's
tmp home, corrupting test_state_file_survives_corrupt_read (and others)
under CI load. Join the thread on teardown while HERMES_HOME is still
pinned to this test's home.
2026-06-27 03:52:52 -07:00
Bartok9
45ce35ed72 fix(agent): classify message-only 'overloaded' as server overload
Salvage of #14261 by @ms-alan — rebased onto current main, scoped to the
overloaded-classification fix, with a regression test that fails without it.
2026-06-27 03:52:52 -07:00
teknium1
151ae1e937 test(api-server): cover SSE failure finish_reason for both failure modes
Lock the contract that a clean stream-queue termination followed by an
agent failure never reports finish_reason: "stop". Covers the raised-
exception case (#12422 repro), the flagged failed-result case, truncation
(length), and the success happy path.

Follow-up to the salvaged #12504 fix from @flobo3.
2026-06-27 03:52:44 -07:00
blaryx
76af2456a2 fix(dashboard): merge PUT /api/config with existing on-disk config
The dashboard form is built from CONFIG_SCHEMA, which doesn't enumerate
every root-level key the YAML supports. Most visibly, `custom_providers`
is in `_KNOWN_ROOT_KEYS` but is absent from the schema — so the frontend
never sends it in the PUT body. The previous full-replace save() then
silently wiped the key from disk every time the user clicked anything
that triggered a save. Other casualties (less visible because defaults
re-mask them on load) include `agent.personalities`,
`agent.reasoning_effort`, `terminal.lifetime_seconds`, etc.

Fix: read the raw on-disk config and deep-merge the incoming PUT body
on top of it before saving. The frontend can only overwrite what it
explicitly sends; everything else is preserved verbatim.

Reuses the existing `_deep_merge` helper from `hermes_cli.config`.

Tests:
- `test_round_trip_preserves_custom_providers` exercises the exact bug:
  seed config with custom_providers, GET → drop the key → PUT,
  assert it's still on disk.
- `test_round_trip_preserves_schema_invisible_nested_keys` covers the
  shallow-vs-deep-merge case for nested dicts under `agent` etc.
Both fail on current main; both pass with this patch.
2026-06-27 03:48:18 -07:00
Teknium
ec769e49d2
fix(gateway): WhatsApp/Signal hints affirm markdown instead of forbidding it (#53564)
The 'whatsapp' and 'signal' PLATFORM_HINTS told the agent 'Please do not
use markdown as it does not render' — factually wrong. Both adapters
actively convert markdown to native formatting:

- whatsapp_common.format_message(): **bold**, ~~strike~~, # headers,
  links, code blocks -> WhatsApp native syntax
- signal_format.markdown_to_signal(): same conversions via bodyRanges,
  plus '- item' / '* item' bullets -> '• ' Unicode bullets

The wrong hint made the agent strip bullets and bold the adapter would
have rendered (#12224). Rewrote both hints to mirror whatsapp_cloud:
markdown is auto-converted, bullet lists work, tables are not supported.
Added a contract test asserting markdown-converting platforms never
forbid markdown in their hint.
2026-06-27 03:46:41 -07:00
dodo-reach
ed54469d06 fix(gateway): show MoA presets in model picker 2026-06-27 03:43:38 -07:00
teknium1
4e0788783b refactor(gateway): extract MoA one-shot restore helper; restore #28686 comment; real-method tests
Follow-up on the salvaged MoA restore fix:
- Extract the finally-block restore into _restore_moa_one_shot() so the
  behavior is unit-testable without re-implementing it, and so the gateway
  /moa handler and the finally block share one implementation.
- Restore the load-bearing #28686 zombie-eviction comment above
  _release_running_agent_state that the original diff dropped.
- Rewrite the tests to call the real _restore_moa_one_shot helper (the
  originals re-implemented the restore logic inline, so they passed
  regardless of the production code).
2026-06-27 03:43:28 -07:00
srojk34
2f29e3cfc5 fix(gateway): restore MoA one-shot model override on failed turns
The MoA one-shot restore ran inside the try block after
_handle_message_with_agent returned. When that call raised an
exception (agent init failure, interpreter shutdown, OOM), the
restore was skipped and the MoA model override stayed permanently
on _session_model_overrides — silently routing all subsequent
messages through the MoA reference fan-out with no user-visible
indication.

Move the restore to the finally block so it fires on every exit
path (success, exception, interrupt). The restore data lives on
the per-turn event object and would be lost if not consumed here.
2026-06-27 03:43:28 -07:00
briandevans
17cb829991 test(moa): cover non-list/bare-dict reference_models normalization 2026-06-27 03:43:16 -07:00
Teknium
60f58a2b95
feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552)
The verify-on-stop guard fired too eagerly — including on doc/markdown/skill
edits with nothing to verify, where it pushed a pointless /tmp verification
script. Three changes:

1. Default OFF for new installs: agent.verify_on_stop defaults to false
   (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31.
2. One-time migration (v30 -> v31): existing installs are switched off once,
   but only when the value is missing or still the "auto" sentinel — an
   explicit true/false the user set is preserved.
3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose
   paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly
   enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on
   the code paths.

The legacy "auto" sentinel is still honored when set explicitly (ON for
interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env
override unchanged.
2026-06-27 03:23:22 -07:00
Versun
c655cdf2c1 feat(dashboard): expose cron job execution fields 2026-06-27 03:20:32 -07:00
teknium1
50f6855217 feat(moa): make /moa one-shot only; route preset switching through the model picker
/moa no longer does a sticky model switch. It now always runs a single
prompt through the default MoA preset and restores the prior model
afterward; the whole argument is the prompt (no preset-name matching).
To switch to a MoA preset for the session, select it from the model
picker, where presets already surface under a virtual Mixture of Agents
provider on every model-selection surface.

Also fixes #53444: the TUI one-shot only set session[model_override],
which the already-built cached agent ignored, so MoA silently never ran
and the turn used the original model. The TUI now does a real in-place
agent.switch_model() via _apply_model_switch() when a live agent exists
(with a proper restore after the turn), and falls back to a model_override
for lazy/unbuilt sessions.

Removes the redundant sticky-switch branch from the CLI, gateway, and TUI
/moa handlers; updates the command description, usage string, and docs.
2026-06-27 03:09:09 -07:00
diamondeyesfox
8df231c941 fix(agent): rebaseline in-place compression flushes 2026-06-27 03:04:26 -07:00
Mahesh Sanikommu
1b75b3fd90 feat(memory): add Supermemory setup connection summary
Add post_setup() and get_status_config() to the Supermemory memory
provider so `hermes memory setup` and `hermes memory status` print a
one-line connection summary (container, profile fact count,
auto_recall/auto_capture). Point API-key onboarding at the Hermes
connect URL (app.supermemory.ai/integrations?connect=hermes).

Salvage of #52988. Two fixes folded in:

- Test isolation: the new probe/status tests mocked _SupermemoryClient
  but not the __import__("supermemory") guard inside
  _probe_supermemory_connection, so they passed only where the optional
  supermemory package was installed and failed on a clean checkout / CI
  (the PR shipped with red CI). Added _stub_supermemory_importable()
  mirroring the existing test_is_available_false_when_import_missing
  pattern; the suite now passes with supermemory absent.

- post_setup: `if api_key and api_key not in os.environ` checked whether
  the key's *value* named an env var (always false in practice). Fixed to
  compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`.

Verified: 38/38 in test_supermemory_provider.py and the full
tests/plugins/memory/ suite green with supermemory not installed.

Closes #52988
2026-06-27 15:07:34 +05:30