Commit graph

571 commits

Author SHA1 Message Date
Teknium
208f0d7c3b
fix(update): default pre-update backup to off (#52729)
The pre-update HERMES_HOME zip shipped on by default (DEFAULT_CONFIG +
runtime fallback both True), so every `hermes update` zipped the entire
~/.hermes — sessions DB, caches, skills — adding minutes to each update.
The shipped cli-config.yaml.example, the --backup help, and the example
config all already said "off by default," so the live default
contradicted its own documentation.

Flip the default to off everywhere: DEFAULT_CONFIG, the runtime
`.get(..., False)` fallback in _run_pre_update_backup, and the stale
--backup help string. Users who want the #48200 safety net opt in via
updates.pre_update_backup: true or --backup for a single run.

Updated test_default_enabled_creates_backup -> test_default_disabled_is_silent
to assert the new default (silent no-op, no zip).
2026-06-25 16:01:09 -07:00
kshitij
e4ff494860
fix(cron): add default retention to per-run job output (#52383) (#52646)
* fix(cron): add default retention to per-run job output to bound disk usage (#52383)

Per-run cron output (cron/output/<job>/<timestamp>.md) is written once
per execution and was never pruned, so a frequently-scheduled job on
a long-running deploy accumulates one file per run indefinitely and
can fill the volume ('no space left on device').

save_job_output() now keeps the most recent N output files per job and
removes older ones. N defaults to 50 and is configurable via
cron.output_retention; a non-positive value disables pruning for
operators who manage cleanup externally.

Salvaged from #52402 by @0xDevNinja.

Closes #52383

* fix(config): add cron.output_retention to DEFAULT_CONFIG

Follow-up to #52383: the retention config key was functional via
get()-with-default but missing from DEFAULT_CONFIG, so the deep-merge
wouldn't auto-populate it for new installs. Add it explicitly.

---------

Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
2026-06-25 16:00:13 -07:00
Teknium
c6575df927
feat(moa): expose MoA presets as selectable virtual models (#46081)
* feat(moa): expose MoA presets as selectable virtual models

Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.

Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.

- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
  / one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.

* fix(moa): thread moa_config through _run_agent to _run_agent_inner

The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).

* fix(moa): classify moa as a virtual provider in the catalog

The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
  - test_provider_catalog: api_key providers must expose a credential env var
  - test_provider_parity: every hermes-model provider must be desktop-configurable

moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
2026-06-25 13:52:06 -07:00
kshitij
ca714f6189
Merge pull request #52653 from kshitijk4poor/salvage/33814-env-quote-hash
fix(config): quote .env values containing # to prevent token truncation (#30355)
2026-06-26 01:32:49 +05:30
kshitijk4poor
2107b86024 feat(compression): flip in_place default to True (#38763) [2/2]
In-place compaction (single durable session id, non-destructive soft-archive)
becomes the default. Rotation is now the opt-out fallback via
compression.in_place: false.

Prerequisite: #50098 (hygiene guard reads result flag not config flag) merged
first — without it, flipping the default causes permanent transcript loss on
gateway hygiene-compress and /compress when no session_db is available.

Blast radius (empirically measured on current main): 7 rotation-asserting
tests broke and are pinned to in_place=False in the companion test commit:
- tests/agent/test_compression_concurrent_fork.py (2)
- tests/agent/test_compression_logging_session_context.py (1)
- tests/agent/test_compression_rotation_state.py (1)
- tests/run_agent/test_compression_boundary_hook.py (2 _make_agent helpers)
- tests/gateway/test_compression_concurrent_sessions.py (2)
Rotation stays as a working fallback and deserves continued coverage.

Plan: .hermes/plans/in-place-compaction-38763.md
2026-06-25 12:56:05 -07:00
sweetcornna
150afea942 fix(config): quote env values containing hash 2026-06-26 00:54:34 +05:30
Brooklyn Nicholson
c4c590e4a1 perf(desktop): make session switching fast under load
Switching sessions in the desktop app could freeze the whole UI for
several seconds on heavy, tool-rich chats. Root causes and fixes:

- Cold `session.resume` built the AIAgent (MCP discovery, prompt/skill
  build) *before* returning, and the desktop awaits that RPC before it
  paints — so the entire switch blocked on the build. Add an opt-in
  `defer_build` resume path (the contract `session.create` already uses):
  return the full display transcript immediately, register an upgradable
  live session, and pre-warm the agent on a short timer. The persisted
  runtime identity (model/provider/base_url/api_mode/reasoning/tier) is
  restored on the deferred build so it can't drop the provider.

- Nothing bounded how many in-memory agents accumulate; a user who
  reconnects often piled up detached sessions for the full 6h TTL. Add a
  soft LRU cap (`max_live_sessions`, default 16) that evicts the
  least-recently-active DETACHED sessions (no live client) — never a
  running, awaiting-input, mid-build, or live-transport one. Reopening
  re-resumes from disk.

- On the prefetch-hit cold-resume path, skip rebuilding a throwaway
  merged-message array (and its 1000-entry Map) when the prefetch already
  painted the exact transcript; the downstream sameMessageList guard
  already drops the publish, so it was pure main-thread cost.

The desktop opts into `defer_build` for every non-watch cold resume; the
eager path stays for CLI/TUI and existing callers.
2026-06-25 14:03:03 -05:00
brooklyn!
d473e5d07a
Merge pull request #52296 from NousResearch/bb/verify-stop-loop
Add verification stop loop
2026-06-24 23:10:03 -05:00
Brooklyn Nicholson
2f1a47b90e feat(agent): require verification before finishing edits
Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.
2026-06-24 23:02:48 -05:00
Victor Kyriazakos
b693bee100 feat(cron): thread-preferred continuable delivery (open a thread, mirror DM fallback)
Continuable cron jobs (attach_to_session / cron.mirror_delivery, default
OFF) now prefer a dedicated thread on thread-capable platforms, falling
back to origin-DM mirroring where threads don't exist.

- Thread-capable (Telegram topics, Discord/Slack threads): open a fresh
  thread for the job via the shipped adapter.create_handoff_thread,
  route the brief into it, and seed the thread-keyed session so the
  user's in-thread reply continues with full context. This is the
  'continuable cron opens its own thread' interface.
- DM-only (WhatsApp/Signal/SMS): create_handoff_thread returns None ->
  fall back to mirroring into the origin DM session (existing behaviour).

Reuses existing infrastructure end-to-end — no new adapter surface, no
provider-chain signature change:
- adapter.create_handoff_thread (already implemented per-platform,
  returns None on unsupported platforms = the fallback signal)
- the live SessionStore via adapter._session_store (already set on every
  adapter), reached without threading a new param through the frozen
  CronScheduler.start() contract
- gateway.mirror.mirror_to_session for the seed/append
- existing per-target delivery routing carries the new thread_id for free

Mirrors GatewayRunner._process_handoff's open-thread-or-fallback +
seed pattern, standalone for the cron delivery path. thread_seeded
guards against a double-mirror after seeding. Scoped to the origin
target only; fan-out/broadcast targets are never threaded or mirrored.

Config docs updated (cron.mirror_delivery) + cronjob tool
attach_to_session description reframed around continuable/thread-preferred.

Tests: +5 (thread id returned on thread platform; None on DM platform;
None without capability/loop; seed creates thread session + mirrors;
seed no-op on empty). 22/22 in TestCronDeliveryMirror; 532 cron tests
pass (4 failures pre-existing: croniter-not-installed + TZ).
2026-06-24 20:27:05 -07:00
Victor Kyriazakos
1b181724fa feat(cron): optional mirror of cron delivery into target chat session
Adds an opt-in path so a cron job's delivered output is also appended to
the TARGET chat's gateway session transcript (as an assistant turn), so a
user reply to a recurring delivery (daily brief, reminder) is answered with
the delivery in context instead of 'what is that?' amnesia.

- Reuses the shipped gateway.mirror.mirror_to_session — the same primitive
  interactive send_message mirroring already uses. No messaging-toolset
  change (cron still can't call send_message; this rides delivery).
- Gated: per-job attach_to_session overrides global cron.mirror_delivery
  (config.yaml). Default OFF — historical isolation preserved byte-for-byte.
- Mirrors the CLEAN agent output, not the cron header/footer wrapper.
- Alternation/cache-safe: append lands at a turn boundary, never mid-loop,
  never mutates the cached system prompt. Cold-start (no target session)
  is a silent no-op; mirror errors never fail a successful delivery.
- Surfaced on the cronjob tool (attach_to_session) + config schema.

Driven by enterprise cron-as-control-plane use case. 10 new tests; full
cron + cronjob-tool suites pass (600).
2026-06-24 20:27:05 -07:00
Teknium
411faf08bd
fix(soul): installers seed the real default persona, upgrade legacy empty templates (#52246)
The desktop bootstrap (and curl/PowerShell/docker installs) seeded
~/.hermes/SOUL.md with a comment-only scaffold that contained no persona
text. That shadowed the runtime default (_ensure_default_soul_md ->
DEFAULT_SOUL_MD), since seeding is guarded by 'if SOUL.md doesn't exist'.
Result: every fresh installer install got the empty template instead of
the documented Hermes persona; desktop just made it visible in onboarding.

- install.sh / install.ps1 / docker/SOUL.md now write DEFAULT_SOUL_MD.
- _ensure_default_soul_md() upgrades a SOUL.md still matching the known
  legacy scaffold in place; customized files (any deviation, incl. a
  persona appended below the comment) are never touched.
- Detection normalizes CRLF/BOM so Windows-installer drift still matches.
2026-06-24 18:56:26 -07:00
Ben
d1cac0e5ef feat(gateway): scale-to-zero idle detection + dormant-quiesce (Phase 0)
The gateway-side BEHAVIOUR layer that consumes the relay scale-to-zero
primitives (gateway-gateway Phase 5): the gateway decides it is idle and
drives the relay transport dormant so the platform (Fly autostop:"suspend")
can suspend the now-traffic-idle machine, which wakes on the connector's
wakeUrl poke (decisions.md Q3=C', D1-D13).

- gateway/scale_to_zero.py: pure helpers — scale_to_zero_enabled (the NAS
  Labs HERMES_SCALE_TO_ZERO stamp, D11/Q8=A), parse_idle_timeout_seconds
  (config.yaml gateway.scale_to_zero.idle_timeout_minutes, D2),
  messaging_is_relay_only_or_absent (F6/D1), should_arm (D1/D11/§3.4(1)),
  is_idle (D2/D3/F7).
- gateway/run.py: _last_inbound_at clock stamped on user inbound in
  _handle_message (F13); the arm-gate + idle predicate + the
  _scale_to_zero_watcher dormant sequence (mark draining -> adapter
  go_dormant() -> cooldown), started only when armed. Deliberately NOT the
  stop path and NOT mark_resume_pending (F12/D13).
- tools/process_registry.py: has_any_active() for the bg-work guard (D3/F7).
- hermes_cli/config.py: gateway.scale_to_zero.idle_timeout_minutes default 5.

Tests: 38 pure-logic + 6 watcher (incl. bg-work regression guard proven RED).
Full relay + scale-to-zero suites: 184 passed. The 20 unrelated failures in
the broader run are PRE-EXISTING on origin/main (custom-provider/tools tests),
confirmed via a pristine baseline worktree.
2026-06-24 18:47:18 -07:00
helix4u
17beb55e3c fix(telegram): gate rich draft previews separately 2026-06-24 18:11:14 -07:00
uperLu
0d4cecb352 fix(cron): avoid provider package shadowing core cron 2026-06-23 23:39:22 -07:00
Brooklyn Nicholson
e495b33bf1 Merge remote-tracking branch 'origin/main' into bb/pets-merge
# Conflicts:
#	hermes_cli/commands.py
#	tui_gateway/server.py
2026-06-23 19:05:22 -05:00
Teknium
6cc07b6cd0
feat(discord): render reasoning as -# subtext via display.reasoning_style (#51168)
Adds a per-platform display.reasoning_style setting (code | blockquote |
subtext) controlling how the show_reasoning summary renders on the gateway.
Discord defaults to "subtext" (-# small grey metadata text); every other
platform keeps the fenced code block. Resolves through the existing
display.platforms.<platform>.reasoning_style override chain.
2026-06-23 10:44:02 -07:00
Teknium
87c4a5ebb8
feat(background-review): aux-model selector for the self-improvement review (#49252)
Adds auxiliary.background_review.{provider,model} (default auto = main chat
model — unchanged). Set it to a different, cheaper model and the post-turn
self-improvement review runs there for ~3-5x lower cost.

Cache-aware by design: the main chat is warm in the prompt cache, so the
default full-history replay on the main model is cheap cache reads — left
exactly as-is. A different model can't reuse that cache (different key), so
when (and only when) routed to a different model the fork replays a compact
digest instead of the full transcript, minimising what it cold-writes on the
aux model. Same model -> full replay; different model -> digest.

Quality holds in benchmarks: memory capture identical, skill near-identical.
Nothing changes unless you opt in by naming a different model.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-22 14:54:53 -07:00
Teknium
f1e6d39a74
feat(computer_use): disable cua-driver telemetry by default, add opt-in (#50842)
* feat(computer_use): disable cua-driver telemetry by default, add opt-in

cua-driver ships anonymous PostHog usage telemetry ENABLED by default
upstream (fires cua_driver_install / cua_driver_doctor events to
eu.i.posthog.com). Hermes now disables it for our users unless they
explicitly opt in.

- New config key `computer_use.cua_telemetry` (default false) in
  DEFAULT_CONFIG.
- `cua_backend.cua_driver_child_env()` injects
  `CUA_DRIVER_RS_TELEMETRY_ENABLED=0` into the child env when telemetry is
  disabled (the default); leaves the var untouched on opt-in so the driver
  uses its own default. Reads config fail-safe — any error defaults to
  telemetry off.
- Routed every cua-driver spawn site through the policy: MCP backend
  (StdioServerParameters env), `cua_driver_update_check`, doctor's
  health_report Popen, the install.sh/install.ps1 runner, and the
  `--version` / status probes.
- Docs: new Telemetry subsection in computer-use.md (EN).
- Tests: tests/computer_use/test_cua_telemetry.py — default disables,
  explicit-false disables, opt-in leaves var untouched, config-failure
  fails safe, inherited-enabled is overridden off.

Verified live on Linux against the real cua-driver-rs 0.6.0 binary: with
the var=0 the driver reports "telemetry: disabled via
CUA_DRIVER_RS_TELEMETRY_ENABLED" and sends no event; with it unset it logs
"sending event: cua_driver_doctor". 213 computer_use + install tests green.

* fix(dashboard): fold computer_use config category into agent tab

The new computer_use.cua_telemetry key created a single-field dashboard
config category, tripping test_no_single_field_categories (web_server's
invariant that categories with <2 fields must be merged to avoid tab
sprawl). Add computer_use -> agent to _CATEGORY_MERGE, matching the
existing onboarding/telegram single-field folds.
2026-06-22 09:57:16 -07:00
Brooklyn Nicholson
5342eccf12 Merge remote-tracking branch 'origin/main' into bb/pets 2026-06-22 05:25:49 -05:00
Shannon Sands
5dae502b86 Address email pairing review feedback 2026-06-21 22:43:57 -07:00
teknium1
4314d451ca fix(gateway): accept any inbound file type across all messaging platforms
Authorization to message the agent is the gate, not the file extension.
Previously the inbound-attachment allowlist (SUPPORTED_DOCUMENT_TYPES) was
opt-OUT on Discord (allow_any_attachment defaulted false) and had no bypass
at all on Telegram/Slack — so an .html (or any non-allowlisted type) was
dropped or hard-rejected before the agent saw it.

Now every authorized upload is cached and surfaced to the agent regardless
of type:
- base.cache_media_bytes(): unknown types cache as octet-stream (or the
  caller-supplied MIME) instead of returning None — fixes the chokepoint
  that Teams/Telegram-media route through.
- discord/telegram/slack adapters: removed the allowlist reject/skip; any
  non-media attachment is typed DOCUMENT and cached. Known types keep their
  precise MIME.
- Text inlining now gates on a shared _TEXT_INJECT_EXTENSIONS set (text +
  code + config + markup) instead of a blind UTF-8 decode, so binary formats
  (PDF/zip/docx) with ASCII headers are never inlined.
- gateway/run.py emits the path-pointing context note for every DOCUMENT,
  including non text/application MIME types.
- discord.allow_any_attachment is now a documented no-op kept for config
  back-compat.

Validation: 357 gateway tests pass; E2E confirms .html/.bin/custom types
cache, known types stay precise, PDFs are not inlined.
2026-06-21 22:43:45 -07:00
Teknium
95d53c3bcb
feat(cli): /reasoning full — show complete thinking, not 10-line clamp (#50499)
* feat(cli): /reasoning full to show complete thinking, not 10-line clamp

The post-response Reasoning recap box hard-clamped long thinking to the
first 10 lines, so there was no way to see the full reasoning trace after
a turn (live streaming already shows it in full). Add display.reasoning_full
(default off) plus /reasoning full|clamp to toggle it at runtime; the clamp
truncation note now points at the command. Addresses repeated user requests
to show all thinking tokens.

* test(gateway): de-snapshot /reasoning help assertion

The test froze the exact args-hint literal '/reasoning [level|show|hide]',
which the new full/clamp args change to '[level|show|hide|full|clamp]'.
Convert to an invariant: assert /reasoning is in help and carries its core
args, not the exact hint string.

* feat(tui): /reasoning full|clamp parity in tui_gateway

The classic-CLI reasoning_full toggle had no TUI equivalent — typing
/reasoning full in the TUI fell through to parse_reasoning_effort and
errored. The TUI renders thinking as an expand/collapse section (no fixed
10-line recap), so map full -> sections.thinking=expanded (raw, uncapped
via thinkingPreview mode='full') and clamp -> collapsed, persisting
display.reasoning_full for cross-surface config consistency.
2026-06-21 20:21:11 -07:00
Teknium
7130d60861
feat(providers): remove google-gemini-cli + google-antigravity OAuth providers (#50492)
* feat(providers): remove google-gemini-cli + google-antigravity OAuth providers

Google now actively bans accounts for third-party tools that piggyback on
Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention
sits at a backend layer the ban can extend to the entire Google account
(Gmail/Drive), with a second violation being permanent.
Ref: https://github.com/google-gemini/gemini-cli/discussions/20632

Removes both OAuth inference providers entirely (modules, provider profiles,
auth/runtime/config/models wiring, the /gquota Code Assist quota command,
the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans).
The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against
generativelanguage.googleapis.com) is unaffected and stays fully supported.

* fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed

The antigravity-cli optional skill orchestrates the external `agy` binary as
a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference
through the banned google-antigravity OAuth provider, so it carries none of
the account-ban risk that motivated removing that provider. Restore the skill,
its docs page, the sidebar entry, and the optional-skills catalog row. The
google-antigravity / google-gemini-cli inference providers stay fully removed.
2026-06-21 19:53:27 -07:00
teknium1
8cecaf0b29 feat(process): escalate SIGTERM->SIGKILL on host-pid termination after grace
A daemon that ignores or stalls in its SIGTERM handler currently survives the
process-registry reap and leaks until reboot (observed as agent-browser
daemons accumulating to EMFILE on long-running gateways). _terminate_host_pid
now snapshots the tree, SIGTERMs it, waits a bounded grace window
(terminal.daemon_term_grace_seconds, default 2.0s, 0 disables), then SIGKILLs
any survivor. The recycled-PID identity guard still gates the whole path, so
escalation never reaches a stranger; Windows is unchanged (taskkill /F is
already a hard kill).

Config lives in config.yaml (terminal.daemon_term_grace_seconds), NOT an env
var, per the .env-secrets-only policy.

Implements the SIGKILL-escalation idea from @tkwong's #15008, reworked onto the
current _terminate_host_pid tree-kill path (the original predated it) and
config-gated instead of env-var-gated.

Co-authored-by: Benjamin Wong <tkwong@inspiresynergy.com>
2026-06-21 19:08:52 -07:00
pmos69
8baa4e9976 feat(cli): add native Antigravity OAuth provider 2026-06-21 16:41:30 -07:00
Teknium
824c9d3812
fix(config): alias model.api_base -> model.base_url for custom providers (#50385)
A bare custom provider configured via `model.api_base` (the intuitive name
OpenAI-SDK / LiteLLM users reach for) was silently ignored: `hermes config set`
accepts any dotted key, so `model.api_base` got written and confirmed, but the
runtime resolver reads only `model.base_url`. Requests fell back to OpenRouter
with an empty key -> 401, zero hits to the custom endpoint (issue #8919).

Now api_base is migrated to base_url at load time (fixes existing broken
configs) and at set time (with a notice), never overriding an explicit
base_url. Closes #8919.
2026-06-21 13:33:41 -07:00
kn8-codes
6183e8ce1b fix(telegram): make Bot API 10.1 rich messages opt-in (default off)
Rich messages are not ready for primetime: current Telegram clients can
render Bot API 10.1 rich messages as blank/unsupported bubbles and make
them hard to copy as plain text, which is worse than the legacy
MarkdownV2 path for command snippets and mobile handoffs. Default the
rich_messages toggle to False so replies stay on the copyable legacy
path; users opt in per bot via platforms.telegram.extra.rich_messages:
true. Updates adapter, gateway config default, example config, English +
zh-Hans docs, and the default/opt-in tests.
2026-06-21 12:03:24 -07:00
sgaofen
93ea9b04af fix(gateway): cap inbound media download size to prevent memory exhaustion
Inbound image/audio/video payloads were buffered fully into process memory
before being written to the cache, with no size limit. A large upload
(Discord Nitro allows 500 MB) or a remote media URL in an inbound message
pointing at a huge file could spike RAM and OOM-kill the gateway.

Enforce a configurable cap in the shared cache helpers (gateway/platforms/
base.py) so the protection holds across every platform adapter, not one:

- cache_image/audio/video_from_bytes reject oversized payloads before writing
  (video was the gap in the original report — now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized
  Content-Length header and re-checking the running total per chunk so an
  absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized
  attachment is rejected before any bytes are pulled into memory.

Configurable via gateway.max_inbound_media_bytes in config.yaml (default
128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml.

Salvaged and extended from @sgaofen's PR #13341 (the original report and the
shared-helper approach). Reapplied onto current main (Discord adapter has
since moved to plugins/platforms/discord/), the configurable knob moved from
an env var to config.yaml, and the video cache helper added.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-21 11:56:46 -07:00
Teknium
a18bae65b9
fix(config): redact api_key in config show/set output (#50245) (#50313)
hermes config show printed the model dict raw via print(), bypassing the
logging redactor; a custom-provider api_key (e.g. Cloudflare cfut_...) was
shown in plaintext even with security.redact_secrets=true. Opaque tokens
don't match any vendor-prefix regex, so structural key-name masking is
required.

- Add redact_config_value(): recursively masks credential-shaped keys
  (api_key/token/secret/... exact-match) via mask_secret.
- Wrap the show_config model dump in it.
- Mask the set_config_value echo when the leaf key is credential-shaped
  (config set model.api_key routes to config.yaml, lowercase misses the
  .env allowlist).
2026-06-21 11:50:31 -07:00
Teknium
03563dabac
fix(gateway): raise session-hygiene hard message limit 400 → 5000 (#50194)
The gateway pre-compression hygiene valve force-compressed any session
crossing 400 messages regardless of token usage. On large-context (1M+)
models doing many short, message-dense turns, a healthy session at ~16%
token usage could hit 400 messages and get force-compressed — and the
compression summary's stale Active Task could then bleed into the next
turn.

The valve's actual purpose is to break a death spiral: when API calls
keep disconnecting on an oversized session, no token-usage data arrives,
the token threshold never fires, and the transcript grows unbounded.
It's a count-based floor for that pathological case only. 400 was tuned
for ~200K-context models and is far too low for modern large-context
sessions. Raise the default to 5000 — still well clear of any death
spiral, but no longer firing on legitimate long conversations.

The value remains fully configurable via compression.hygiene_hard_message_limit.
2026-06-21 08:26:19 -07:00
Teknium
e499d69e3e
feat(api-server): configurable concurrent-run cap to prevent DoS (#50007)
The OpenAI-compatible API server only enforced a hardcoded cap of 10
concurrent runs on /v1/runs, leaving /v1/chat/completions and
/v1/responses unbounded — a request flood could exhaust CPU, memory,
and upstream LLM quota (#7483).

- Add gateway.api_server.max_concurrent_runs (config.yaml, default 10,
  0 disables). No env var.
- Shared concurrency gate across all three agent-serving endpoints,
  counting both the chat/responses in-flight counter and the /v1/runs
  stream set. Returns OpenAI-style 429 + Retry-After when at the cap.
- Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute.

Closes #7483.
2026-06-21 07:26:03 -07:00
Brooklyn Nicholson
e7dbfdaad7 feat(pets): pet engine + display.pet config
Add the shared pet engine under agent/pet/: spritesheet manifest loading
and in-process caching, six-state animation model, frame rendering, and
the persistent pet store. Register the display.pet config block (pet,
scale, enabled, etc.) that every surface reads from. Covered by
tests/agent/test_pet_engine.py.
2026-06-20 14:18:30 -05:00
kshitijk4poor
69716a2e6f docs(compression): fix stale 'discarded' wording on in_place config flag
Review nit (yoniebans): the config.py comment still said compaction is
'lossy: the pre-compaction transcript is discarded, matching Claude Code /
Codex' — leftover from the original destructive design. The shipped behavior
is soft-archive: lossy for the LIVE context (what the model reloads), but the
pre-compaction turns are kept on disk (active=0, compacted=1), searchable via
session_search and recoverable. Comment now says so. Comment-only; no behavior
change.
2026-06-20 10:57:07 -07:00
kshitijk4poor
47fadc24d7 feat(compression): in-place compaction option that keeps one session id (#38763)
Context compression today rewrites the message list AND rotates the
session id — it ends the session, forks a parent_session_id child, and
renumbers the title (name -> name #2). That moving identity key is the
root cause of a whole bug cluster: /goal lost (#33618), pending response
lost at the split (#14238), orphan sessions (#33907), TUI sid desync
(#36777), FTS search gaps + duplicate sidebar entries (#45117), null
continuation cwd (#42228), and title-rename dead-ends (#48989). It also
forced a large defensive apparatus (compression lock, contextvar/env/
logging triple-sync, orphan finalization, gateway SessionEntry
re-propagation, tip projection) whose only job is surviving a
mid-conversation id change.

Add a compression.in_place config flag (default False during rollout).
When True, compaction rewrites the transcript and rebuilds the system
prompt but keeps the SAME session_id: no end_session, no child row, no
title renumber, no contextvar/logging re-sync, no memory/context-engine
session-switch. The conversation keeps one durable id for life, like
Claude Code / Codex. Compaction is lossy by design — the pre-compaction
transcript is summarized away, not archived.

The rotation path is unchanged when the flag is off (moved verbatim into
an else branch). Staged rollout: this PR ships the option behind a
default-off flag for live validation; a follow-up flips the default and
deletes the now-redundant rotation machinery, superseding the 14 open
band-aid PRs in this area.

- hermes_cli/config.py: add compression.in_place (default False), documented
- agent/agent_init.py: resolve the flag -> agent.compression_in_place
- agent/conversation_compression.py: branch compress_context() on the flag
- tests/run_agent/test_in_place_compaction.py: in-place invariants +
  rotation regression guard + config default

The pre-flush of current-turn messages (#47202) runs in BOTH modes, so no
boundary data loss. Prompt-cache invariant preserved: the system-prompt
rebuild is the same single sanctioned invalidation that already happens
during compaction — no NEW invalidation. Message alternation preserved.
2026-06-20 10:57:07 -07:00
helix4u
c253b07380 fix(model): clear stale endpoint credentials across switches 2026-06-19 19:58:26 -07:00
Teknium
cf58f1a520
feat(titles): support language-aware title generation (#45296)
Make auxiliary title prompts match the user language by default, with an optional pinned `auxiliary.title_generation.language` config.
2026-06-19 17:15:52 -07:00
alt-glitch
b6e2a54a94 fix(mcp): address adversarial review round 1 (cache parity, gates, races)
Consolidated findings from three independent reviewers (Codex, Claude Code, a
Hermes subagent w/ the hermes-agent-dev skill):

- BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently
  dropping post-build-injected memory-provider (mem0/honcho/…) and context-
  engine (lcm_*) tools on every refresh. Now additive-preserving: re-applies
  the same injectors agent_init uses, staged on locals and published atomically.
- Re-injection now honors the #5544 enabled_toolsets gate for context-engine
  tools, so a restricted-toolset platform can't get lcm_* leaked back in.
- Atomic read-diff-publish under one lock: the returned `added` set and the
  (tools, valid_tool_names) pair are consistent even under concurrent callers
  (no half-swap, no TOCTOU).
- background_review fork opts out (_skip_mcp_refresh) so its byte-identical
  tools[] cache parity with the parent is preserved.
- CLI /reload-mcp routed through the shared helper (was a 4th divergent copy
  with the same clobber bug + missing disabled_toolsets).
- Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user
  just enabled in config this session is picked up; automatic paths reuse the
  agent's build-time selection.
- mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the
  between-turns refresh, so the startup wait is only a small turn-1 UX bump
  rather than a heavy dead-server latency penalty.
- has_registered_mcp_tools checks registered TOOLS (not connected servers) so a
  zero-tool/prompt-only server doesn't make the per-turn hook fire forever.
- Tests: rewrote the thread-safety test to actually exercise the write path
  (alternating tool sets), added the #5544-gate regression, the memory/context
  preservation regression, and a "callable next turn via valid_tool_names"
  contract; removed a dead monkeypatch line.
2026-06-19 11:57:43 -07:00
alt-glitch
93d6e73028 fix(mcp): expose late-connecting MCP tools to the agent (TUI/CLI/gateway)
MCP servers that connect after the agent's one-time tool snapshot were
invisible for the whole session. Two root causes, fixed together:

1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers
   commonly take 2-6s on a cold connect, so they missed the window and
   their tools never entered the agent's snapshot. `thread.join(timeout)`
   already returns the instant discovery completes, so raising the bound
   costs ~0s for the common case (no MCP / fast servers) and only ever
   blocks for a genuinely-pending server, capped so a dead server can't
   freeze startup. The bound is now configurable via
   `mcp_discovery_timeout` (config.yaml, default 5.0s).

2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI
   `reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh
   thread), and the late-refresh detected changes by tool COUNT — missing
   an equal-size add/remove swap. Consolidated into one shared
   `tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by
   tool NAME, mutates the agent under a lock (thread-safe), and respects
   the agent's own enabled/disabled toolsets.

The late-binding refresh keeps its pre-first-turn cache-safety guard:
it never rebuilds the tool list once a turn has started, so the cached
prompt prefix is never invalidated mid-conversation.

Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the
name-based diff, in-place mutation, agent-scoped filtering, thread
safety, and the config-driven discovery bound (incl. instant-return
when nothing is pending). 75 passed across the touched areas.
2026-06-19 11:57:43 -07:00
Teknium
2a5e9d994a
Merge pull request #48275 from NousResearch/feat/cron-scheduler-provider-chronos
feat(cron): pluggable CronScheduler interface + Chronos managed-cron provider (scale-to-zero)
2026-06-19 07:51:59 -07:00
Ben
ddd519ea70 feat(managed-scope): surface managed scope in config show and doctor
- show_config prints an administrator header naming the managed source and
  lists the pinned config/env keys when a scope is active (silent otherwise).
- hermes doctor gains a managed_scope_check under Configuration Files that
  reports the resolved managed dir + pinned key counts, and flags a
  HERMES_MANAGED_DIR redirect (the documented foot-gun).
2026-06-19 07:46:33 -07:00
Ben
4f9e15df97 feat(managed-scope): guard writes to managed config/env keys
- set_config_value hard-rejects a managed config key (D2) and names the
  source, exiting non-zero.
- save_env_value / remove_env_value refuse a managed env key.
- save_config strips managed leaves from a bulk write (mechanical safety net)
  with a warning, so the unmanaged remainder still persists.
New _strip_dotted_keys helper drives the bulk-save pruning. All guards are
distinct from and layered after the existing is_managed() package-manager
write-lock.
2026-06-19 07:46:33 -07:00
Ben
b5ddd6e719 feat(managed-scope): managed config layer wins over user config
_load_config_impl now deep-merges the managed config.yaml on top of the
expanded user config so managed leaves win while sibling keys stay
user-controlled (leaf-level merge, D3). Managed values are expanded against
the process env only, never user-defined ${VAR}, so a user can't shadow a
managed literal. The managed file's (mtime,size) is folded into the load
cache key so editing it invalidates the cache. This inverts the usual
env-over-config precedence for pinned keys by design (see design doc §4.1).
2026-06-19 07:46:33 -07:00
teknium1
a58287afcb
Merge remote-tracking branch 'origin/main' into pr48275-rebase
# Conflicts:
#	cron/scheduler.py
2026-06-19 07:40:29 -07:00
Teknium
d7bff949af
fix(cli): default cli_refresh_interval to 1.0 to keep status bar alive (#49087)
PR #49056 set the default to 0, which reverts the #45592 idle-clock fix:
without a periodic invalidate, prompt_toolkit stops repainting the bottom
chrome during idle and the status bar goes stale/disappears after a turn.

Restore 1.0 as the default for everyone. The config knob stays — users on
emulators where the per-second redraw fights auto-scroll (#48309) can set
display.cli_refresh_interval: 0 to opt out.
2026-06-19 07:35:06 -07:00
OYLFLMH
c1ffd4c3b4 fix(cli): make refresh_interval configurable, default to 0 (disabled)
Commit 6724daa2c added refresh_interval=1.0 to keep the idle clock
ticking, but unconditional 1 Hz redraws in non-fullscreen prompt_toolkit
mode cause terminal emulators (Xshell, iTerm2, Windows Terminal) to
auto-scroll to the bottom on every tick — breaking scroll-up to read
history.

Drive it from display.cli_refresh_interval (0 = disabled, the default)
so users who want the ticking clock can opt in without affecting everyone.

Fixes: #48309
Related: 6724daa2c, 8972a151a
2026-06-19 07:06:34 -07:00
Shannon Sands
d9190491a6 Add Slack setup hints and field validation 2026-06-19 12:16:23 +05:30
Shannon Sands
f741e70791 Add Slack allowed users setup field 2026-06-19 12:16:23 +05:30
Ben
637aff46e7 Merge remote-tracking branch 'origin/main' into hermes/hermes-6fe26723 2026-06-19 15:17:13 +10:00
Victor Kyriazakos
3ead2bdd0d feat(prompt): configurable per-platform system-prompt hint overrides
Add platform_hints config so an admin can append to or replace Hermes'
built-in platform hint for a single messaging platform (WhatsApp, Slack,
Telegram, ...) without affecting other platforms. Enables enterprise
managed profiles to steer platform-aware skills (e.g. invoke a custom
table-formatting skill on WhatsApp where Markdown tables don't render)
while leaving Telegram/Slack/CLI behavior unchanged.

- hermes_cli/config.py: document platform_hints in DEFAULT_CONFIG
- agent/agent_init.py: load platform_hints -> agent._platform_hint_overrides
- agent/system_prompt.py: _resolve_platform_hint() applies append/replace
  (replace wins; bare string = append shorthand); defensive on bad config
- tests: 16 cases covering append/replace/shorthand/isolation/malformed

Override only affects the platform-hint segment of the system prompt;
SOUL/context/memory tiers and general instructions are unchanged.
2026-06-18 14:28:01 -07:00