Authorization to message the agent is the gate, not the file extension.
Previously the inbound-attachment allowlist (SUPPORTED_DOCUMENT_TYPES) was
opt-OUT on Discord (allow_any_attachment defaulted false) and had no bypass
at all on Telegram/Slack — so an .html (or any non-allowlisted type) was
dropped or hard-rejected before the agent saw it.
Now every authorized upload is cached and surfaced to the agent regardless
of type:
- base.cache_media_bytes(): unknown types cache as octet-stream (or the
caller-supplied MIME) instead of returning None — fixes the chokepoint
that Teams/Telegram-media route through.
- discord/telegram/slack adapters: removed the allowlist reject/skip; any
non-media attachment is typed DOCUMENT and cached. Known types keep their
precise MIME.
- Text inlining now gates on a shared _TEXT_INJECT_EXTENSIONS set (text +
code + config + markup) instead of a blind UTF-8 decode, so binary formats
(PDF/zip/docx) with ASCII headers are never inlined.
- gateway/run.py emits the path-pointing context note for every DOCUMENT,
including non text/application MIME types.
- discord.allow_any_attachment is now a documented no-op kept for config
back-compat.
Validation: 357 gateway tests pass; E2E confirms .html/.bin/custom types
cache, known types stay precise, PDFs are not inlined.
* feat(cli): /reasoning full to show complete thinking, not 10-line clamp
The post-response Reasoning recap box hard-clamped long thinking to the
first 10 lines, so there was no way to see the full reasoning trace after
a turn (live streaming already shows it in full). Add display.reasoning_full
(default off) plus /reasoning full|clamp to toggle it at runtime; the clamp
truncation note now points at the command. Addresses repeated user requests
to show all thinking tokens.
* test(gateway): de-snapshot /reasoning help assertion
The test froze the exact args-hint literal '/reasoning [level|show|hide]',
which the new full/clamp args change to '[level|show|hide|full|clamp]'.
Convert to an invariant: assert /reasoning is in help and carries its core
args, not the exact hint string.
* feat(tui): /reasoning full|clamp parity in tui_gateway
The classic-CLI reasoning_full toggle had no TUI equivalent — typing
/reasoning full in the TUI fell through to parse_reasoning_effort and
errored. The TUI renders thinking as an expand/collapse section (no fixed
10-line recap), so map full -> sections.thinking=expanded (raw, uncapped
via thinkingPreview mode='full') and clamp -> collapsed, persisting
display.reasoning_full for cross-surface config consistency.
* feat(providers): remove google-gemini-cli + google-antigravity OAuth providers
Google now actively bans accounts for third-party tools that piggyback on
Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention
sits at a backend layer the ban can extend to the entire Google account
(Gmail/Drive), with a second violation being permanent.
Ref: https://github.com/google-gemini/gemini-cli/discussions/20632
Removes both OAuth inference providers entirely (modules, provider profiles,
auth/runtime/config/models wiring, the /gquota Code Assist quota command,
the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans).
The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against
generativelanguage.googleapis.com) is unaffected and stays fully supported.
* fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed
The antigravity-cli optional skill orchestrates the external `agy` binary as
a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference
through the banned google-antigravity OAuth provider, so it carries none of
the account-ban risk that motivated removing that provider. Restore the skill,
its docs page, the sidebar entry, and the optional-skills catalog row. The
google-antigravity / google-gemini-cli inference providers stay fully removed.
A daemon that ignores or stalls in its SIGTERM handler currently survives the
process-registry reap and leaks until reboot (observed as agent-browser
daemons accumulating to EMFILE on long-running gateways). _terminate_host_pid
now snapshots the tree, SIGTERMs it, waits a bounded grace window
(terminal.daemon_term_grace_seconds, default 2.0s, 0 disables), then SIGKILLs
any survivor. The recycled-PID identity guard still gates the whole path, so
escalation never reaches a stranger; Windows is unchanged (taskkill /F is
already a hard kill).
Config lives in config.yaml (terminal.daemon_term_grace_seconds), NOT an env
var, per the .env-secrets-only policy.
Implements the SIGKILL-escalation idea from @tkwong's #15008, reworked onto the
current _terminate_host_pid tree-kill path (the original predated it) and
config-gated instead of env-var-gated.
Co-authored-by: Benjamin Wong <tkwong@inspiresynergy.com>
A bare custom provider configured via `model.api_base` (the intuitive name
OpenAI-SDK / LiteLLM users reach for) was silently ignored: `hermes config set`
accepts any dotted key, so `model.api_base` got written and confirmed, but the
runtime resolver reads only `model.base_url`. Requests fell back to OpenRouter
with an empty key -> 401, zero hits to the custom endpoint (issue #8919).
Now api_base is migrated to base_url at load time (fixes existing broken
configs) and at set time (with a notice), never overriding an explicit
base_url. Closes#8919.
Rich messages are not ready for primetime: current Telegram clients can
render Bot API 10.1 rich messages as blank/unsupported bubbles and make
them hard to copy as plain text, which is worse than the legacy
MarkdownV2 path for command snippets and mobile handoffs. Default the
rich_messages toggle to False so replies stay on the copyable legacy
path; users opt in per bot via platforms.telegram.extra.rich_messages:
true. Updates adapter, gateway config default, example config, English +
zh-Hans docs, and the default/opt-in tests.
Inbound image/audio/video payloads were buffered fully into process memory
before being written to the cache, with no size limit. A large upload
(Discord Nitro allows 500 MB) or a remote media URL in an inbound message
pointing at a huge file could spike RAM and OOM-kill the gateway.
Enforce a configurable cap in the shared cache helpers (gateway/platforms/
base.py) so the protection holds across every platform adapter, not one:
- cache_image/audio/video_from_bytes reject oversized payloads before writing
(video was the gap in the original report — now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized
Content-Length header and re-checking the running total per chunk so an
absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized
attachment is rejected before any bytes are pulled into memory.
Configurable via gateway.max_inbound_media_bytes in config.yaml (default
128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml.
Salvaged and extended from @sgaofen's PR #13341 (the original report and the
shared-helper approach). Reapplied onto current main (Discord adapter has
since moved to plugins/platforms/discord/), the configurable knob moved from
an env var to config.yaml, and the video cache helper added.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
hermes config show printed the model dict raw via print(), bypassing the
logging redactor; a custom-provider api_key (e.g. Cloudflare cfut_...) was
shown in plaintext even with security.redact_secrets=true. Opaque tokens
don't match any vendor-prefix regex, so structural key-name masking is
required.
- Add redact_config_value(): recursively masks credential-shaped keys
(api_key/token/secret/... exact-match) via mask_secret.
- Wrap the show_config model dump in it.
- Mask the set_config_value echo when the leaf key is credential-shaped
(config set model.api_key routes to config.yaml, lowercase misses the
.env allowlist).
The gateway pre-compression hygiene valve force-compressed any session
crossing 400 messages regardless of token usage. On large-context (1M+)
models doing many short, message-dense turns, a healthy session at ~16%
token usage could hit 400 messages and get force-compressed — and the
compression summary's stale Active Task could then bleed into the next
turn.
The valve's actual purpose is to break a death spiral: when API calls
keep disconnecting on an oversized session, no token-usage data arrives,
the token threshold never fires, and the transcript grows unbounded.
It's a count-based floor for that pathological case only. 400 was tuned
for ~200K-context models and is far too low for modern large-context
sessions. Raise the default to 5000 — still well clear of any death
spiral, but no longer firing on legitimate long conversations.
The value remains fully configurable via compression.hygiene_hard_message_limit.
The OpenAI-compatible API server only enforced a hardcoded cap of 10
concurrent runs on /v1/runs, leaving /v1/chat/completions and
/v1/responses unbounded — a request flood could exhaust CPU, memory,
and upstream LLM quota (#7483).
- Add gateway.api_server.max_concurrent_runs (config.yaml, default 10,
0 disables). No env var.
- Shared concurrency gate across all three agent-serving endpoints,
counting both the chat/responses in-flight counter and the /v1/runs
stream set. Returns OpenAI-style 429 + Retry-After when at the cap.
- Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute.
Closes#7483.
Review nit (yoniebans): the config.py comment still said compaction is
'lossy: the pre-compaction transcript is discarded, matching Claude Code /
Codex' — leftover from the original destructive design. The shipped behavior
is soft-archive: lossy for the LIVE context (what the model reloads), but the
pre-compaction turns are kept on disk (active=0, compacted=1), searchable via
session_search and recoverable. Comment now says so. Comment-only; no behavior
change.
Context compression today rewrites the message list AND rotates the
session id — it ends the session, forks a parent_session_id child, and
renumbers the title (name -> name #2). That moving identity key is the
root cause of a whole bug cluster: /goal lost (#33618), pending response
lost at the split (#14238), orphan sessions (#33907), TUI sid desync
(#36777), FTS search gaps + duplicate sidebar entries (#45117), null
continuation cwd (#42228), and title-rename dead-ends (#48989). It also
forced a large defensive apparatus (compression lock, contextvar/env/
logging triple-sync, orphan finalization, gateway SessionEntry
re-propagation, tip projection) whose only job is surviving a
mid-conversation id change.
Add a compression.in_place config flag (default False during rollout).
When True, compaction rewrites the transcript and rebuilds the system
prompt but keeps the SAME session_id: no end_session, no child row, no
title renumber, no contextvar/logging re-sync, no memory/context-engine
session-switch. The conversation keeps one durable id for life, like
Claude Code / Codex. Compaction is lossy by design — the pre-compaction
transcript is summarized away, not archived.
The rotation path is unchanged when the flag is off (moved verbatim into
an else branch). Staged rollout: this PR ships the option behind a
default-off flag for live validation; a follow-up flips the default and
deletes the now-redundant rotation machinery, superseding the 14 open
band-aid PRs in this area.
- hermes_cli/config.py: add compression.in_place (default False), documented
- agent/agent_init.py: resolve the flag -> agent.compression_in_place
- agent/conversation_compression.py: branch compress_context() on the flag
- tests/run_agent/test_in_place_compaction.py: in-place invariants +
rotation regression guard + config default
The pre-flush of current-turn messages (#47202) runs in BOTH modes, so no
boundary data loss. Prompt-cache invariant preserved: the system-prompt
rebuild is the same single sanctioned invalidation that already happens
during compaction — no NEW invalidation. Message alternation preserved.
Consolidated findings from three independent reviewers (Codex, Claude Code, a
Hermes subagent w/ the hermes-agent-dev skill):
- BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently
dropping post-build-injected memory-provider (mem0/honcho/…) and context-
engine (lcm_*) tools on every refresh. Now additive-preserving: re-applies
the same injectors agent_init uses, staged on locals and published atomically.
- Re-injection now honors the #5544 enabled_toolsets gate for context-engine
tools, so a restricted-toolset platform can't get lcm_* leaked back in.
- Atomic read-diff-publish under one lock: the returned `added` set and the
(tools, valid_tool_names) pair are consistent even under concurrent callers
(no half-swap, no TOCTOU).
- background_review fork opts out (_skip_mcp_refresh) so its byte-identical
tools[] cache parity with the parent is preserved.
- CLI /reload-mcp routed through the shared helper (was a 4th divergent copy
with the same clobber bug + missing disabled_toolsets).
- Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user
just enabled in config this session is picked up; automatic paths reuse the
agent's build-time selection.
- mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the
between-turns refresh, so the startup wait is only a small turn-1 UX bump
rather than a heavy dead-server latency penalty.
- has_registered_mcp_tools checks registered TOOLS (not connected servers) so a
zero-tool/prompt-only server doesn't make the per-turn hook fire forever.
- Tests: rewrote the thread-safety test to actually exercise the write path
(alternating tool sets), added the #5544-gate regression, the memory/context
preservation regression, and a "callable next turn via valid_tool_names"
contract; removed a dead monkeypatch line.
MCP servers that connect after the agent's one-time tool snapshot were
invisible for the whole session. Two root causes, fixed together:
1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers
commonly take 2-6s on a cold connect, so they missed the window and
their tools never entered the agent's snapshot. `thread.join(timeout)`
already returns the instant discovery completes, so raising the bound
costs ~0s for the common case (no MCP / fast servers) and only ever
blocks for a genuinely-pending server, capped so a dead server can't
freeze startup. The bound is now configurable via
`mcp_discovery_timeout` (config.yaml, default 5.0s).
2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI
`reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh
thread), and the late-refresh detected changes by tool COUNT — missing
an equal-size add/remove swap. Consolidated into one shared
`tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by
tool NAME, mutates the agent under a lock (thread-safe), and respects
the agent's own enabled/disabled toolsets.
The late-binding refresh keeps its pre-first-turn cache-safety guard:
it never rebuilds the tool list once a turn has started, so the cached
prompt prefix is never invalidated mid-conversation.
Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the
name-based diff, in-place mutation, agent-scoped filtering, thread
safety, and the config-driven discovery bound (incl. instant-return
when nothing is pending). 75 passed across the touched areas.
- show_config prints an administrator header naming the managed source and
lists the pinned config/env keys when a scope is active (silent otherwise).
- hermes doctor gains a managed_scope_check under Configuration Files that
reports the resolved managed dir + pinned key counts, and flags a
HERMES_MANAGED_DIR redirect (the documented foot-gun).
- set_config_value hard-rejects a managed config key (D2) and names the
source, exiting non-zero.
- save_env_value / remove_env_value refuse a managed env key.
- save_config strips managed leaves from a bulk write (mechanical safety net)
with a warning, so the unmanaged remainder still persists.
New _strip_dotted_keys helper drives the bulk-save pruning. All guards are
distinct from and layered after the existing is_managed() package-manager
write-lock.
_load_config_impl now deep-merges the managed config.yaml on top of the
expanded user config so managed leaves win while sibling keys stay
user-controlled (leaf-level merge, D3). Managed values are expanded against
the process env only, never user-defined ${VAR}, so a user can't shadow a
managed literal. The managed file's (mtime,size) is folded into the load
cache key so editing it invalidates the cache. This inverts the usual
env-over-config precedence for pinned keys by design (see design doc §4.1).
PR #49056 set the default to 0, which reverts the #45592 idle-clock fix:
without a periodic invalidate, prompt_toolkit stops repainting the bottom
chrome during idle and the status bar goes stale/disappears after a turn.
Restore 1.0 as the default for everyone. The config knob stays — users on
emulators where the per-second redraw fights auto-scroll (#48309) can set
display.cli_refresh_interval: 0 to opt out.
Commit 6724daa2c added refresh_interval=1.0 to keep the idle clock
ticking, but unconditional 1 Hz redraws in non-fullscreen prompt_toolkit
mode cause terminal emulators (Xshell, iTerm2, Windows Terminal) to
auto-scroll to the bottom on every tick — breaking scroll-up to read
history.
Drive it from display.cli_refresh_interval (0 = disabled, the default)
so users who want the ticking clock can opt in without affecting everyone.
Fixes: #48309
Related: 6724daa2c, 8972a151a
Add platform_hints config so an admin can append to or replace Hermes'
built-in platform hint for a single messaging platform (WhatsApp, Slack,
Telegram, ...) without affecting other platforms. Enables enterprise
managed profiles to steer platform-aware skills (e.g. invoke a custom
table-formatting skill on WhatsApp where Markdown tables don't render)
while leaving Telegram/Slack/CLI behavior unchanged.
- hermes_cli/config.py: document platform_hints in DEFAULT_CONFIG
- agent/agent_init.py: load platform_hints -> agent._platform_hint_overrides
- agent/system_prompt.py: _resolve_platform_hint() applies append/replace
(replace wins; bare string = append shorthand); defensive on bad config
- tests: 16 cases covering append/replace/shorthand/isolation/malformed
Override only affects the platform-hint segment of the system prompt;
SOUL/context/memory tiers and general instructions are unchanged.
When a worker calls kanban_create from inside a session that has a
persistent delivery channel, the originating session is now subscribed
to the new task's completion/block events automatically. The agent
that dispatched the task gets notified instead of having to poll.
- Gateway sessions (telegram/discord/slack): HERMES_SESSION_PLATFORM +
HERMES_SESSION_CHAT_ID ContextVars, set by the messaging gateway.
- TUI / desktop sessions: HERMES_SESSION_KEY in the subprocess env.
The TUI notification poller keys on platform='tui' + chat_id=<key>.
- CLI / cron / test: no persistent channel, no subscription.
Gated by kanban.auto_subscribe_on_create in config.yaml (default True).
Disable to mirror pre-feature behaviour — users who want explicit
kanban_notify-subscribe calls per task can set it to false. This
config gate addresses the design concern that got PR #19718 reverted
upstream (unconditional implicit auto-subscribe on tool-driven
kanban_create was too aggressive for orchestrator users).
HERMES_SESSION_ID is intentionally not a fallback channel — it is
set by ACP/agent subprocess telemetry for every invocation, not just
TUI, so treating it as a notification target would auto-subscribe
every CLI session and re-introduce the over-eager behaviour.
The kanban_create response now includes a 'subscribed' bool so
orchestrators can react if subscription failed (e.g. by falling
back to explicit kanban_notify-subscribe or to polling).
Includes 6 tests covering the gateway / TUI / CLI / partial-context /
gated / add_notify_sub-failure paths. All 90 tests in
test_kanban_tools.py pass; 509 broader kanban tests pass.
* Port from cline/cline#11514: encourage parallel tool calls
Add a universal system-prompt guidance block telling the model to batch
independent tool calls (reads, searches, web fetches, read-only commands)
into a single assistant turn instead of one call per turn. The runtime
already executes independent batches concurrently (read-only tools always;
non-overlapping path-scoped file ops); the open-source system prompt had
nothing steering the model to PRODUCE the batch. Fewer round-trips means
less resent context, which compounds over a long conversation.
- prompt_builder.py: new PARALLEL_TOOL_CALL_GUIDANCE block (short, static,
cache-amortised) modeled on TASK_COMPLETION_GUIDANCE.
- system_prompt.py: inject right after the task-completion block, gated by
agent.valid_tool_names + the new toggle.
- agent_init.py: read agent.parallel_tool_call_guidance (default True).
- config.py: add the default under the agent section.
- test_prompt_builder.py: behavior-contract tests (batching steer, dependent
carve-out, length bound) — invariants, not wording snapshots.
Adapted from Cline's TypeScript tool-surface guidance to hermes-agent's
Python prompt-assembly architecture and config-over-env conventions.
* fix(desktop): never persist or restore a named custom provider as bare "custom"
Custom providers vanish from the Desktop/TUI model picker with
"No LLM provider configured" — repeatedly fixed (#44062, #44109, #45578)
and repeatedly regressed (#44022, #47714) because every fix only recovered
the entry identity from a persisted base_url. When a session is
persisted/restored with the resolved provider "custom" and NO base_url, bare
"custom" leaked through verbatim; resolve_runtime_provider("custom") routes to
the OpenRouter default URL with no api_key, so the next turn/resume dies.
Bare "custom" is the resolved billing class shared by every named providers:/
custom_providers: entry — it is not a routable identity. Centralize the
"never let bare custom escape" invariant in one helper,
runtime_provider.canonical_custom_identity(), and apply it at all four leak
sites in tui_gateway/server.py:
- _ensure_session_db_row — the ORIGIN: first DB write seeds the bad row
- _runtime_model_config — live persist
- _stored_session_runtime_overrides — resume restore (heals old rows; drops
unrecoverable bare custom so resume falls back to config default)
- _make_agent — rebuild / per-turn
The helper recovers custom:<name> from the endpoint URL when present, else
from config.model.provider (the durable identity left when no base_url
survived). Regression tests in test_custom_provider_session_persistence.py
lock the no-base_url vector at every site so it cannot regress again.
Defense-in-depth fix for the silent wipe of ~/.hermes/ documented in
#48200. A `hermes update --yes` run silently destroyed a user's
.env, MEMORY.md, kanban.db, custom skills, and scripts. Two changes:
1. `_rmtree_writable` in tools/skills_sync.py now refuses to rmtree
anything outside SKILLS_DIR (the HERMES_HOME/skills/ root).
All five call sites pass paths under SKILLS_DIR, so the guard is
a no-op for current code and a loud, recoverable failure for
any future regression (bad path join, malicious bundled
manifest, stale path in scope after an exception).
2. The default `updates.pre_update_backup` flips from false to
true in hermes_cli/config.py. A few minutes of zip per update
is negligible compared to silent total data loss. Still
overridable; --no-backup still works for one-off opt-out.
Five new tests in TestRmtreeWritableScopeGuard (root path,
hermes home, sibling dir, skills root itself, subdir) plus a
flipped `test_default_enabled_creates_backup` in test_backup.py.
178/178 tests pass in the two affected files. Public method
signatures unchanged, no test-stub blast radius.
Closes#48200
Phase 4F (F.1 + F.2 + F.3, agent side). F.4 is the operator-run live smoke
(needs a NAS deployment); recorded in the PR, not code.
F.1 — on_jobs_changed wiring:
- cron/scheduler.py: _notify_provider_jobs_changed() — resolve the active
provider, call on_jobs_changed(), swallow errors. Lives in scheduler.py (not
jobs.py) so the store stays free of provider imports (no import cycle).
- Wired at the consumer surfaces AFTER a successful mutation: the cronjob model
tool (tools/cronjob_tools.py, create/update/remove/pause/resume) — which the
`hermes cron` CLI also routes through — and the REST handlers
(gateway/platforms/api_server.py, same five). Built-in's no-op default = zero
behavior change on the default path. Sleeping-agent direct jobs.json writes
(no tool/CLI/REST) are covered by reconcile-on-wake in start().
F.2 — config: cron.chronos.{portal_url,callback_url,expected_audience,
nas_jwks_url}. All non-secret; the agent holds no scheduler creds and the
outbound provision call reuses the existing Nous token (no token key). Additive
deep-merge key, no version literal.
F.3 — docs:
- docs/chronos-managed-cron-contract.md: authoritative agent↔NAS wire contract
(the three agent-cron endpoints + inbound /api/cron/fire + the 3-hop trust
model + at-most-once/re-arm semantics). This is what the NAS-side agent builds
against.
- cron-internals.md: "Managed cron (Chronos) for scale-to-zero" section.
- cli-commands.md: cron.provider accepts chronos + the cron.chronos.* keys.
- User docs name no scheduler vendor (QStash is a NAS-internal detail).
INVARIANT re-verified: zero qstash/upstash hits across plugins/cron, gateway,
hermes_cli, tools, website/docs (the one remaining repo hit is an unrelated
Context7 MCP comment in tools/mcp_tool.py).
Tests: test_jobs_changed_notify (5) — notify calls provider hook, swallows
errors, built-in harmless, tool create/remove notify. Full cron + chronos +
webhook + config + api_server_jobs suites green (504 in the cron+chronos+webhook
run).
The install method (docker/git/pip/...) describes the *running binary*, but
detect_install_method() read it from $HERMES_HOME/.install_method — a shared
DATA directory. The Docker docs deliberately bind-mount $HERMES_HOME
(~/.hermes:/opt/data) so config/sessions/memory persist and can be shared with
a host-side Desktop/CLI install.
When a containerized gateway and a host install share one $HERMES_HOME, the
home-scoped stamp is a single slot describing two installs: the published image
stamps 'docker' on every boot, the host install then reads 'docker' and the
in-app updater refuses to run 'hermes update' ("doesn't apply inside the Docker
container"). Reinstalling the Desktop app from the DMG doesn't help because the
contaminated stamp is re-read every time.
Fix (option 1 — code-scoped stamp):
- detect_install_method() reads <install tree>/.install_method first (next to
the running code, immune to the shared data dir). It falls back to the legacy
$HERMES_HOME stamp for back-compat, but IGNORES a 'docker' home stamp when
not actually containerized — so already-poisoned shared homes self-heal.
- stamp_install_method() writes the code-scoped stamp.
- install.sh stamps $INSTALL_DIR instead of $HERMES_HOME.
- Dockerfile bakes 'docker' into /opt/hermes/.install_method at build time
(inside the immutable block); stage2-hook.sh no longer writes the home stamp
and proactively removes a stale 'docker' one to heal existing shared homes.
Genuine containers still resolve to 'docker' (baked stamp, or legacy home stamp
honored when containerized). Unstamped installs in generic containers still fall
through to git/pip (preserves the #34397 fix).
Phase 2 of the pluggable cron-scheduler refactor. Still no call-site changes;
this wires up provider SELECTION with a hard safety net.
Task 2.1: cron.provider config key (hermes_cli/config.py), empty = built-in.
Additive key — deep-merge picks it up into existing configs with no version
bump (verified: load_config() yields the key on a pre-existing config.yaml).
Task 2.2: plugins/cron/__init__.py — discovery machinery cloned near-verbatim
from plugins/memory/__init__.py, retargeted at CronScheduler /
register_cron_scheduler. Bundled (plugins/cron/<name>/) + user
(/plugins/<name>/) dirs, bundled wins collisions. The built-in is
NOT discovered here — it's core, so the fallback can't be removed.
Task 2.3: resolve_cron_scheduler() in cron/scheduler_provider.py — reads
cron.provider and ALWAYS degrades to built-in (missing / unavailable / load
error / typo all fall back with a warning). cron can never be left without a
trigger.
Deviation from plan: the plan's resolver snippet used cfg_get("cron.provider")
(dotted-string form). The real cfg_get signature is cfg_get(cfg, *keys,
default=) — corrected to cfg_get(load_config(), "cron", "provider", default=""),
matching plugins/memory/__init__.py:349. Tests monkeypatch load_config (not
cfg_get) so the real traversal runs.
Tests: default key empty, discovery returns list, unknown load returns None,
and the four resolver paths (empty→builtin, no-section→builtin,
unknown→builtin, unavailable→builtin, available→used). Full tests/cron/: 453
passed; config suite green (additive key, no migration break).
Context files (AGENTS.md, CLAUDE.md, .hermes.md, .cursorrules, SOUL.md) were
hard-capped at a flat 20K chars before head/tail truncation. Among the agent
harnesses we track, only Codex caps project docs at all (32 KiB); Claude Code,
OpenCode, and Cline load them whole. The flat 20K predates large context
windows and silently truncates real-world AGENTS.md files.
B — dynamic cap: when context_file_max_chars is unset (now the shipped
default), the cap scales with the model's context window
(ctx_tokens * 4 * 0.06, floor 20K, ceiling 500K). Small-context models stay at
the historical 20K; a 200K model gets 48K; large models stop truncating real
docs. An explicit context_file_max_chars still wins. Context length is resolved
once per conversation (stable -> prompt cache untouched).
C — when truncation does happen, the marker now names the concrete file path
and tells the agent to read_file it for the full content.
Validation: 154 targeted tests + full agent/ + hermes_cli/ + test_config
(0 failures); E2E against a real 60K AGENTS.md confirms small windows truncate
with the path-bearing marker, large windows load whole, and the system prompt
is byte-stable across rebuilds.
The curator now defaults to prune-only: the deterministic inactivity pass
(mark stale / archive long-unused skills) still runs whenever the curator is
enabled, but the opinionated LLM umbrella-building consolidation fork is OFF
by default.
- agent/curator.py: add DEFAULT_CONSOLIDATE=False + get_consolidate(); gate
the forked aux-model review in run_curator_review behind it (new consolidate
param, None=read config). When off, the LLM pass is skipped entirely (no
aux-model cost); the run is still recorded and reported.
- config.py: add curator.consolidate (default false); v29->v30 migration seeds
the key for existing installs without clobbering a user-set value.
- hermes_cli/curator.py: 'hermes curator run --consolidate' override; status
shows consolidate state; prune-only notice on run.
- docs + tests.
Follow-up to salvaged PR #41633: the timestamp prefix injection was
unconditional. Gate the in-context render behind
gateway.message_timestamps.enabled (default false) at both the live-message
and history-replay sites; timestamp metadata is still captured + persisted
regardless so the toggle can be flipped on later. Add DEFAULT_CONFIG entry,
docs, and gate tests.
PROBLEM: Automatic context files such as SOUL.md and AGENTS.md were capped by a hardcoded CONTEXT_FILE_MAX_CHARS value. Amy's local fork had raised that constant from 20K to 25K so a larger SOUL.md would not be silently truncated, but the hardcoded 25K value changed upstream default behavior and made the patch less generally useful.
SOLUTION: Restore the upstream-compatible 20K default, add a context_file_max_chars config setting for users who intentionally keep larger identity/project-context files, keep chat-visible truncation warnings, and document the new setting. Tests cover the default, config override, explicit max_chars precedence, and the warning text.
Streamed Telegram replies that finalize through editMessageText were
converted to MarkdownV2, which has no table syntax and rewrites pipe
tables into bullet lists — users saw a table while streaming that
collapsed to a list at the last moment.
Finalize now edits the existing preview IN PLACE via Bot API 10.1's
editMessageText rich_message parameter when the content has constructs
the legacy path degrades (tables, task lists, <details>, block math).
No fresh send + delete, so no duplicate-preview flicker — the reason
#46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming
stays False; the in-place edit replaces it.
- _needs_rich_rendering(): rich reserved for table/task-list/details/math
(adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2.
- _try_edit_rich(): editMessageText + rich_message via do_api_request,
mirroring _try_send_rich's fallback/latch/transient contract.
- edit_message finalize tries rich in place before the 4,096 overflow
pre-flight (rich cap is 32,768), falling back to legacy on rejection.
- rich_messages default flipped back to True (DEFAULT_CONFIG + adapter).
- docs (en + zh-Hans) + cli-config example updated to default-on.
Closes the root cause behind #45911 / #46009.
* feat(delegation): async background subagents via delegate_task(background=true)
delegate_task(background=true) dispatches a subagent that runs in the
background and returns a handle immediately, so the user and model keep
working while it runs. The full result — plus the original task source —
re-enters the conversation as a new turn when the subagent finishes,
riding the same completion-queue rail as terminal background processes.
- tools/async_delegation.py: daemon-executor registry, capacity cap,
rich self-contained completion event pushed onto the shared
process_registry.completion_queue (type='async_delegation').
- delegate_tool.py: background param + single-task dispatch branch;
batch async rejected (v1).
- process_registry.py: format_process_notification renders the rich
task-source block (goal/context/toolsets/model/status/result).
- gateway/run.py: dedicated _async_delegation_watcher drains + injects
results into the originating session (idle + post-turn), session_key
routing enrichment, shutdown interrupt of dangling delegations.
- config: delegation.max_async_children (default 3).
Reuses the existing idle-drain wiring rather than mutating a running
agent loop, preserving message-role alternation and prompt-cache
invariants. 13 targeted tests; CLI + gateway paths E2E-verified.
* test(delegation): make async non-blocking tests environment-independent
CI 'test (5)' flaked on a cold, 8-worker runner: the first
delegate_task(background=true) call measured 2.27s of one-time setup
(config load + child-agent construction + imports), tripping the
elapsed < 1.0 wall-clock assertion. That assertion was testing setup
overhead, not blocking.
Replace the wall-clock thresholds with the real invariant: dispatch
returns while the child is still gated (active_count == 1, completion
queue empty), which a synchronous impl could not do. Keep only a loose
4s sanity backstop well under the runner's 5s gate.
* fix(delegation): harden async background delegation
Follow-up review fixes:
- Detach background child from parent._active_children at dispatch —
otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache
evicts (release_clients), and session close (/new) kill/close the
detached subagent mid-run, defeating the point of background mode.
Lifecycle is owned by the async registry's interrupt_fn.
- Make the capacity check atomic with the record insert (TOCTOU: two
concurrent dispatches could both pass active_count() and exceed the cap).
- TUI dedup: key async_delegation events by delegation_id — the
fallthrough keyed them all as ("", type), suppressing every completion
after the first in the desktop/TUI status feed.
- CLI /stop now interrupts running background delegations and /agents
lists them (they live outside the process registry and were invisible).
- Drop stray unbalanced ']' line from the re-injection block and the
unused _ASYNC_DEFAULT import.
Tests: detach-at-dispatch + concurrent-capacity race added (15 total in
test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch
+ 7 TUI dedup tests pass.
* fix(delegation): harden async background completion drains
Subagents doing legitimate heavy work (deep code reviews, research
fan-outs, slow reasoning models) were routinely killed at the blanket
600s child_timeout_seconds cap while making steady progress (e.g. 36
API calls completed when the axe fell). Failures should come from what
the child is actually doing — API errors, tool errors, iteration
budget — not a delegation-level stopwatch.
- DEFAULT_CHILD_TIMEOUT: 600 -> None; Future.result(timeout=None)
blocks until the child finishes
- config default delegation.child_timeout_seconds: 600 -> 0
(0/negative = disabled; positive opts back in, floor 30s unchanged)
- stuck-child protection unchanged: the heartbeat staleness monitor
still stops refreshing parent activity so the gateway inactivity
timeout fires on a truly wedged worker; the 0-API-call diagnostic
dump still works when a cap is configured
- docs updated (EN + zh-Hans)