Commit graph

1332 commits

Author SHA1 Message Date
konsisumer
d1d0ef6dbd fix(gateway): persist user message on transient agent failures (#7100)
The #1630 fix introduced a blanket ``agent_failed_early`` transcript skip
to prevent context-overflow sessions from looping.  That guard also
triggers for unrelated transient failures (429 rate limits, read
timeouts, connection resets, provider 5xx) which have nothing to do with
session size — and it silently drops the user's message, so the agent
has no memory of the last turn on retry.

Split the failure classification in ``GatewayRunner._run_agent``:

* Context-overflow (``compression_exhausted`` flag, explicit
  context-length phrases, or generic 400 with a long history) → keep
  the existing skip, preserving the #1630/#9893 fix.
* Anything else that failed → persist just the user message so the
  conversation survives a retry.

Use specific multi-word phrases (``context length``, ``token limit``,
``prompt is too long``, etc.) to match ``run_agent.py``'s own
classifier; bare ``exceed`` false-positively flagged "rate limit
exceeded" as context overflow.

Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py``
and the existing #1630 suite still passes.
2026-04-30 04:32:33 -07:00
Rob Moen
0dd373ec43 fix(context): honor model.context_length for Ollama num_ctx and all display paths
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).

Root cause: two separate context values existed independently:
  - context_compressor.context_length = config value (e.g. 65536) ✓
  - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config

Changes:

1. Cap Ollama num_ctx to config context_length (run_agent.py)
   When model.context_length is explicitly set and no explicit
   ollama_num_ctx override exists, cap the auto-detected GGUF value
   to the user's context_length. This is the core fix — it prevents
   Ollama from allocating more VRAM than the user budgeted.

2. Pass config_context_length through all secondary call sites
   Several paths called get_model_context_length() without the config
   override, falling through to the 256K default fallback:
   - cli.py: @-reference expansion and /model switch display
   - gateway/run.py: @-reference expansion and /model switch display
   - tui_gateway/server.py: @-reference expansion
   - hermes_cli/model_switch.py: resolve_display_context_length()

3. Normalize root-level context_length in config (hermes_cli/config.py)
   _normalize_root_model_keys() now migrates root-level context_length
   into the model section, matching existing behavior for provider and
   base_url. Users who wrote `context_length: 65536` at the YAML root
   instead of under `model:` had it silently ignored.

4. Fix misleading comments (agent/model_metadata.py)
   DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
   as two comments stated.

Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
2026-04-30 04:31:23 -07:00
Bartok9
fbb3775770 fix(gateway): enforce auth check in busy-session path to prevent unauthorized injection (#17775)
The busy-session handler (_handle_active_session_busy_message) bypassed the
authorization gate that the cold path enforces via _is_user_authorized(). In
shared-thread contexts (Slack threads, Telegram forum topics, Discord threads)
where thread_sessions_per_user=False (the default), all participants share one
session_key. An unauthorized user posting in the same thread as an authorized
user would hit the active-session branch, skip the auth check, and have their
text merged into _pending_messages or injected via agent.interrupt().

This commit adds the same _is_user_authorized() check at the top of the busy
handler, before any message queuing, steering, or interrupt logic. Unauthorized
messages are silently dropped (return True) with a warning log — matching the
cold-path behavior.

Affected platforms: Slack, Telegram, Discord, any adapter with shared-session
thread contexts.

Closes #17775
2026-04-30 04:29:15 -07:00
Teknium
3de8e21683 feat(gateway): native send_multiple_images for Telegram, Discord, Slack, Mattermost, Email
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.

Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
  GIFs peeled off and routed through send_animation (albums don't support
  animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
  over); URL images downloaded into BytesIO so they render inline; forum
  channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
  respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
  chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
  SMTP size governs); remote URLs remain linked in body (parity with
  existing send_image)

All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.

Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.

Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.

Co-authored-by: Maxence Groine <maxence@groine.fr>
2026-04-30 04:28:08 -07:00
Maxence Groine
04ea895ffb feat(gateway/signal): add support for multiple images sending
Adds a new `send_multiple_images` method to the ``BasePlatformAdapter``
that implements the default "One image per message" loop and allows for
platform-specific overriding.

Implements such an override for the Signal adapter, batching images
and trying (best-effort) to work around rate-limits for voluminous
batches using a specific scheduler.

Also implements batching + rate-limit handling in the `send_message`
tool.

New tests added for the Signal adapter, its rate-limit scheduler and the
`send_message` tool
2026-04-30 04:28:08 -07:00
Teknium
411f586c67 refactor(gateway): extract _float_env helper for env-var float casts
Follow-up to the try/except guards added in the previous commit.
Four sibling call sites all read HERMES_AGENT_TIMEOUT /
HERMES_AGENT_TIMEOUT_WARNING / HERMES_AGENT_NOTIFY_INTERVAL via the
same read-env-or-fallback pattern, so factor it into _float_env(name,
default) alongside the existing _auto_continue_freshness_window()
helper.
2026-04-30 03:32:37 -07:00
vominh1919
ca87c822ed fix(gateway): guard yaml.safe_load and float() env var casts against crash
Two defensive fixes in gateway/run.py:

1. yaml.safe_load returning None on empty config files (line 12706):
   GatewayConfig.from_dict(data) crashes with AttributeError when the YAML
   file is empty because safe_load returns None. All 6 other yaml.safe_load
   call sites already use `or {}` — this one was missed.
   Impact: gateway fails to start with empty --config file.

2. float() on env vars without ValueError guard (lines 3951, 11757, 11805,
   11807): HERMES_AGENT_TIMEOUT, HERMES_AGENT_TIMEOUT_WARNING, and
   HERMES_AGENT_NOTIFY_INTERVAL are cast via float() directly from
   os.getenv(). A typo (e.g. "abc") raises ValueError and crashes the
   agent turn or gateway startup.
   Impact: single misconfigured env var crashes the entire gateway.
2026-04-30 03:32:37 -07:00
briandevans
f44f1f9615 fix(gateway): preserve session guard across in-band drain handoff
When the in-band pending-message drain spawns a fresh task and
transfers ownership via _session_tasks[session_key] = drain_task,
the original task still unwinds through the finally block.  The
drain task picks up the same interrupt_event in its own
_process_message_background entry, so an unconditional
_release_session_guard(session_key, guard=interrupt_event) at the
end of the finally matches and deletes _active_sessions[session_key]
while the drain task is still pending its first await.

A concurrent inbound message arriving in that handoff window passes
the Level-1 guard (no entry exists) and spawns a second
_process_message_background for the same session — two agents on
one session_key, duplicate responses, duplicate tool calls.

Fix: only call _release_session_guard when the current task still
owns _session_tasks[session_key].  When ownership has been
transferred to a drain task, leave _active_sessions populated; the
drain task's own lifecycle releases it.  This mirrors the
late-arrival drain path in the same finally block, which already
leaves both entries alone after handing off.

Also reorder stdlib imports in the new regression test file to
match the gateway test convention (stdlib before third-party).

Regression test: capture _active_sessions[sk] identity at every
handler entry across a 2-step in-band drain chain and assert the
guard Event identity stays the same.  Pre-fix, the original task's
finally deletes the entry, the drain task falls through to the
`or asyncio.Event()` branch, and a fresh Event is installed —
identity diverges.  Post-fix, the entry is preserved and the drain
task reuses the original Event.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:27:08 -07:00
briandevans
663ba9a58f fix(gateway): drain pending messages via fresh task, not recursion (#17758)
`_process_message_background` finished a turn, found a queued
follow-up, and drained it via `await
self._process_message_background(pending_event, session_key)`.  Each
chained follow-up added a frame to the call stack instead of starting
fresh.  Under sustained pending-queue activity (e.g. a user sending
follow-ups faster than the agent finishes turns) the C stack would
exhaust at ~2000 nested frames and SIGSEGV the process.

Mirror the late-arrival drain pattern that already exists in the same
function: spawn a new `asyncio.create_task(...)` for the pending event
and return so the current frame can unwind.  The new task takes
ownership via `_session_tasks[session_key]`.

The late-arrival drain in `finally` could now race with the in-band
drain across the `await typing_task` / `await stop_typing` window, so
add a guard: if `_session_tasks[session_key]` is no longer the current
task, an in-band drain already spawned a follow-up task — re-queue the
late-arrival event so that task picks it up after its current event,
instead of spawning a second concurrent task for the same session_key.

Regression test (`test_pending_drain_no_recursion.py`) chains 12
follow-ups and asserts the recorded
`_process_message_background` stack depth stays bounded at handler
entry.  Pre-fix: depths grow linearly `[1,2,3,…,12]`.  Post-fix: all
depths are `1`.

`test_duplicate_reply_suppression::test_stale_response_suppressed_when_interrupted`
called `_process_message_background` directly and implicitly relied on
the old recursive `await` semantic — updated to wait for the spawned
drain task before checking the sent list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:27:08 -07:00
Teknium
aa7bf329bc
feat(gateway): centralize audio routing + FLAC support + Telegram doc fallback (#17833)
Extracted from PR #17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
2026-04-30 01:32:31 -07:00
emozilla
718e4e2e7e fix(plugins): register dynamically-loaded modules in sys.modules before exec
Dashboard plugin API routes (web_server._mount_plugin_api_routes) and
gateway event hooks (gateway.hooks.HookRegistry.discover_and_load) both
loaded Python files via importlib.util.spec_from_file_location +
exec_module without registering the resulting module in sys.modules.

That breaks any plugin or hook handler that uses `from __future__ import
annotations` together with a Pydantic BaseModel / dataclass / anything
that introspects `__module__`: at first request Pydantic tries to
resolve string-form type hints against the defining module's namespace,
can't find it by name, and raises:

  PydanticUserError: TypeAdapter[...] is not fully defined;
  you should define ... and all referenced types,
  then call `.rebuild()` on the instance.

This is what broke the kanban dashboard's 'triage' button — POST
/api/plugins/kanban/tasks validated against CreateTaskBody (a Pydantic
model in a file using `from __future__ import annotations`) and
returned 500 on every click.

The fix, applied symmetrically to both loaders:

  1. Compute module_name once.
  2. Register the module in sys.modules BEFORE exec_module.
  3. On exec_module failure, pop the half-initialized stub so subsequent
     reloads don't pick up broken state.

GETs were unaffected because they don't build a body TypeAdapter, which
is why this only surfaced when users started POSTing.
2026-04-29 23:34:35 -07:00
Stephen Schoettler
f73364b1c4
fix(ci): stabilize main test suite regressions (#17660)
* fix: stabilize main test suite regressions

* test(agent): update MiniMax normalization expectation

* test: stabilize remaining CI assertions

* test: harden config helper monkeypatching

* test: harden CI-only assertions

* fix(agent): propagate fast streaming interrupts
2026-04-29 23:18:55 -07:00
Teknium
71c8ca17dc chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664
Removes drive-by duplication that accumulated during the contributor
branch's multiple rebases. All runtime-benign (dict last-wins,
redefinition last-wins) but left dead source that would confuse
reviewers and maintainers.

Surgical in-place de-duplication (kept PR's intentional additions,
removed only the doubled copy):

* hermes_cli/auth.py: duplicate "gmi" + "azure-foundry" ProviderConfig
* hermes_cli/models.py: duplicate "gmi" entry in _PROVIDER_MODELS
* hermes_cli/config.py: duplicate NOTION/LINEAR/AIRTABLE/TENOR skill env
  block + duplicate get_custom_provider_context_length definition
* hermes_cli/gateway.py: duplicate _setup_yuanbao
* gateway/platforms/base.py: duplicate is_host_excluded_by_no_proxy
* gateway/platforms/telegram.py: duplicate delete_message
* gateway/stream_consumer.py: duplicate _should_send_fresh_final and
  _try_fresh_final
* gateway/run.py: duplicate _parse_reasoning_command_args /
  _resolve_session_reasoning_config / _set_session_reasoning_override,
  duplicate "Drain silently when interrupted" interrupt check
* run_agent.py: duplicate HERMES_AGENT_HELP_GUIDANCE append, duplicate
  codex_message_items capture, duplicate custom_providers resolution
* tools/approval.py: duplicate HARDLINE_PATTERNS section and duplicate
  hardline call in check_dangerous_command
* tools/mcp_tool.py: duplicate _orphan_stdio_pids module-level decl
* cron/scheduler.py: duplicate "not configured/enabled" check — kept
  the new early-rejection, removed the stale late-path copy

Full-file resets to origin/main (all PR additions were duplicates of
content already on main):

* ui-tui/packages/hermes-ink/index.d.ts
* ui-tui/packages/hermes-ink/src/entry-exports.ts
* ui-tui/packages/hermes-ink/src/ink/selection.ts
* ui-tui/src/app/interfaces.ts
* ui-tui/src/app/slash/commands/core.ts
* ui-tui/src/components/thinking.tsx
* ui-tui/src/lib/memoryMonitor.ts
* ui-tui/src/types.ts
* ui-tui/src/types/hermes-ink.d.ts
* tests/hermes_cli/test_doctor.py
* tests/hermes_cli/test_api_key_providers.py
* tests/hermes_cli/test_model_validation.py
* tests/plugins/memory/test_hindsight_provider.py
* tests/run_agent/test_run_agent.py
* tests/gateway/test_email.py
* tests/tools/test_dockerfile_pid1_reaping.py
* hermes_cli/commands.py (slack_native_slashes block — full duplicate)
2026-04-29 21:56:51 -07:00
Ari Lotter
868bc1c242 feat(irc): add interactive setup
feat(gateway): refine Platform._missing_ and platform-connected dispatch

Restricts plugin-name acceptance to bundled plugin scan + registry
(no arbitrary string -> enum-pollution), pulls per-platform connectivity
checks into a _PLATFORM_CONNECTED_CHECKERS lambda map with a clean
_is_platform_connected method, and adds tests covering the checker map,
plugin platform interface, and IRC setup wizard.
2026-04-29 21:56:51 -07:00
Ari Lotter
1f1608067c feat(gateway): unify setup flows, load platforms dynamically from registry
Merge the two gateway setup paths (hermes setup gateway + hermes gateway
setup) to use a single _unified_platforms() list that merges built-in
_PLATFORMS with dynamically registered plugin entries from
platform_registry.

- Add setup_fn field to PlatformEntry for plugin setup flows
- _unified_platforms() merges built-ins with registry entries by key
- setup_gateway() now uses unified list instead of hardcoded
  _GATEWAY_PLATFORMS tuple list
- gateway_setup() uses same unified list, plugin entries appear
  alongside built-ins with no [plugin] suffix
- _platform_status() handles plugin platforms via registry check_fn
- Plugin platforms with setup_fn get called directly; plugins without
  get a generic env-var display fallback

IRC and other plugin platforms now appear automatically in the setup
menu when registered via platform_registry.register().

feat(gateway): surface disabled platform plugins in setup and auto-enable on select

Platform plugins under plugins/platforms/* (IRC, etc.) were gated behind
plugins.enabled, so `hermes gateway setup` wouldn't list them until the
user ran `hermes plugins enable <name>` first. Now the setup menu always
surfaces them as "plugin disabled — select to enable", and picking one
adds it to plugins.enabled before running its setup flow.

Along the way, unify the two gateway setup flows so `hermes setup gateway`
and `hermes gateway setup` both read from the same platform list (built-in
_PLATFORMS + platform_registry entries), dispatch through a single
_configure_platform() helper, and share _platform_status(). Deletes the
dead bespoke wrappers in setup.py (_setup_whatsapp, _setup_weixin,
_setup_email, etc.) that duplicated logic now covered by the registry
path or _setup_standard_platform.

Also:
- PlatformEntry gains a plugin_name field so the registry knows which
  plugin owns each entry (required for auto-enable).
- PluginContext.register_platform auto-stamps plugin_name from the
  manifest so plugins don't have to pass it explicitly.
- PluginManager now scans plugins/platforms/* as its own category root,
  one level below the bundled plugin scan.
- Fix IRC plugin discovery: rename PLUGIN.yaml → plugin.yaml (the
  scanner is case-sensitive) and add the missing __init__.py that
  _load_directory_module requires.
2026-04-29 21:56:51 -07:00
Teknium
e464cde58f feat: final platform plugin parity — webhook delivery, platform hints, docs
Closes remaining functional gaps and adds documentation.

webhook.py: Cross-platform delivery now checks the plugin registry
  for unknown platform names instead of hardcoding 15 names in a tuple.
  Plugin platforms can receive webhook-routed deliveries.

prompt_builder: Platform hints (system prompt LLM guidance) now fall
  back to the plugin registry's platform_hint field. Plugin platforms
  can tell the LLM 'you're on IRC, no markdown.'

PlatformEntry: Added platform_hint field for LLM guidance injection.

IRC adapter: Added acquire_scoped_lock/release_scoped_lock in
  connect/disconnect to prevent two profiles from using the same IRC
  identity. Added platform_hint for IRC-specific LLM guidance.

Removed dead token-empty-warning extension for plugin platforms
  (plugin adapters handle their own env vars via check_fn).

website/docs/developer-guide/adding-platform-adapters.md:
  - Added 'Plugin Path (Recommended)' section with full code examples,
    PLUGIN.yaml template, config.yaml examples, and a table showing all
    18 integration points the plugin system handles automatically
  - Renamed built-in checklist to clarify it's for core contributors

gateway/platforms/ADDING_A_PLATFORM.md:
  - Added Plugin Path section pointing to the reference implementation
    and full docs guide
  - Clarified built-in path is for core contributors only
2026-04-29 21:56:51 -07:00
Teknium
457128d4e8 fix: wire PII redaction + token empty warnings for plugin platforms
PII redaction: build_session_context_prompt() now checks the plugin
registry's pii_safe flag in addition to the hardcoded _PII_SAFE_PLATFORMS
frozenset. Plugin platforms that set pii_safe=True (e.g. phone-based
messaging bridges) get their user IDs redacted before LLM context.

Token empty warnings: the empty-token diagnostic at config load now
checks the plugin registry's required_env when a platform isn't in the
hardcoded _token_env_names dict. Catches 'enabled but empty' for
plugin platforms too.
2026-04-29 21:56:51 -07:00
Teknium
2e20f6ae2d feat: complete plugin platform parity — all 12 integration points
Extends the platform plugin interface from Phase 1 to cover every
touchpoint where built-in platforms have hardcoded behavior.

- allowed_users_env / allow_all_env: per-platform auth env vars
- max_message_length: smart-chunking for send_message tool
- pii_safe: session PII redaction flag
- emoji: CLI/gateway display
- allow_update_command: /update access control

send_message tool (tools/send_message_tool.py):
- Replaced hardcoded platform_map dict with Platform() call
- Added _send_via_adapter() for plugin platforms — routes through
  live gateway adapter when available
- Registry-aware max message length for smart chunking

Cron delivery (cron/scheduler.py):
- Replaced hardcoded 15-entry platform_map with Platform() call
- Plugin platforms now work as cron delivery targets

User authorization (gateway/run.py _is_user_authorized):
- Registry fallback: checks PlatformEntry.allowed_users_env and
  allow_all_env when platform not in hardcoded maps
- Plugin platforms get per-platform auth support

_UPDATE_ALLOWED_PLATFORMS: checks registry allow_update_command flag
Channel directory: includes plugin platforms in session enumeration
Orphaned config warning: descriptive message when plugin platform is
  in config but no plugin registered it
Gateway weakref: _gateway_runner_ref for cross-module adapter access

hermes status: shows plugin platforms with (plugin) tag
hermes gateway setup: plugin platforms appear in menu with setup hints
hermes_cli/platforms.py: get_all_platforms() merges with registry,
  platform_label() falls back to registry for plugin names

- 8 new tests (extended fields, cron resolution, platforms merge)
- Updated 3 tests for new Platform() based resolution
- 2829 passed, 24 pre-existing failures, zero new failures
2026-04-29 21:56:51 -07:00
Teknium
8f144fe36b feat: pluggable platform adapter registry + IRC reference implementation
Adds a platform adapter plugin interface so anyone can create new gateway
platforms (IRC, Viber, Line, etc.) as drop-in plugins without modifying
core gateway code.

- PlatformEntry dataclass: name, label, adapter_factory, check_fn,
  validate_config, required_env, install_hint, source
- PlatformRegistry singleton with register/unregister/create_adapter
- _create_adapter() in gateway/run.py checks registry first, falls
  through to existing if/elif chain for built-in platforms

- Platform._missing_() accepts unknown string values, creating cached
  pseudo-members so Platform('irc') is Platform('irc') holds true
- GatewayConfig.from_dict() now parses plugin platform names from
  config.yaml without rejecting them
- get_connected_platforms() delegates to registry for unknown platforms

- PluginContext.register_platform() for plugin authors
- Mirrors the existing register_tool() / register_hook() pattern

- Full async IRC adapter using stdlib asyncio (zero external deps)
- Connects via TLS, handles PING/PONG, nick collision, NickServ auth
- Channel messages require addressing (nick: msg), DMs always dispatch
- Markdown stripping for IRC-clean output, message splitting for
  512-byte line limit
- Config via config.yaml extra dict or IRC_* env vars

- Platform enum dynamic members (identity stability, case normalization)
- PlatformRegistry (register, unregister, create, validation, factory)
- GatewayConfig integration (from_dict parsing, get_connected_platforms)
- IRC adapter (init, send, protocol parsing, markdown, requirements)

No existing platform adapters were migrated — the if/elif chain is
untouched. This is Phase 1: prove the interface with a real plugin.
2026-04-29 21:56:51 -07:00
Teknium
4d7fc0f37c feat(gateway,cli): confirm /reload-mcp to warn about prompt cache invalidation
Reloading MCP servers rebuilds the tool set for the active session, which
invalidates the provider prompt cache (tool schemas are baked into the
system prompt). The next message re-sends full input tokens — can be
expensive on long-context or high-reasoning models.

To surface that cost, /reload-mcp now routes through a new slash-confirm
primitive with three options: Approve Once / Always Approve / Cancel.
'Always Approve' persists approvals.mcp_reload_confirm: false so future
reloads run silently.

Coverage:

* Classic CLI (cli.py) — interactive numbered prompt.
* TUI (tui_gateway + Ink ops.ts) — text warning on first call; `now` /
  `always` args skip the gate; `always` also persists the opt-out.
* Messenger gateway — button UI on Telegram (inline keyboard), Discord
  (discord.ui.View), Slack (Block Kit actions); text fallback on every
  other platform via /approve /always /cancel replies intercepted in
  gateway/run.py _handle_message.
* Config key: approvals.mcp_reload_confirm (default true).
* Auto-reload paths (CLI file watcher, TUI config-sync mtime poll) pass
  confirm=true so they do NOT prompt.

Implementation:

* tools/slash_confirm.py — module-level pending-state store used by all
  adapters and by the CLI prompt. Thread-safe register/resolve/clear.
* gateway/platforms/base.py — send_slash_confirm hook (default 'Not
  supported' → text fallback).
* gateway/run.py — _request_slash_confirm helper + text intercept in
  _handle_message (yields to in-progress tool-exec approvals so
  dangerous-command /approve still unblocks the tool thread first).

Tests:

* tests/tools/test_slash_confirm.py — primitive lifecycle + async
  resolution + double-click atomicity (16 tests).
* tests/hermes_cli/test_mcp_reload_confirm_gate.py — default-config
  shape + deep-merge preserves user opt-out (5 tests).

Targeted runs (hermetic): 89 passed (slash-confirm, config gate,
existing agent cache, existing telegram approval buttons).
2026-04-29 21:56:47 -07:00
helix4u
7fae87bc00 fix(gateway): refresh cached agents after MCP tool changes 2026-04-29 21:56:47 -07:00
memosr
d69a0b2c29 fix(security): apply ACL checks to QQBot guild messages and guild DMs to prevent allowlist bypass 2026-04-29 21:08:28 -07:00
teknium1
763aadd6bf fix(telegram): preserve pre-#17686 chat-ID-in-_USERS configs + doc split
PR #15027 (5 days ago) shipped TELEGRAM_GROUP_ALLOWED_USERS as a chat-ID
allowlist. #17686 correctly renames that to sender user IDs and moves
chat IDs to TELEGRAM_GROUP_ALLOWED_CHATS. Without a shim, any user on
PR #15027's guidance would silently start rejecting group traffic on
upgrade.

- gateway/run.py: in _is_user_authorized, if TELEGRAM_GROUP_ALLOWED_USERS
  contains values starting with '-' (chat-ID-shaped), honor them as chat
  IDs and log a one-shot deprecation warning pointing users at the new
  TELEGRAM_GROUP_ALLOWED_CHATS var.
- tests/gateway/test_unauthorized_dm_behavior.py: three new tests cover
  legacy chat-ID values authorizing the listed chat, not crossing to
  other chats, and mixed sender/chat values in the same var.
- website/docs/user-guide/messaging/telegram.md: rewrite the Group
  Allowlisting section to document the new user/chat split + migration
  note. Remove stale '/thread_id' suffix claim (code never parsed it).
- website/docs/reference/environment-variables.md: document all three
  Telegram allowlist env vars.
2026-04-29 21:07:55 -07:00
Anders Bell
1f712173b2 fix(telegram): support group user allowlist 2026-04-29 21:07:55 -07:00
teknium1
dd2d1ba5e6 refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool
Salvage-follow-up to @shannonsands's /reload-skills PR. Trims the feature to
match the design: user-initiated rescan, no prompt-cache reset, no new
schema surface, no phantom user turn, and the next-turn note carries each
added/removed skill's 60-char description (not just its name).

Changes vs the original PR:

* Drop the in-process skills prompt-cache clear in reload_skills(). Skills
  are invoked at runtime via /skill-name, skills_list, or skill_view —
  they don't need to live in the system prompt for the model to use them.
  Keeping the cache intact preserves prefix caching across the reload so
  /reload-skills pays no cache-reset cost. (MCP has to break the cache
  because tool schemas must be known at conversation start; skills do not.)

* Drop the skills_reload agent tool and SKILLS_RELOAD_SCHEMA from
  tools/skills_tool.py, plus the four skills_reload enumerations in
  toolsets.py. No new schema surface — agents can already see a freshly-
  installed skill via skill_view / skills_list the moment it's on disk.

* Replace the phantom 'role: user' turn injection with a one-shot queued
  note. CLI uses self._pending_skills_reload_note (same pattern as
  _pending_model_switch_note, prepended to the next API call and cleared).
  Gateway uses self._pending_skills_reload_notes[session_key]. The note
  is prepended to the NEXT real user message in this session, so message
  alternation stays intact and nothing out-of-band is persisted to the
  transcript.

* reload_skills() now returns added/removed as
  [{'name': str, 'description': str}, ...] (description truncated to 60
  chars — matches the curator / gateway adapter budget). The injected
  next-turn note formats each entry as 'name — description' so the model
  can actually reason about which new skills to call without running
  skills_list first.

* Only emit the note when the diff is non-empty. On empty diff, print
  'No new skills detected' and do nothing else.

* Tests rewritten to cover the queue semantics, the description payload,
  and a regression guard that the prompt-cache snapshot is preserved.
2026-04-29 21:07:47 -07:00
Shannon Sands
7966560fb5 feat(skills): /reload-skills slash command + skills_reload agent tool
Adds a public reload path for the in-process skill caches so newly
installed (or removed) skills become visible mid-session without a
gateway restart. Mirrors the shape of /reload-mcp.

Three surfaces:
* /reload-skills slash command — CLI (cli.py) and gateway (gateway/run.py),
  with /reload_skills alias for Telegram autocomplete and an explicit
  Discord registration.
* skills_reload agent tool (tools/skills_tool.py) — lets agents/subagents
  pick up freshly-installed skills via tool call.
* agent.skill_commands.reload_skills() — shared helper that clears
  _skill_commands, _SKILLS_PROMPT_CACHE (in-process LRU), and the
  on-disk .skills_prompt_snapshot.json, then returns an added/removed
  diff plus the new total count.

Tested:
* tests/agent/test_skill_commands_reload.py (9 cases)
* tests/cli/test_cli_reload_skills.py       (3 cases)
* tests/gateway/test_reload_skills_command.py (4 cases)

Use case: NemoClaw / OpenShell-style sandboxed orchestrators that drop
skills into ~/.hermes/skills mid-session, plus agentic flows where the
agent itself installs a skill via the shell tool and needs it bound
without a gateway restart. The Python helper
clear_skills_system_prompt_cache(clear_snapshot=True) already exists
internally — this PR just exposes it via slash command and tool.
2026-04-29 21:07:47 -07:00
Teknium
31f70d1f2a
fix(ci): recover 38 failing tests on main (#17642)
CI Tests workflow has been red on main for 40+ consecutive runs. This
commit recovers every failure visible in run 25130722163 (most recent
completed run prior to this PR).

Root causes, by group:

Test-mock drift after product landed (fix: update mocks)
- test_mcp_structured_content / test_mcp_dynamic_discovery (6 tests):
  product added _rpc_lock (#02ae15222) and _schedule_tools_refresh
  (#1350d12b0) without updating sibling test files. Install a real
  asyncio.Lock inside the fake run-loop and patch at _schedule_tools_refresh.
- test_session.py: renamed normalize_whatsapp_identifier → canonical_
  whatsapp_identifier upstream; keep a local alias so the legacy tests
  keep working.
- test_run_progress_topics Slack DM test: PR #8006 made Slack default
  tool_progress=off; explicitly set it to 'all' in the test fixture so
  the progress-callback path still runs. Also read tool_progress_callback
  at call time rather than freezing it in FakeAgent.__init__ — production
  assigns it AFTER construction.
- test_tui_gateway_server session-create/close race: session.create now
  defers _start_agent_build behind a 50ms timer — wait for the build
  thread to enter _make_agent before closing, otherwise the orphan-
  cleanup path never runs.
- test_protocol session.resume: product get_messages_as_conversation now
  takes include_ancestors kwarg; accept **_kwargs in the test stub.
- test_copilot_acp_client redaction: redactor is OFF by default (snapshots
  HERMES_REDACT_SECRETS at import); patch agent.redact._REDACT_ENABLED=True
  for the duration of the test.
- test_minimax_provider: after #17171, dots in non-Anthropic model names
  stay dots even with preserve_dots=False. Assert the new invariant
  rather than the old 'broken for MiniMax' behavior.
- test_update_autostash: updater now scans `ps -A` for dashboard PIDs;
  the test's catch-all subprocess.run stub needed stdout/stderr fields.
- test_accretion_caps: read_timestamps dict is populated lazily when
  os.path.getmtime succeeds. Use .get("read_timestamps", {}) to tolerate
  CI filesystems where the stat races file creation.

Change-detector tests (fix: rewrite as structural invariants)
- test_credential_sources_registry_has_expected_steps: was a frozen set
  comparison that broke when minimax-oauth was added. Rewrite as an
  invariant check (every step has description, no dupes, core steps
  present) per AGENTS.md 'don't write change-detector tests'.

xdist ordering / test pollution (fix: reset state, use module-local patches)
- test_setup vercel: sibling test saved VERCEL_PROJECT_ID='project' to
  os.environ via save_env_value() and never cleared it. monkeypatch.delenv
  the VERCEL_* vars in the link-file test.
- test_clipboard TestIsWsl: GitHub Actions is on Azure VMs whose real
  /proc/version often contains 'microsoft'. Patching builtins.open with
  mock_open didn't reliably intercept hermes_constants.is_wsl's call in
  xdist workers that had already cached _wsl_detected=True from an
  earlier test. Patch hermes_constants.open directly and add
  teardown_method to reset the cache after each test.

Pytest-asyncio cancellation hangs (fix: bound product await with timeout)
- test_session_split_brain_11016 (3 params) + test_gateway_shutdown
  cancel-inflight: under pytest-asyncio 1.3.0, 'await task' and
  'asyncio.gather(cancelled_tasks)' can stall for 30s when the cancelled
  task's finally block awaits typing-task cleanup. Bound both with
  asyncio.wait_for(..., timeout=5.0) and asyncio.shield — the stragglers
  are released from adapter tracking and allowed to finish unwinding in
  the background. This is also a legitimate hardening: a wedged finally
  shouldn't stall the caller's dispatch or a gateway shutdown.

Orphan UI config (fix: merge tiny tab into messaging category)
- test_web_server test_no_single_field_categories: the telegram.reactions
  config field lived in its own 'telegram' schema category with no
  siblings. Fold it under 'discord' via _CATEGORY_MERGE so the dashboard
  doesn't render an orphan single-field tab.

Local verification: 38/38 originally-failing tests pass; 4044/4044
gateway tests pass; 684/684 targeted subset (all 16 touched test files)
passes.
2026-04-29 20:05:32 -07:00
briandevans
e0a03f3f40 fix(api-server): collapse tool start/lifecycle into a single SSE event
Address Copilot review on PR #16666:

1. **Duplicate event on every tool start** — both ``tool_progress_callback``
   and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so
   wiring both into chat completions emitted *two* ``hermes.tool.progress``
   events per real tool call. Drop the legacy ``_on_tool_progress`` emit
   entirely; ``_on_tool_start`` now produces a single unified event that
   carries the legacy ``tool``/``emoji``/``label`` fields plus the new
   ``toolCallId``/``status`` correlation fields. Label is computed inline
   via ``build_tool_preview`` so callers do not need to pre-format it.

2. **Weak per-event correlation in the regression test** — the previous
   assertion checked that a ``toolCallId`` appeared *somewhere* in the
   aggregate, which would have passed even if ``running`` lacked the id.
   Collect ``(status, toolCallId)`` per event and assert each event
   carries the correct pair, plus exactly two events on the wire (no
   silent duplication regression).

The two existing chat-completions tool-progress tests are updated to fire
``tool_start_callback`` instead of ``tool_progress_callback``, matching
production reality where ``run_agent`` always pairs them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:08:16 -07:00
Scott Trinh
5a1d4f6804 feat: add Vercel Sandbox backend
Adds Vercel Sandbox as a supported Hermes terminal backend alongside
existing providers (Local, Docker, Modal, SSH, Daytona, Singularity).

Uses the Vercel Python SDK to create/manage cloud microVMs, supports
snapshot-based filesystem persistence keyed by task_id, and integrates
with the existing BaseEnvironment shell contract and FileSyncManager
for credential/skill syncing.

Based on #17127 by @scotttrinh, cherry-picked onto current main.
2026-04-29 07:22:33 -07:00
Magaav
810d98e892 feat(api_server): expose run status for external UIs (#17085)
Adds two API server endpoints for external UIs and orchestrators:

- GET /v1/capabilities — machine-readable feature discovery so clients
  can detect which Runs API / SSE / auth features this Hermes version
  supports before depending on them.
- GET /v1/runs/{run_id} — pollable run status so dashboards can check
  queued/running/completed/failed/cancelled/stopping state without
  holding an SSE connection open.

Also moves request validation ahead of run allocation so invalid
payloads no longer leave orphaned entries in _run_streams waiting for
the TTL sweep.

task_id is intentionally kept as "default" for the Runs API to
preserve the shared-sandbox model used by CLI, gateway, and the
existing _run_agent_with_callbacks path. session_id is surfaced in
run status for external-UI correlation only.

Salvage of PR #17085 by @Magaav.
2026-04-29 06:38:10 -07:00
Teknium
f317325279
docs(weixin): clarify iLink bot identity limits and warn on group policy (#17433)
QR-login connects an iLink bot identity (...@im.bot), not a scriptable
personal WeChat account. iLink typically does not deliver ordinary WeChat
group events to these bots, so WEIXIN_GROUP_POLICY / WEIXIN_GROUP_ALLOWED_USERS
often have no effect regardless of value.

- Setup wizard: print iLink-bot caveat before the group-policy prompt; relabel
  the allowlist input as 'group chat IDs (not member user IDs)'; note that
  'open' / 'allowlist' only take effect if iLink delivers group events.
- Adapter: log a WARNING at connect() when WEIXIN_GROUP_POLICY is non-disabled
  so the limitation is surfaced in gateway logs, not just docs.
- Docs: add a top-of-page warning callout to weixin.md explaining the iLink
  bot identity, narrow the 'DM and group messaging' feature line to DM-only
  with a group caveat, tighten the Group Policy section and troubleshooting
  row, and clarify WEIXIN_GROUP_ALLOWED_USERS as group IDs (not user IDs)
  in weixin.md and environment-variables.md.

Closes #17094
2026-04-29 06:26:10 -07:00
vominh1919
e9b96fd050 fix: recognize ret=-2 as stale-session signal in Weixin adapter
The Weixin adapter only recognized errcode=-14 as a session-expired
signal. However, iLink also returns ret=-2 with errmsg="unknown error"
for the same underlying condition (stale session). The adapter treated
ret=-2 as a rate-limit, exhausting retries with the same stale
context_token instead of refreshing the session.

Added _is_stale_session_ret() helper that distinguishes ret=-2 with
"unknown error" from genuine rate limits. Updated both the poll loop
and _send_text_chunk to use the helper.

Fixes NousResearch/hermes-agent#17228
2026-04-29 05:44:44 -07:00
tmimmanuel
3606414ec7 fix(gateway): isolate platform connect failures with per-platform timeout
Wrap each adapter.connect() in asyncio.wait_for() so one platform hanging
during startup or reconnect cannot block the others. Telegram's 8-retry
connect loop (~140s worst case) previously prevented Feishu from ever
starting when Telegram was network-restricted — common for users in
regions where Telegram is blocked.

Default timeout is 30s; override via HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT
(0 disables). Applied to both startup and the reconnect watcher so a
platform that hangs mid-retry also does not stall retries for others.

Fixes #17242
2026-04-29 05:00:37 -07:00
Teknium
13683c0842
feat(memory): notify providers on mid-process session_id rotation (#17409)
Fixes #6672

Memory providers now receive on_session_switch() whenever AIAgent.session_id
rotates mid-process — /resume, /branch, /reset, /new, and context
compression. Before this, providers that cached per-session state in
initialize() (Hindsight's _session_id, _document_id, accumulated
_session_turns, _turn_counter) kept writing into the old session's
record after the agent had moved on.

MemoryProvider ABC
------------------
- New optional hook on_session_switch(new_session_id, *,
  parent_session_id='', reset=False, **kwargs) with no-op default for
  backward compat. reset=True signals /reset or /new — providers should
  flush accumulated per-session buffers. reset=False for /resume,
  /branch, compression where the logical conversation continues.

MemoryManager
-------------
- on_session_switch() fans the hook out to every registered provider.
  Isolated try/except per provider — one bad provider can't block others.
- Empty/None new_session_id is a no-op to avoid corrupting provider state
  during shutdown paths.

run_agent.py
------------
- _sync_external_memory_for_turn now passes session_id=self.session_id
  into sync_all() and queue_prefetch_all(). Providers with defensive
  session_id updates in sync_turn (Hindsight already had this at
  plugins/memory/hindsight/__init__.py:1199) now actually receive the
  current id.
- Compression block at ~L8884 already notified the context engine of
  the rollover; now also calls
  _memory_manager.on_session_switch(reason='compression').

cli.py
------
- new_session() fires reset=True, reason='new_session' so providers
  flush buffers.
- _handle_resume_command fires reset=False, reason='resume' with the
  previous session as parent_session_id.
- _handle_branch_command fires reset=False, reason='branch' with the
  parent session_id already captured for the DB parent link.

gateway/run.py
--------------
- _handle_resume_command now evicts the cached AIAgent, mirroring
  /branch and /reset. The next message rebuilds a fresh agent whose
  memory provider initialize() runs with the correct session_id —
  matches the pattern the gateway already uses for provider state
  cross-session transitions.

Hindsight reference implementation
----------------------------------
- plugins/memory/hindsight/__init__.py adds on_session_switch that:
  updates _session_id, mints a fresh _document_id (prevents
  vectorize-io/hindsight#1303 overwrite), and clears _session_turns /
  _turn_counter / _turn_index so in-flight batches don't flush under
  the new document id. parent_session_id only overwritten when provided
  (avoids clobbering on a bare switch).

Tests
-----
- tests/agent/test_memory_session_switch.py: new dedicated file. ABC
  default no-op, manager fan-out, failure isolation, empty-id no-op,
  session_id propagation through sync_all/queue_prefetch_all, Hindsight
  state transitions for every reset/non-reset case, parent preservation.
- tests/cli/test_branch_command.py: new test verifying /branch fires
  the hook with correct parent_session_id + reset=False + reason.
- tests/gateway/test_resume_command.py: new test verifying /resume
  evicts the cached agent.
- tests/run_agent/test_memory_sync_interrupted.py: updated existing
  assertions to account for the session_id kwarg on sync_all and
  queue_prefetch_all.

E2E verified (real imports, tmp HERMES_HOME):
- /resume: session_id updates, doc_id fresh, buffers cleared, parent set
- /branch: session_id forks, parent links to original
- /new: reset=True clears accumulated state
- compression: reason='compression' propagated, lineage preserved
- Empty id: no-op, state preserved
- Legacy provider without on_session_switch: no crash

Reported by @nicoloboschi (Hindsight maintainer); related scope-widening
comment by @kidonng extending coverage to compression.
2026-04-29 04:57:22 -07:00
teknium1
4a62ba9ccd fix(signal): correct SPOILER docstring + AUTHOR_MAP for exiao
- _markdown_to_signal docstring claimed SPOILER support but the regex list
  never handled ``||...||``. Correct the docstring to match the four
  actually-supported styles (BOLD / ITALIC / STRIKETHROUGH / MONOSPACE).
  Signal's SPOILER bodyRange would need dedicated ``||spoiler||`` parsing
  and is left for a follow-up.

- scripts/release.py: add exiao's noreply email to AUTHOR_MAP so the
  contributor-attribution gate accepts their cherry-picked commit.
2026-04-29 04:38:17 -07:00
exiao
23f5fc6765 feat(gateway/signal): native formatting, reply quotes, and reactions
Three Signal adapter improvements that depend on the no-edit-mode
plumbing from the previous commit.

1. Native formatting (markdown -> Signal bodyRanges)
   Signal renders markdown as literal characters (**bold**, `code`, #
   heading), which looks broken. Added _markdown_to_signal(text) that
   strips markdown syntax and emits Signal-native bodyRanges as
   start:length:STYLE entries. Offsets are computed in UTF-16 code
   units so non-BMP emoji stay aligned. Supports BOLD, ITALIC, STRIKE,
   MONO, and headings mapped to BOLD. Fenced code and inline code are
   handled; link syntax is unwrapped to visible text + URL.

   Includes edge-case fixes reported previously:
   - Bullet lists ("* item") no longer misidentified as italics
   - URLs containing underscores no longer italicized around the dot

2. Reply-quote context
   Parses dataMessage.quote on inbound messages and populates
   MessageEvent.raw_message with sender + timestamp_ms. This lets the
   gateway's existing [Replying to: "..."] injector (gateway/run.py)
   work on Signal, matching Telegram/Matrix behavior.

3. Processing reactions
   Overrides on_processing_start -> hourglass and on_processing_complete
   -> checkmark via the sendReaction JSON-RPC using targetAuthor and
   targetTimestamp pulled from raw_message. Uses the ProcessingOutcome
   enum introduced in the previous commit.

Also sets SUPPORTS_MESSAGE_EDITING = False on SignalAdapter so the
no-edit streaming path activates.

Tests: 40+ new tests in tests/gateway/test_signal_format.py covering
markdown conversion, UTF-16 offset correctness with non-BMP emoji,
bullet-list and URL false-positive regressions, reply-quote extraction,
and reaction payload shape. Regression extensions to test_signal.py.
2026-04-29 04:38:17 -07:00
Ben Barclay
58a6171bfb
Merge pull request #17305 from NousResearch/feat/docker-run-as-host-user
feat(docker): run container as host user to avoid root-owned bind mounts
2026-04-29 16:41:55 +10:00
Teknium
2d137074a3
refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304)
The "cfg.get('X', {}).get('Y', default)" pattern appears 50+ times
across tools/, gateway/, and plugins/. Each call site manually handles
the same three gotchas:

  1. Missing intermediate key → empty dict → chain works
  2. Non-dict value at intermediate position → AttributeError
     (uncaught in most sites, so a misconfigured YAML crashes the tool)
  3. cfg is None → AttributeError

Introduces cfg_get(cfg, *keys, default=None) in hermes_cli/config.py
as the canonical helper. Handles all three uniformly, returns default
only when the final key is *absent* (matches dict.get semantics —
explicit None values are preserved, falsy values like 0 / False / ''
are preserved).

Named cfg_get rather than cfg_path to avoid shadowing the existing
'cfg_path = _hermes_home / "config.yaml"' local variable that appears
in gateway/run.py, cron/scheduler.py, hermes_cli/main.py, etc.

Migrated 20 call sites as the first-batch proof-of-value:

  gateway/run.py            10 sites (agent/display subtrees)
  tools/browser_tool.py      3 sites
  tools/vision_tools.py      2 sites
  tools/browser_camofox.py   1 site
  tools/approval.py          1 site
  tools/skills_tool.py       1 site
  tools/skill_manager_tool.py 1 site
  tools/credential_files.py  1 site
  tools/env_passthrough.py   1 site

The remaining ~30 sites across plugins/ and smaller tool files can be
migrated opportunistically — the helper is now available and the
pattern is established.

Fixed a latent bug along the way: tools/vision_tools.py had its
cfg_get usage at line 560 inside a function that locally re-imports
'from hermes_cli.config import load_config', but the AST-based
migration script wrote the top-level cfg_get import to a different
function scope, leaving line 560's cfg_get as a NameError silently
swallowed by the surrounding try/except. Test
test_vision_uses_configured_temperature_and_timeout caught it. Fixed
by including cfg_get in the function-local import.

Verified:
- 7880/7893 tests/tools/ + tests/gateway/ + tests/hermes_cli/test_config
  tests pass; all 13 failures pre-existing on main (MCP, delegate,
  session_split_brain — verified earlier in the sweep).
- All 20 migrated sites AST-verified to have cfg_get in scope (either
  module-level or function-local).
- Live 'hermes chat' smoke: 2 turns + /model switch + tool calls +
  /quit, zero errors. Agent correctly counted 20 cfg_get hits across
  8 tool files — matching the migration.

Semantic parity verified against the original pattern across 8 edge
cases (missing keys, None values, falsy values, empty strings, string
instead of dict, None cfg, nested levels).
2026-04-28 23:17:39 -07:00
Ben
5531c0df82 feat(docker): run container as host user to avoid root-owned bind mounts
Add opt-in terminal.docker_run_as_host_user config flag that passes
--user $(id -u):$(id -g) to the Docker backend so files written into
bind-mounted directories (/workspace, /root, docker_volumes entries) are
owned by the host user instead of root.

When enabled on POSIX platforms, also drops SETUID/SETGID caps since the
container no longer needs gosu/su to switch users.  Falls back cleanly on
platforms without os.getuid (e.g. native Windows Docker) with a warning.

Wired through all three config.yaml -> TERMINAL_* env-var bridges:
  - cli.py env_mappings        (CLI + TUI startup)
  - gateway/run.py _terminal_env_map (gateway / messaging platforms)
  - hermes_cli/config.py _config_to_env_sync (`hermes config set`)

Also fixes docker_mount_cwd_to_workspace silently failing in gateway
mode -- it was missing from gateway/run.py's _terminal_env_map.

Adds tests/tools/test_terminal_config_env_sync.py to guard against
future drift between the three bridges (same bug class shipped twice
in one month).

Bundled Hermes image won't work with this flag since its entrypoint
expects to start as root for the usermod/gosu hermes flow; works with
the default nikolaik/python-nodejs image and plain Debian/Ubuntu.
2026-04-29 16:16:43 +10:00
Teknium
019d4c1c3f feat(curator): hook into the gateway's cron-ticker thread
Long-running gateways need the curator to fire on cadence without
restarts. Piggy-back on the existing cron ticker thread (which already
runs image/document cache cleanup every hour on the same pattern)
instead of spawning a dedicated timer thread.

- New CURATOR_EVERY = 60 ticks (poll hourly at default 60s interval).
  The inner config.interval_hours gate controls the real cadence, so
  60 of these 60 hourly pokes are cheap no-ops and one runs the review.
- Removed the boot-time call added in the prior commit — the ticker
  covers boot + every hour thereafter. Avoids double-running.

Handles the weekly-default-on-24/7-gateway gap flagged in review.
2026-04-28 22:33:33 -07:00
Teknium
bc79e227e6 feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.

Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).

Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
  .hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
  via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache

New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
  provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
  transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
  pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests

Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
  patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
  register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)

Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end

Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.

Refs #7816.
2026-04-28 22:33:33 -07:00
Lyle Lengyel
80e474f11f fix(gateway,terminal): expand shell tilde in terminal.cwd before subprocess
Commit 3c42064e made config.yaml the single source of truth for
TERMINAL_CWD, but the config bridge passes cwd values verbatim to
os.environ. When a user sets terminal.cwd: ~/ in config.yaml, the
literal string '~/'' reaches subprocess.Popen, which the kernel
rejects because it does not expand shell tilde syntax.

This patch adds three defensive layers:

1. gateway/run.py — expanduser at config bridge time so TERMINAL_CWD
   is always an absolute path.

2. tools/terminal_tool.py — expanduser when reading TERMINAL_CWD in
   _get_env_config(), guarding against stale or manually-set env vars.

3. tools/environments/local.py — expanduser in LocalEnvironment before
   passing cwd to subprocess.Popen, the final safety net.

Includes regression tests in test_config_cwd_bridge.py for nested
terminal.cwd, top-level cwd alias, and precedence ordering.

Refs: 3c42064e
2026-04-28 22:26:09 -07:00
Teknium
dcd7b717f8
fix(gateway): linearize tool-progress bubbles with content messages (#17280)
After PR #7885 (97b0cd51e) added content-side segment breaks for
natural mid-turn assistant messages, the tool-progress task in
gateway/run.py was not updated to match. progress_msg_id and
progress_lines persisted for the whole run, so after a tool batch
produced bubble B1 followed by content bubble C1, the next tool.started
kept editing the OLD bubble B1 above C1 — making the chat appear out
of order on Telegram, Discord, and Slack.

Add on_new_message callback to GatewayStreamConsumer, fired at the
four sites where a fresh content bubble lands on the platform:
  - _send_or_edit first-send branch (NOT edits)
  - _send_commentary
  - _send_new_chunk (overflow split)
  - each successful chunk of _send_fallback_final

Gateway supplies a lambda that enqueues ('__reset__',) into the
progress_queue. send_progress_messages() handles the marker in both
the main loop and the CancelledError drain path, clearing
progress_msg_id, progress_lines, and the dedup state so the next
tool.started opens a fresh bubble below the new content.

Result: each tool batch appears in chronological order below the
preceding content. When no content appears between tool batches,
tools still group in one bubble (CLI-style compactness).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 22:17:33 -07:00
Rugved Somwanshi
214ca943ac feat(agent): add lmstudio integration 2026-04-28 12:27:36 -07:00
Teknium
b53a091b97
remove: BOOT.md built-in hook (#17093)
BOOT.md was merged in PR #3733 before the feature was ready — the
built-in hook spawned a bare AIAgent() with no model/runtime kwargs,
which immediately 401s on any provider with a custom endpoint. Three
separate community PRs (#5240, #12514, #14992) tried to paper over it.

Remove the BOOT.md hook entirely and its user-facing docs/tips. Keep
the gateway/builtin_hooks/ package and the HookRegistry._register_builtin_hooks()
hook-point intact as the extension surface for future always-on
gateway hooks.

Closes #5239.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 09:50:27 -07:00
Teknium
b5128a751b
perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage (#17046)
* perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage

Four heavy SDK/module imports are now deferred off the hot startup path.
Net savings on cold module imports:

  cli                       1200 → 958 ms  (-242)
  run_agent                 1220 → 901 ms  (-319)
  tools.web_tools            711 → 423 ms  (-288)
  agent.anthropic_adapter    230 →  15 ms  (-215)
  agent.auxiliary_client     253 →  68 ms  (-185)

Four independent changes in one PR since they all use the same pattern
and share the same risk profile (heavy SDK import → lazy proxy or
function-local import):

1. tools/web_tools.py:
   'from firecrawl import Firecrawl' moved into _get_firecrawl_client(),
   which is only called when backend='firecrawl'. Users on Exa/Tavily/
   Parallel pay zero firecrawl cost.

2. cli.py + gateway/run.py:
   'from agent.account_usage import ...' moved into the /limits handlers.
   account_usage transitively pulls the OpenAI SDK chain; only needed
   when the user runs /limits.

3. agent/anthropic_adapter.py:
   'try: import anthropic as _anthropic_sdk' replaced with a cached
   '_get_anthropic_sdk()' accessor. The three usage sites
   (build_anthropic_client, build_anthropic_bedrock_client,
   read_claude_code_credentials_from_keychain) now resolve via the
   accessor. All pre-existing test patches of
   'agent.anthropic_adapter._anthropic_sdk' keep working because the
   accessor respects any value already in module globals.

4. agent/auxiliary_client.py AND run_agent.py:
   'from openai import OpenAI' replaced with an '_OpenAIProxy()' module-
   level object that looks like the OpenAI class but imports the SDK on
   first call/isinstance check. This preserves:
     - 15+ in-module OpenAI(...) construction sites in auxiliary_client
       and the single site in run_agent's _create_openai_client (Python's
       function-scope name lookup finds the proxy, forwards the call);
     - 'patch("agent.auxiliary_client.OpenAI", ...)' and
       'patch("run_agent.OpenAI", ...)' test patterns used by 28+ test
       files (patch replaces the module attribute as usual).
   Tried two alternatives first:
     - 'from openai._client import OpenAI' — doesn't skip openai/__init__.py
       (the audit's hypothesis here was wrong).
     - Module-level __getattr__ — works for external access but Python
       function-scope name resolution skips __getattr__, so in-module
       OpenAI(...) calls NameError.

Note: 'openai' still loads on 'import cli' because
cli.py -> neuter_async_httpx_del() -> openai._base_client, and
run_agent.py -> code_execution_tool.py (module-level
build_execute_code_schema) -> _load_config() -> 'from cli import
CLI_CONFIG'. Deferring those is a separate, larger change — out of scope
for this PR. The savings above all come from avoiding the openai/*,
anthropic/*, and firecrawl/* top-level type-tree imports on paths that
don't need them.

Verified:
- 302/302 tests in tests/agent/{test_anthropic_adapter,
  test_bedrock_1m_context, test_minimax_provider, test_anthropic_keychain}
  pass. Two pre-existing failures on main unchanged.
- 106/106 tests/agent/test_auxiliary_client.py pass (1 pre-existing fail).
- 97/97 tests/run_agent/test_create_openai_client_kwargs_isolation.py,
  test_plugin_context_engine_init.py, test_invalid_context_length_warning.py,
  test_api_max_retries_config.py,
  tests/hermes_cli/test_gemini_provider.py, test_ollama_cloud_provider.py
  pass (1 pre-existing fail).
- Live hermes chat smoke: 2 turns + /model switch + tool calls, zero
  errors in the 57-line agent.log window.
- Module-level import of run_agent + auxiliary_client + anthropic_adapter
  no longer pulls 'anthropic' or 'firecrawl' at all.

* fix(gateway): restore top-level account_usage import for test-patch surface

CI caught two failures in tests/gateway/test_usage_command.py that I
missed locally:

    AttributeError: 'module' object at gateway.run has no attribute 'fetch_account_usage'

The test uses monkeypatch.setattr('gateway.run.fetch_account_usage', ...)
to inject a fake account-fetch call. Moving the import inside the
handler deleted that module-level attribute, breaking the patch surface.

Restoring the top-level import in gateway/run.py gives up the ~230 ms
gateway-boot savings from that one lazy, but:

  1. the gateway is a long-running daemon — boot cost is paid once per
     install, not per turn;
  2. the other four lazy-imports (firecrawl, openai, anthropic, cli's
     account_usage) remain in place and still account for the bulk of
     the savings reported in the PR body;
  3. preserving the patch surface keeps the established
     'gateway.run.fetch_account_usage' monkeypatch pattern working
     without touching tests.

Verified: tests/gateway/test_usage_command.py — 8 passed, 0 failed.
Full targeted sweep (2336 tests across agent/gateway/hermes_cli/run_agent):
2332 passed, 4 failed — all 4 pre-existing on main.

---------

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 09:38:42 -07:00
Teknium
df51ad7973
perf(config): mtime-cache load_config() and read_raw_config() (#17041)
load_config() and read_raw_config() now cache their result keyed on
the config file's (mtime_ns, size). On cache hit they return a deepcopy
of the cached value, skipping yaml.safe_load + deep-merge + normalize +
env-var expansion entirely. save_config() + migrate_config() write via
atomic_yaml_write which produces a fresh inode, so stat() sees a new
mtime_ns and the next load repopulates automatically — no explicit
invalidation hook needed.

Measured per-call cost:
  load_config() cold:   13.3 ms
  load_config() cached:  0.23 ms  (57x faster)
  read_raw_config() cached: 0.13 ms

A single gateway turn hits the config 5-15 times (session context,
auxiliary client resolution, memory config, plugin hooks, approval
lookups, per-tool settings). That's 65-200 ms/turn of pure YAML
re-parsing on main. After this change: 1-3 ms/turn.

Also migrates gateway/run.py's 6 direct yaml.safe_load(config.yaml)
call sites through _load_gateway_config, which now shares the
read_raw_config cache when _hermes_home agrees with the canonical
config path. The direct-read fallback is retained for tests that
monkeypatch gateway_run._hermes_home without touching HERMES_HOME.

Safety:
- load_config() returns a deepcopy on every call; the 67+ call sites
  that mutate the result (cfg["model"]["default"] = ..., etc.) can't
  corrupt the cache.
- save_config() / atomic_yaml_write bump mtime, naturally invalidating
  the cache for the next reader.
- Cache is keyed on str(config_path), so HERMES_HOME profile switches
  don't collide.

Verified:
- 112 config tests pass (test_config, test_config_env_expansion,
  test_config_env_refs, test_config_drift, test_config_validation,
  test_aux_config).
- 87 gateway tests pass (test_verbose_command, test_session_info,
  test_compress_focus, test_runtime_footer, test_resume_command,
  test_reasoning_command, test_approve_deny_commands,
  test_run_progress_interrupt).
- Live hermes chat smoke — 2 turns + /model switch + tool calls,
  zero errors in agent.log.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 07:06:35 -07:00
Teknium
e0f5d39837
fix(discord): widen slash-sync timeout to 600s under rate-limit pressure (#16713) (#17029)
Discord's per-app command-management bucket is ~5 writes / 20 s. A
mass-prune-plus-upsert reconcile (77 orphans + 30 desired = 107 writes
in the reported case) can't finish under the old flat 30 s budget, and
the subsequent reconnect retries inside the rate-limit cooldown also
time out — leaving slash commands broken for ~60 min until the bucket
fully recovers.

Bump the timeout to 600 s so realistic bursts drain, update the warning
message to point at the saturated bucket instead of a hardcoded 30 s.
The 600 s cap still guards against a true hang.

Credit to @Tranquil-Flow for PR #16739 and @davidbordenwi for reporting
#16713 with the bucket-math diagnosis.

Closes #16713.

Co-authored-by: Teknium <teknium@nousresearch.com>
2026-04-28 07:02:43 -07:00
konsisumer
e4b69bf149 fix(gateway): guard against None request_overrides in _build_api_kwargs 2026-04-28 06:57:23 -07:00
Teknium
e123f4ecf0
feat(gateway): opt-in runtime-metadata footer on final replies (#17026)
Append a compact 'model · 68% · ~/projects/hermes' footer to the FINAL
message of each turn, disabled by default (display.runtime_footer.enabled).
Answers the Telegram-side parity ask: runtime context that the CLI status
bar already shows is now available in messaging replies when enabled.

Wiring:
- gateway/runtime_footer.py: resolve_footer_config + format_runtime_footer +
  build_footer_line. Pure-function renderer; per-platform overrides under
  display.platforms.<platform>.runtime_footer.
- gateway/run.py: appends footer to response right after reasoning prepend
  so it lands only on the final message (never tool progress or streaming
  chunks). When streaming already delivered the body (already_sent), the
  footer is sent as a small trailing message instead.
- agent_result now exposes context_length alongside last_prompt_tokens so
  the footer can compute the pct; both gateway return paths updated.
- /footer [on|off|status] slash command, wired in CLI (cli.py) and gateway
  (gateway/run.py both running-agent bypass and main dispatch). Global
  toggle only; per-platform overrides via config.yaml.

Graceful degradation:
- Missing context_length (unknown model) → pct field silently dropped
  (no '?%' artifact).
- Empty final_response → no footer appended.
- Unknown field names in config → silently ignored.

Tests: 25-case unit suite (tests/gateway/test_runtime_footer.py) plus E2E
harness covering streaming vs non-streaming branches, per-platform override,
and the exact argument contract gateway/run.py uses.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 06:50:04 -07:00