Commit graph

1108 commits

Author SHA1 Message Date
kshitijk4poor
100e7be20e fix(security): deny root-level credential stores in media delivery
The media-delivery denylist in gateway/platforms/base.py enumerated only
.env/auth.json/credentials/config.yaml under HERMES_HOME, so other
credential stores that live at the root fell through and could be
auto-attached to chat replies. The reported case: the Google Workspace
skill's google_token.json refreshes every turn, bumping its mtime to
'now', which kept passing the strict-mode recency window and re-sent the
OAuth token on every reply.

Extend the explicit per-file denylist to mirror the canonical credential
set already enforced by the read/write guards in agent/file_safety.py:
google_token.json, google_oauth_pending.json, auth/google_oauth.json,
.anthropic_oauth.json, webhook_subscriptions.json, cache/bws_cache.json,
auth.lock, and the pairing/ token directory.

Targeted per-file additions (not a blanket ~/.hermes deny, which was
declined in #32090/#34425 because it would block skills/, logs/, and
ad-hoc agent-written deliverables). mcp-tokens/ (#37222) and
state.db/kanban.db (#41071) are left to their sibling targeted PRs.

Reported-by: xxxigm (#50912)
2026-06-23 02:56:48 +05:30
teknium
e9cd8c5bf3 fix(delivery): drop env-var knob, flag all chunking adapters
Follow-up to ScotterMonk's cron-truncation fix:

- Remove HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var. Behavioral config
  belongs in config.yaml, not a new HERMES_* env var (.env is secrets
  only). The actual bug is fixed entirely by the adapter-aware skip; the
  configurable cap was unneeded scope. MAX_PLATFORM_OUTPUT is a constant
  again, collapsing the max_output=0 disable branch and the
  audit-vs-truncation threshold divergence.
- Flag the remaining verified-chunking adapters (slack, matrix, feishu,
  mattermost, teams, whatsapp, whatsapp_cloud, weixin, bluebubbles,
  yuanbao) with splits_long_messages=True so the fix covers the whole
  bug class, not just Discord/Telegram. Each verified to chunk in its
  own send() via truncate_message().
- SMS deliberately left False: it chunks for normal replies but a
  multi-segment cron blast is cost-bearing; the 4000-cap + file save is
  the safer default there.
- Update tests: drop the two env-override tests, add a test asserting a
  save failure during truncation (non-chunking) propagates.
2026-06-22 05:41:22 -07:00
ScotterMonk
86e4521cb1 fix(delivery): make cron output truncation configurable + adapter-aware
Gateway-level truncation (MAX_PLATFORM_OUTPUT=4000) was pre-empting
adapter-side message splitting. Discord and Telegram both chunk long
content natively in their send() via truncate_message(), but the
delivery router truncated to 3800 chars + footer before the adapter
ever saw the full payload — so long cron output was cut short instead
of being delivered as multiple messages (issue #50126).

Changes:
- HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var makes the cap configurable
  (default 4000, backward compatible). Set to 0 to disable truncation.
- TRUNCATED_VISIBLE (3800) removed — visible portion now derived
  dynamically from max_output minus the actual footer length.
- New BasePlatformAdapter.splits_long_messages capability flag (default
  False). Adapters that chunk in send() set True; delivery skips
  truncation for them but still saves full output to disk as audit.
- Flagged Discord and Telegram (both verified to chunk in send()).

Fixes #50126
2026-06-22 05:41:22 -07:00
kshitij
1f28b1a9b9
fix(gateway): redact credentials from approval prompts before sending to clients (#48456) (#50767)
Tirith redacts its own findings, but the approval-request callbacks built the
operator prompt from the RAW command string, so a credential-shaped value
Tirith flagged was sent verbatim to clients, undoing the redaction one layer up.

Two egress transports carried the leak; both are fixed via a shared
module-level seam _redact_approval_command() (redact_sensitive_text force=True):
  1. chat platforms — _approval_notify_sync (gateway/run.py): redact before
     both the button path (send_exec_approval) and the plain-text /approve
     fallback.
  2. SSE/API stream — _approval_notify (gateway/platforms/api_server.py):
     redact event['command'] before it is enqueued to API/desktop clients.
     (whole-bug-class: sibling call path on a separate transport.)

force=True so the prompt — a hard secret-egress boundary — honors redaction
even when security.redact_secrets is off. Clean commands pass through unchanged.

Tests bind the seam (synthetic credential-format fixtures, force-when-disabled) AND assert
BOTH callbacks ASSIGN the redacted result before the send/enqueue sink, via an
AST contract that rejects a discarded-result call. All mutation-checked.
2026-06-22 11:39:45 +00:00
teknium1
4314d451ca fix(gateway): accept any inbound file type across all messaging platforms
Authorization to message the agent is the gate, not the file extension.
Previously the inbound-attachment allowlist (SUPPORTED_DOCUMENT_TYPES) was
opt-OUT on Discord (allow_any_attachment defaulted false) and had no bypass
at all on Telegram/Slack — so an .html (or any non-allowlisted type) was
dropped or hard-rejected before the agent saw it.

Now every authorized upload is cached and surfaced to the agent regardless
of type:
- base.cache_media_bytes(): unknown types cache as octet-stream (or the
  caller-supplied MIME) instead of returning None — fixes the chokepoint
  that Teams/Telegram-media route through.
- discord/telegram/slack adapters: removed the allowlist reject/skip; any
  non-media attachment is typed DOCUMENT and cached. Known types keep their
  precise MIME.
- Text inlining now gates on a shared _TEXT_INJECT_EXTENSIONS set (text +
  code + config + markup) instead of a blind UTF-8 decode, so binary formats
  (PDF/zip/docx) with ASCII headers are never inlined.
- gateway/run.py emits the path-pointing context note for every DOCUMENT,
  including non text/application MIME types.
- discord.allow_any_attachment is now a documented no-op kept for config
  back-compat.

Validation: 357 gateway tests pass; E2E confirms .html/.bin/custom types
cache, known types stay precise, PDFs are not inlined.
2026-06-21 22:43:45 -07:00
teknium1
7726ce3040 fix(security): close hermes-0day MCP-persistence attack surface
Remove the dashboard --insecure auth-bypass, add an MCP persistence guard +
IOC blocklist, and raise the API-server key entropy floor.

Driven by the June 2026 hermes-0day campaign (r/hermesagent, live 854.media
instance): scanners find exposed Hermes dashboards/API servers, drive the
root agent to plant a 'command: bash' MCP entry that appends an attacker SSH
key to authorized_keys, which cron + startup then re-execute every tick.

- dashboard: --insecure no longer disables the auth gate. should_require_auth
  returns True for every non-loopback bind; a public bind ALWAYS requires an
  auth provider (bundled password provider or OAuth). --insecure kept as a
  warned no-op for backward compat. Fail-closed error now points at the
  password provider, not at --insecure.
- mcp_security: validate_mcp_server_entry now also rejects shell payloads that
  write to OS persistence surfaces (authorized_keys/.ssh/pam.d/sudoers/cron/
  rc files) and hard-rejects a hermes-0day IOC blocklist (attacker SSH key +
  source IPs) anywhere in command/args/env. Runs at save AND spawn time.
- api_server: raise network-bind API_SERVER_KEY entropy floor 8->16 chars;
  warn when a network-accessible API server runs an unsandboxed local backend.
2026-06-21 19:05:27 -07:00
Teknium
c0409a87ff
feat(gateway): typed send-error classification (SendResult.error_kind) (#50342)
Add a platform-neutral send-failure vocabulary so consumers can branch on a
typed category instead of substring-matching the raw provider message.

- base.py: SEND_ERROR_KINDS + classify_send_error() (too_long / bad_format /
  forbidden / not_found / rate_limited / transient / unknown), and an optional
  SendResult.error_kind field (defaults None — fully backward compatible).
- telegram.py: populate error_kind on send() failures; message_too_long keeps
  its existing error token plus error_kind='too_long'.

Purely additive: no behavioral change to the existing degrade-and-deliver
paths (MarkdownV2->plain-text fallback, overflow split, retry classification
all untouched). 22 new tests + 210 adapter regression tests green.
2026-06-21 12:34:22 -07:00
Teknium
7a131f7f40
fix(api-server): stop silently promising async delivery on stateless HTTP path (#50319)
* fix(api-server): stop silently promising async delivery on stateless HTTP path

terminal(notify_on_complete=True / watch_patterns) and delegate_task(background=True)
silently no-op'd on the API server / WebUI path (#10760): the watcher / detached
child registered, but every API-server route (OpenAI-spec /v1/chat/completions
and /v1/responses, plus the proprietary /v1/runs SSE stream) tears down its
channel when the turn ends, and APIServerAdapter.send() is a no-op stub. A
completion that fires after the response closed had nowhere to go — from the
agent side, indistinguishable from a hang.

There is no spec-compliant surface to wake the agent later on a stateless HTTP
client, so make the no-op honest instead of silent:

- Add a per-adapter capability flag supports_async_delivery (default True;
  APIServerAdapter = False), propagated into a HERMES_SESSION_ASYNC_DELIVERY
  contextvar via async_delivery_supported(). Toggle on the adapter, not a
  hardcoded platform string — a future stateless adapter is correct-by-default.
- terminal: when delivery is unsupported, skip watcher registration, force
  notify_on_complete off, and return a notify_unsupported note telling the
  agent to process(action='poll').
- delegate_task: when delivery is unsupported, fall back to SYNCHRONOUS
  execution (work runs and returns in the same response) with a note, instead
  of handing out a handle that never resolves.

CLI (in-process completion_queue) and the real gateway platforms are unchanged.

Fixes #10760

* refactor(api-server): route session binding through a single no-delivery chokepoint

Add APIServerAdapter._bind_api_server_session() and route both agent-entry
paths (_run_agent for /v1/chat/completions + /v1/responses, and the /v1/runs
_run_sync path) through it. The helper hardwires platform="api_server" and
async_delivery=False with no async_delivery parameter to pass, so a future
route added to the API server physically cannot reintroduce the silent
no-op (#10760) by forgetting to mark the channel as non-delivering.

The binding stays request-scoped (cleared per turn), so a session resumed
later on a delivering interface (CLI / gateway platform) re-binds fresh and
is NOT blocked — the no-delivery decision tracks the interface handling the
current turn, never the session.
2026-06-21 12:15:14 -07:00
sgaofen
93ea9b04af fix(gateway): cap inbound media download size to prevent memory exhaustion
Inbound image/audio/video payloads were buffered fully into process memory
before being written to the cache, with no size limit. A large upload
(Discord Nitro allows 500 MB) or a remote media URL in an inbound message
pointing at a huge file could spike RAM and OOM-kill the gateway.

Enforce a configurable cap in the shared cache helpers (gateway/platforms/
base.py) so the protection holds across every platform adapter, not one:

- cache_image/audio/video_from_bytes reject oversized payloads before writing
  (video was the gap in the original report — now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized
  Content-Length header and re-checking the running total per chunk so an
  absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized
  attachment is rejected before any bytes are pulled into memory.

Configurable via gateway.max_inbound_media_bytes in config.yaml (default
128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml.

Salvaged and extended from @sgaofen's PR #13341 (the original report and the
shared-helper approach). Reapplied onto current main (Discord adapter has
since moved to plugins/platforms/discord/), the configurable knob moved from
an env var to config.yaml, and the video cache helper added.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-21 11:56:46 -07:00
Teknium
e499d69e3e
feat(api-server): configurable concurrent-run cap to prevent DoS (#50007)
The OpenAI-compatible API server only enforced a hardcoded cap of 10
concurrent runs on /v1/runs, leaving /v1/chat/completions and
/v1/responses unbounded — a request flood could exhaust CPU, memory,
and upstream LLM quota (#7483).

- Add gateway.api_server.max_concurrent_runs (config.yaml, default 10,
  0 disables). No env var.
- Shared concurrency gate across all three agent-serving endpoints,
  counting both the chat/responses in-flight counter and the /v1/runs
  stream set. Returns OpenAI-style 429 + Retry-After when at the cap.
- Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute.

Closes #7483.
2026-06-21 07:26:03 -07:00
kshitijk4poor
b577f25100 refactor(gateway): dedupe drain-timeout resolution + share active_agents parse
Follow-up cleanups on top of the busy/idle readout (PR #50103):

- web_server.py /api/status reused the single drain-timeout resolver
  hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT
  env -> agent.restart_drain_timeout config -> default) instead of inlining a
  third hand-rolled copy of that precedence chain. Also fixes a subtle
  divergence: the inline copy used os.environ.get() so a set-but-empty env var
  was treated as a value rather than falling through to config; the shared
  resolver .strip()s and falls through correctly.
- Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces
  (/api/status and /health/detailed) through it, so the exposed active_agents
  field is consistently clamped non-negative. Previously /api/status clamped
  while /health/detailed exposed the raw file value, diverging on a corrupt
  count.
- Added TestParseActiveAgents covering the shared coercion contract.
2026-06-21 17:22:52 +05:30
Ben
0ee75469d7 feat(dashboard): surface gateway busy/drainable on /api/status
Give an external consumer (NAS) a trustworthy, always-reachable busy/idle
readout it can poll before a disruptive lifecycle action (restart,
migrate, stop, auto-update). The dashboard /api/status is the only HTTP
surface guaranteed up on a hosted agent regardless of which gateway
platforms are enabled, and it already reads gateway_state.json.

Add to /api/status (additive, non-breaking):
  - active_agents       — in-flight gateway-turn count (now refreshed
                          per-turn by the companion gateway-side commit)
  - gateway_busy        — running AND active_agents > 0
  - gateway_drainable   — running and live (a valid begin-drain target)
  - restart_drain_timeout — resolved seconds, so the consumer can size its
                          poll deadline without out-of-band knowledge
                          (env HERMES_RESTART_DRAIN_TIMEOUT → config
                          agent.restart_drain_timeout → default)

The busy/drainable contract is defined once in gateway.status
(derive_gateway_busy / derive_gateway_drainable) and consumed by both
/api/status and /health/detailed so the two surfaces can never disagree.
Liveness keys off gateway_running (a live PID/health probe), NEVER
gateway_updated_at — a healthy idle gateway never advances that timestamp.
All derived fields degrade to safe falsy values when the gateway is down
or the status file is absent/corrupt (never a spurious "busy" that would
wedge the consumer). active_sessions (the 5-min DB recency heuristic the
SPA reads) is left exactly as-is — new signal, new fields.

Tests (behaviour contracts, not snapshots): the pure derivation contract
across every running/state/count/liveness combination; /api/status
integration for busy, idle-drainable, draining, down, stale-busy-file,
corrupt-count, and timeout surfacing; and /health/detailed parity.
2026-06-21 17:22:52 +05:30
Zheng Tao
491579fa05 fix(whatsapp): resolve bridge dir with HERMES_HOME mirror in Docker
In Docker the install tree (/opt/hermes) is read-only, so npm install for
the WhatsApp bridge fails with EACCES. Add resolve_whatsapp_bridge_dir() in
whatsapp_common.py: when the install dir is read-only, mirror the bridge
source into a writable HERMES_HOME location and use that. Both the
adapter and the 'hermes whatsapp' CLI resolve through the shared helper so
the install and runtime paths agree.

Fixes #49561
2026-06-20 17:05:27 -07:00
Teknium
5600105478 refactor(gateway): migrate slack/dingtalk/whatsapp/matrix/feishu/telegram/wecom/email/sms adapters to bundled plugins
Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging
adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite,
wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained
bundled plugins under plugins/platforms/<x>/, discovered via the platform
registry. Strips the per-platform core touchpoints from gateway/run.py,
gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and
tools/send_message_tool.py.

Carries forward the migration fixes (explicit enabled:false honored,
get_connected_platforms forces discovery, plugin is_connected via
gateway.get_env_value, logs --component gateway matches plugins.platforms.*,
matrix hidden on Windows).

Additionally ports config keys main added since the PR base: the matrix
plugin's _apply_yaml_config now also covers allowed_users,
ignore_user_patterns, process_notices, and session_scope (the inline
gateway/config.py matrix block gained these in the 1340 commits the PR sat
open; they would otherwise have been silently dropped on deletion).
2026-06-20 10:26:45 -07:00
kshitijk4poor
26d9a3c710 fix(signal): FIFO-evict the quote-detection timestamp cache
`_sent_message_timestamps` (the reply-to-own-message quote cache) used a
`set` evicted with `set.pop()`, which removes an ARBITRARY element — so once
more than the cap (500) outbound timestamps are tracked, a still-recent
timestamp could be dropped while older ones survive, missing a genuine
reply-to-own-message. Convert it to an OrderedDict with FIFO (oldest-first)
eviction, mirroring the recently-hardened echo ring (#31250). This closes the
same bug class on the sibling cache.

Adds a regression test asserting oldest-first eviction + MRU promotion.
2026-06-20 21:00:46 +05:30
w31rdm4ch1nZ
332f88f6a6 fix(signal): harden recently-sent echo ring with LRU + TTL 2026-06-20 20:50:52 +05:30
kshitijk4poor
32a97a20af fix(signal): strip self-mention in all groups, not just require_mention
Review follow-up on the salvaged self-mention strip (#31217): the original
only stripped the bot's rendered @<number>/@<uuid> self-mention inside the
`require_mention=true` branch, so groups with require_mention=false still
leaked it into the agent text. Hoist the strip to run for every group message
(fixing the whole bug class), and collapse the doubled space a mid-sentence
removal leaves while preserving intentional newlines.
2026-06-20 16:27:28 +05:30
Kailigithub
40b6ac9ac7 fix(signal): send explicit stop-typing RPC when cancelling indicator 2026-06-20 16:23:41 +05:30
Rick Ratmansky
96b10327b6 fix(signal): strip bot self-mention from group messages before agent dispatch 2026-06-20 16:23:41 +05:30
lkz-de
96db7c6883 fix(signal): preserve quoted reply context
Carry Signal quote metadata through gateway events so replies to assistant messages include the quoted context without personalizing comments.
2026-06-20 15:16:53 +05:30
kshitij
ff50a88617
Merge pull request #49558 from NousResearch/salvage/env-var-guards-48735 2026-06-20 15:11:54 +05:30
kshitijk4poor
a7dd98c860 fix(env): guard remaining malformed int/float env var casts with utils helpers
Widen the env_float() guard from #48735 across the whole bug class: a
non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd
port) raised an unhandled ValueError and crashed adapter/agent init.

Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to
the canonical utils.env_int / utils.env_float helpers (the established house
pattern), instead of duplicating per-module helpers or inline try/except:

- gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT
- gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL
- gateway/platforms/feishu.py: dedup cache + text/media batch settings
- gateway/platforms/wecom.py, discord/adapter.py: text batch delays
- gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT
- gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT
- hermes_cli/auth.py: CODEX/XAI refresh timeouts
- agent/chat_completion_helpers.py: API/stream read/stale timeouts
- run_agent.py, agent/auxiliary_client.py: API + nous timeouts

Sites already guarded by try/except or local helpers are left untouched.
The HERMES_MAX_ITERATIONS sites are already guarded on main via
_current_max_iterations(), so they are not included.
2026-06-20 14:54:36 +05:30
kshitijk4poor
abafba0762 refactor(signal): correct STT-fallback comment, type the markdown wrapper, make AAC test portable
Review follow-up on the salvaged AAC + markdown changes:
- Fix an inaccurate comment claiming the STT layer has a sniff-and-remux
  fallback (verified: no such fallback exists; the ffmpeg-absent path caches
  raw ADTS and STT may reject it).
- Type the _markdown_to_signal wrapper as tuple[str, list[str]] to match the
  shared helper instead of a bare tuple.
- Replace the hardcoded /home/pi/... test fixture with a runtime-generated
  ADTS AAC sample so the remux round-trip actually runs in CI (skips only
  when ffmpeg is absent) instead of always-skipping.
2026-06-20 14:24:29 +05:30
jasnoorgill
da34fca2bb fix(signal): detect ADTS AAC voice notes and remux to MP4
Android Signal delivers voice notes as raw ADTS AAC frames, which
share the `0xFF 0xFx` sync word with MPEG-1/2 Layer 3 (MP3). The
`_guess_extension` byte-signature test in gateway/platforms/signal.py
was matching both, so ADTS AAC was being misclassified as MP3 — saved
to disk with the wrong extension and rejected by every major STT API
(Groq, OpenAI) because their server-side format sniffers inspect the
actual codec, not the file extension.

Two changes:

1. Tighten the MP3 vs ADTS disambiguator. ADTS packs `ID`,
   `layer`, and `protection_absent` into bits 3-0 of byte 1, where
   `ID=0` and `layer=00` for AAC. Real MP3 has `ID=1` and
   `layer` in {01, 10, 11}. The mask `0xF6` against target `0xF0`
   cleanly separates them.

2. Remux raw ADTS AAC to MP4 container at the cache step via
   `ffmpeg -c:a copy`. Single demux/remux, no re-encode, no quality
   loss, sub-100ms on a Pi 5. The cached file is a normal `.m4a`
   that all major STT providers accept. ffmpeg is a transitive
   dependency of many other Hermes features (TTS, video skills) so
   this isn't a new install requirement; the remux degrades
   gracefully to a no-op if ffmpeg is missing.

The new helper `_remux_aac_to_m4a` is unit-tested with a real
Android voice note from the audio cache that originally triggered
the bug, plus synthetic ADTS frames for the byte-level
disambiguator and garbage-input graceful failure.

Closes the gap that broke transcription for any Android Signal user
sending voice messages to Hermes.
2026-06-20 13:48:05 +05:30
lkz-de
905820b59f fix(signal): share markdown formatting across send paths
Route Signal send paths through shared markdown formatting helpers and render markdown bullets consistently as Unicode bullets. Add coverage for Signal formatting and send_message integration.
2026-06-20 13:47:14 +05:30
joaomarcos
3a6c171e9e fix(gateway): log signal transport response and bubble cron live adapter errors 2026-06-19 16:59:38 -07:00
joaomarcos
5649b8649a Fix silent delivery failures in Signal live adapter (#49260) 2026-06-19 16:59:38 -07:00
kshitijk4poor
d4e7dd609d refactor(windows): tidy managed-node resolver helpers
Behavior-preserving cleanups on the managed-node resolver:
- Hoist _candidate_node_command_names() out of the inner dir loop in
  find_hermes_node_executable (computed once, not per directory).
- Drop redundant os.environ.copy() at the two with_hermes_node_path(
  os.environ.copy()) sites \u2014 the helper already copies os.environ when
  called with no argument (verified env-equivalent).
- Add reciprocal keep-in-sync comments between iter_hermes_node_dirs()
  (hermes_constants.py) and hermesManagedNodePathEntries() (electron
  main.cjs), which mirror the same platform-ordering rule across the
  Python/Node boundary.
2026-06-20 02:12:16 +05:30
helix4u
7a7b56d498 fix(windows): prefer managed node for whatsapp and desktop 2026-06-20 02:00:37 +05:30
Teknium
26e76a75e5
feat(telegram): opt-in Online/Offline bot status indicator (#49134)
Sets the Telegram bot's short description (the line under its name) to
"Online" on gateway connect and "Offline" on clean disconnect, gated
behind extra.status_indicator (off by default).

Telegram bots have no presence/online dot — that's a user-account
feature the Bot API doesn't expose for bots. The short description is
the closest available surface, so this gives users a way to tell whether
the gateway is up from the bot's profile.

- New extra.status_indicator flag (+ status_online/status_offline text
  overrides), read in __init__ via config.extra — no config-schema change.
- _set_status_indicator() helper: best-effort, swallows API errors so it
  never blocks connect/disconnect; truncates to Telegram's 120-char cap.
- Wired Online after _mark_connected(), Offline at top of disconnect()
  while the bot HTTP client is still alive.
- 9 unit tests + Telegram docs section.

Requested by @ilTrumpista, cc @Teknium.
2026-06-19 11:38:39 -07:00
teknium1
a58287afcb
Merge remote-tracking branch 'origin/main' into pr48275-rebase
# Conflicts:
#	cron/scheduler.py
2026-06-19 07:40:29 -07:00
Ben Barclay
f35abb122a feat(gateway): multiplex phase 1 — HTTP-inbound /p/<profile>/ routing (webhook)
Serve webhook inbound for multiple profiles off the one shared listener via a
URL prefix, with no second port bound.

- SessionSource gains a 'profile' field (round-trips through to_dict/from_dict;
  omitted when unset so existing serialization is unchanged). It carries which
  profile an inbound message was routed to.
- WebhookAdapter registers /p/{profile}/webhooks/{route_name} alongside the
  existing /webhooks/{route_name}. _resolve_request_profile validates the
  prefix against profiles_to_serve(): None when absent or multiplexing is off
  (ignored, handled as default — no spurious 404), the profile name when valid,
  _PROFILE_REJECTED (→ 404) when the profile isn't served. The resolved profile
  is stamped onto the SessionSource.
- session-key namespacing and the per-turn home/credential scope now prefer
  source.profile: SessionStore._resolve_profile_for_key(source),
  _session_key_for_source fallback, and _resolve_profile_home_for_source all
  honor it (→ the agent turn resolves that profile's config/skills/credentials
  via the Phase 2 _profile_runtime_scope).

Constraint: routing inbound needs no per-profile platform credential, but the
agent still needs the routed profile's provider key — delivered by Phase 2's
secret scope. api_server (OpenAI-compatible surface) profile routing is a
focused follow-on; its source-construction path differs from webhook's.

Tests: SessionSource.profile round-trip + namespace drive; _resolve_request_
profile accept/reject/ignore matrix.
2026-06-19 07:34:15 -07:00
infinitycrew39
460b1e50e5 fix(gateway): refresh max_turns before resolving runtime budget 2026-06-19 06:31:13 -07:00
Ben
b75757d4aa feat(cron): wire on_jobs_changed, cron.chronos config, docs + agent↔NAS contract
Phase 4F (F.1 + F.2 + F.3, agent side). F.4 is the operator-run live smoke
(needs a NAS deployment); recorded in the PR, not code.

F.1 — on_jobs_changed wiring:
- cron/scheduler.py: _notify_provider_jobs_changed() — resolve the active
  provider, call on_jobs_changed(), swallow errors. Lives in scheduler.py (not
  jobs.py) so the store stays free of provider imports (no import cycle).
- Wired at the consumer surfaces AFTER a successful mutation: the cronjob model
  tool (tools/cronjob_tools.py, create/update/remove/pause/resume) — which the
  `hermes cron` CLI also routes through — and the REST handlers
  (gateway/platforms/api_server.py, same five). Built-in's no-op default = zero
  behavior change on the default path. Sleeping-agent direct jobs.json writes
  (no tool/CLI/REST) are covered by reconcile-on-wake in start().

F.2 — config: cron.chronos.{portal_url,callback_url,expected_audience,
nas_jwks_url}. All non-secret; the agent holds no scheduler creds and the
outbound provision call reuses the existing Nous token (no token key). Additive
deep-merge key, no version literal.

F.3 — docs:
- docs/chronos-managed-cron-contract.md: authoritative agent↔NAS wire contract
  (the three agent-cron endpoints + inbound /api/cron/fire + the 3-hop trust
  model + at-most-once/re-arm semantics). This is what the NAS-side agent builds
  against.
- cron-internals.md: "Managed cron (Chronos) for scale-to-zero" section.
- cli-commands.md: cron.provider accepts chronos + the cron.chronos.* keys.
- User docs name no scheduler vendor (QStash is a NAS-internal detail).

INVARIANT re-verified: zero qstash/upstash hits across plugins/cron, gateway,
hermes_cli, tools, website/docs (the one remaining repo hit is an unrelated
Context7 MCP comment in tools/mcp_tool.py).

Tests: test_jobs_changed_notify (5) — notify calls provider hook, swallows
errors, built-in harmless, tool create/remove notify. Full cron + chronos +
webhook + config + api_server_jobs suites green (504 in the cron+chronos+webhook
run).
2026-06-18 15:11:32 +10:00
Ben
3fc7b624d8 feat(cron,gateway): NAS-JWT fire verifier + /api/cron/fire webhook (Chronos)
Phase 4E (E.1 + E.2). The inbound side of Chronos: NAS POSTs the agent when a
one-shot fires; the agent verifies a NAS-minted JWT and runs the job.

E.1 — plugins/cron/chronos/verify.py:
- verify_nas_fire_token(token, expected_audience, jwks_or_key, issuer): verifies
  signature against the NAS JWKS (RS/ES family; symmetric rejected), aud == this
  agent, exp/nbf, iss, and purpose == "cron_fire" (so a general agent JWT can't
  be replayed against the fire endpoint). Returns claims or None; never raises.
  Crypto delegated to PyJWT[crypto] (already a declared dep) — no hand-rolled
  JWT, no new dependency. No key configured → refuse (never unsigned-decode a
  security boundary).
- get_fire_verifier(): pluggable indirection so the DQ-4 escape hatch
  (direct per-job cron-key) can swap in with no handler change.

E.2 — gateway/platforms/api_server.py:
- POST /api/cron/fire (registered only when _CRON_AVAILABLE). Authenticated by
  the NAS-JWT via get_fire_verifier() — NOT API_SERVER_KEY (NAS holds no API
  key; this is the only inbound that triggers remote job execution, so it gets
  its own purpose-scoped check). Verifier args come from cron.chronos.* config.
  401 on bad/missing/forged token. 400 on missing job_id. On success: 202 +
  fire_due runs in the background (so a long agent turn never trips NAS's HTTP
  timeout); the store CAS claim inside fire_due de-dupes a scheduler retry.

Tests:
- test_chronos_verify (11): REAL RS256 signing — valid→claims, wrong-aud,
  missing/wrong purpose, expired, wrong-iss, tampered-signature (attacker key),
  no-key-refuse, empty-token, JWKS-URL key resolution, get_fire_verifier.
- test_cron_fire_webhook (5): valid→202+fire, invalid→401+no-fire, missing
  token→401, missing job_id→400, and fire path does NOT require API_SERVER_KEY.
api_server regression suites (214) green.

E.3 (NAS endpoints) is a separate cross-repo PR; the wire contract lands next
(docs/chronos-managed-cron-contract.md).
2026-06-18 14:46:33 +10:00
Sierra (Hermes Agent)
01ae9b853e fix(telegram): resolve replies to rich (sendRichMessage) messages
Telegram does not echo a sendRichMessage's content back in
reply_to_message (.text/.caption empty, .api_kwargs None), so replies
to rich sends (briefings, the gateway's own rich finals) arrived with
no quotable text and the [Replying to: ...] injection was skipped.

Remember message_id -> text at send time in a best-effort JSON index
(gateway/rich_sent_store.py), and recover it on inbound when text and
caption are both empty. Best-effort and no-throw throughout: any
failure degrades to prior behavior and never breaks a send or message.

Salvaged from #47375 by @x1erra. Dropped the cross-platform run.py
reply-prefix rewrite (out of scope; bloated every reply on every
platform) and scrubbed a docstring reference to an out-of-repo script.
Kept the inbound reply_to logging enrichment used to verify the fix.
2026-06-16 13:04:20 -07:00
Wolfram Ravenwolf
16fc717091 fix(mattermost): harden delivery hygiene
PROBLEM: Mattermost threads can become invalid or enormous, exposing two failure modes: internal scratch/reasoning/commentary displays could leak into persistent Mattermost threads via global display toggles, while rejected threaded user-visible replies could disappear unless every failed send fell back flat. A broad flat fallback would pollute channels with tool/status/progress noise.

SOLUTION: Require explicit Mattermost platform opt-in for scratch displays, keep using the existing notify=True metadata marker for user-visible final text/media/file replies, and allow the Mattermost plugin adapter to flat-fallback only notify-worthy sends whose threaded POST failure looks like a broken root/thread. Keep tool/status/progress and other non-notify sends thread-strict. Add regression tests for display opt-in, notify-only broken-thread fallback, generic API failure suppression, and stream notify metadata.

Verification: tests/gateway/test_mattermost.py tests/gateway/test_stream_consumer.py tests/gateway/test_stream_consumer_thread_routing.py tests/gateway/test_stream_consumer_fresh_final.py tests/gateway/test_stream_consumer_draft.py; tests/gateway/test_session_api.py tests/gateway/test_status_command.py tests/gateway/test_resume_command.py tests/hermes_cli/test_commands.py; py_compile touched gateway files; git diff --check.

Session: Mattermost thread 6qg8e9dd1pd9pkhi74xyaa1mry, 2026-06-01.
2026-06-16 06:34:54 -07:00
Rory Evans
e65d74bc6f fix(gateway): accept metadata kwarg in WhatsApp/email send_image
`BasePlatformAdapter.send_multiple_images` passes `metadata=metadata` to
`send_image` / `send_image_file` / `send_animation` on every send. The
WhatsApp and email `send_image` overrides stopped their signature at
`reply_to`, so any image delivered as a URL (the common case — image-gen
backends return URLs) raised:

    TypeError: send_image() got an unexpected keyword argument "metadata"

and the image silently failed to send. Their sibling overrides
(`send_image_file` / `send_video` / `send_voice` / `send_document`)
already absorb it via **kwargs, which is why only plain image-URL sends
broke.

- whatsapp/email `send_image`: accept `metadata` (matches the base
  signature); WhatsApp forwards it to the super() text fallback.
- Add `tests/gateway/test_media_metadata_contract.py`: asserts WhatsApp +
  email accept it, plus a best-effort sweep over every adapter so the next
  slip fails at test time instead of in production.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 06:23:53 -07:00
Teknium
a6364bfa08
fix(telegram): edit streamed previews in place as rich (Bot API 10.1) (#46890)
Streamed Telegram replies that finalize through editMessageText were
converted to MarkdownV2, which has no table syntax and rewrites pipe
tables into bullet lists — users saw a table while streaming that
collapsed to a list at the last moment.

Finalize now edits the existing preview IN PLACE via Bot API 10.1's
editMessageText rich_message parameter when the content has constructs
the legacy path degrades (tables, task lists, <details>, block math).
No fresh send + delete, so no duplicate-preview flicker — the reason
#46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming
stays False; the in-place edit replaces it.

- _needs_rich_rendering(): rich reserved for table/task-list/details/math
  (adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2.
- _try_edit_rich(): editMessageText + rich_message via do_api_request,
  mirroring _try_send_rich's fallback/latch/transient contract.
- edit_message finalize tries rich in place before the 4,096 overflow
  pre-flight (rich cap is 32,768), falling back to legacy on rejection.
- rich_messages default flipped back to True (DEFAULT_CONFIG + adapter).
- docs (en + zh-Hans) + cli-config example updated to default-on.

Closes the root cause behind #45911 / #46009.
2026-06-16 05:26:04 -07:00
Teknium
a1f51feb72
fix(telegram): avoid rich final duplicate previews (#46206) 2026-06-14 11:13:38 -07:00
Teknium
efbe1635dd
fix(gateway): include replied-to media attachments (#46107) 2026-06-14 04:51:50 -07:00
Teknium
9459057d7f
fix(telegram): guard rich details math crash (#46102) 2026-06-14 04:22:22 -07:00
Teknium
cf7d5932f8 fix(email): make IPv4 SMTP fallback use supported sockets 2026-06-14 04:16:26 -07:00
liuhao1024
04d4471d79 fix(email): use SMTP_SSL for port 465 and fall back to IPv4 on timeout
Port 465 expects implicit TLS (SMTP_SSL) from the first byte. The email
adapter always used SMTP() + starttls(), which is correct for port 587
but hangs/fails on port 465 providers (e.g., Swiss ISPs).

Additionally, when the SMTP host has AAAA DNS records but IPv6 is
unreachable, socket.create_connection() tries IPv6 first and hangs
until timeout. Add an IPv4 fallback via AF_INET socket.

Extract _connect_smtp() helper to consolidate the 4 duplicate SMTP
connection sites into a single method with correct protocol selection
and IPv6 fallback logic.
2026-06-14 04:16:26 -07:00
Teknium
5105c3651a
perf(api-server): normalize chat content linearly (#46079) 2026-06-14 03:25:49 -07:00
Teknium
afc8615509
perf(webhook): prune request caches incrementally (#46065) 2026-06-14 02:40:54 -07:00
Justin Sunseri
12682d96b9 feat(telegram): restore rich messages opt-out
Salvages PR #45840's client-compatibility opt-out while keeping rich messages enabled by default via telegram.extra.rich_messages: true.
2026-06-13 21:45:49 -07:00
ITheEqualizer
57c2a55be4 fix(telegram): harden rich message fallback handling
Carry forward focused follow-ups from PR #45741: treat PTB's raw Bot API 10.1 response shapes safely, recognize real missing-endpoint errors, preserve link preview settings on rich sends, and lock the rich limit to Telegram's character-based cap.
2026-06-13 14:34:53 -07:00
ITheEqualizer
7c0605bf22 fix(telegram): preserve rich formatting on stream final 2026-06-13 13:44:45 -07:00
Que0x
fc46354580 fix(security): fail closed when an own-policy gateway adapter has no allowlist
Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default dm_policy/group_policy to "open", which forwards every sender. The gateway's adapter-trust shortcut in _is_user_authorized blanket-trusted those platforms when no env allowlist was set, so an operator who enabled one with only credentials authorized the entire external network -- the fail-open SECURITY.md section 2.6 forbids ("an allowlist is required for every enabled network-exposed adapter").

Trust the adapter only when its effective policy for the chat type is an actual "allowlist" restriction (the case #34515 was protecting). "open"/"pairing"/anything else falls through to default-deny, where {PLATFORM}_ALLOW_ALL_USERS / GATEWAY_ALLOW_ALL_USERS and the pairing flow remain the explicit opt-ins.
2026-06-13 07:18:54 -07:00