Commit graph

864 commits

Author SHA1 Message Date
Teknium
80717a157f fix(discord): route DM role-auth opt-in through config.yaml (not env var)
Per repo policy, ~/.hermes/.env is for secrets only. Guild IDs are
behavioral configuration, not secrets. Replacing the
DISCORD_DM_ROLE_AUTH_GUILD env var from the original fix with
discord.dm_role_auth_guild in config.yaml.

- New module-level _read_dm_role_auth_guild() helper reads
  hermes_cli.config.read_raw_config()['discord']['dm_role_auth_guild'].
  Fails closed on any parse error (safe default = DM role-auth off).
- DEFAULT_CONFIG['discord'] gains dm_role_auth_guild: '' with a comment
  documenting the opt-in.
- Tests patch hermes_cli.config.read_raw_config directly (via the
  _set_dm_role_auth_guild helper) instead of setenv/delenv. 12 tests
  in test_discord_roles_dm_scope pass; no env var involvement.
- Docstring + module docstring + comments updated to reference
  discord.dm_role_auth_guild.
- E2E verified with real imports across 6 scenarios: unset, int,
  string, garbage, zero, and (crucially) env-var-only-no-config all
  return None except the valid int/string cases. Env var has zero
  effect — policy compliance confirmed.
2026-05-07 05:51:56 -07:00
Teknium
5c045b8f6c fix(discord): extend role-scope fix to slash surface + fixture update
Sibling-site fix: _evaluate_slash_authorization was the fourth
_is_allowed_user caller and didn't pass guild/is_dm through, so slash
interactions would take the DM branch regardless of whether they came
from a guild channel. Now reads interaction.guild + in_dm and forwards.

Also updates test_discord_slash_auth fixture (_make_interaction) so
the SimpleNamespace guild mock has a get_member(uid)->None method —
required by the new guild-scoped fallback path in _is_allowed_user.
Tests exercising positive role paths still work via user.roles.

Three new regression tests in test_discord_roles_dm_scope:
- Slash DM + role in mutual public guild → rejected
- Slash in guild B + role only in guild A → rejected
- Slash in guild B + role in guild B → allowed (positive control)

368 Discord tests pass. test_discord_free_channel_skips_auto_thread
also fails on clean main (pre-existing, unrelated to this fix).
2026-05-07 05:51:56 -07:00
0xyg3n
ef1e565570 fix(discord): scope DISCORD_ALLOWED_ROLES to originating guild (CVSS 8.1)
The initial DISCORD_ALLOWED_ROLES implementation (#11608, merged from #9873)
scans every mutual guild when resolving a user's roles. This allows a
cross-guild DM bypass:

1. Bot is in both public server A and private server B.
2. User holds the allowed role in server A only.
3. User DMs the bot. The role check finds the role in A and authorizes the
   DM, granting access as if the user were trusted in server B.

Fix:
- DMs (no guild context) disable role-based auth by default. Opt-in via
  DISCORD_DM_ROLE_AUTH_GUILD=<guild_id> restricts role lookup to one
  explicitly-trusted guild.
- Guild messages check roles only in the originating guild
  (message.guild), never in other mutual guilds.
- Reject cached author.roles when the Member came from a different guild
  than the current message.

Backwards compatibility:
- DISCORD_ALLOWED_USERS behavior is unchanged (still works in both DMs
  and guild messages).
- Deployments that rely on roles in guild channels continue to work;
  role checks are now strictly scoped to that guild.
- Deployments that intentionally want role-based DM auth can opt into a
  single trusted guild via DISCORD_DM_ROLE_AUTH_GUILD.

Tests: 9 new regression guards in
tests/gateway/test_discord_roles_dm_scope.py covering the bypass path,
the opt-in path, cross-guild guild-message bypass, and backwards-compat
user-ID paths. 47/47 discord-auth tests pass.

Refs: #11608 (initial implementation), #7871 (feature request),
  #9873 (PR author credit @0xyg3n)
2026-05-07 05:51:56 -07:00
altmazza0-star
8308d18339 fix(gateway): preserve max turns after env reload 2026-05-07 05:49:16 -07:00
pingchesu
43a6645718 docs: clarify API server tool execution locality 2026-05-07 05:30:37 -07:00
leon7609
d34f03c32a feat(gateway): support [[as_document]] directive for skill media routing
Skills that produce large/lossless images (e.g. info-graph, where a
rendered JPG is 1-2 MB) currently lose quality in Telegram delivery
because `_IMAGE_EXTS` membership routes the file through
`send_multiple_images` → `sendMediaGroup`, which Telegram's server
re-encodes to JPEG @ 1280px max edge. The original bytes only survive
when the file goes through `send_document`, which the dispatch tables
in three places (`_process_message_background`, `_deliver_media_from_response`,
and the `send_message` tool's telegram path) only reach for files
whose extension is NOT in `_IMAGE_EXTS`.

This commit adds an `[[as_document]]` directive that mirrors the
existing `[[audio_as_voice]]` shape: a skill emits the directive once
in its response, and every image-extension MEDIA: file in that response
is delivered via `send_document` instead of `send_multiple_images` /
`sendPhoto`. The directive is detected at the dispatch sites (which see
the raw response) and the directive string is stripped from the
user-visible cleaned text in `extract_media` so it never leaks.

Granularity is intentionally all-or-nothing per response, matching
[[audio_as_voice]]'s scope. Skills that need fine control can split into
two responses.

Verified the targeted use case: info-graph emits

    信息图已生成(...)
    [[as_document]]
    MEDIA:/tmp/info-graph-x/infographic.jpg

→ Telegram receives `infographic.jpg` via sendDocument, original 1MB
JPEG bytes preserved, no recompression. Forwarding and download
filenames stay clean (`infographic.jpg`).

Tests: +3 cases in TestExtractMedia covering directive strip, isolation
from voice flag, and coexistence with [[audio_as_voice]]. All
113 pre-existing media/extract/send tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 05:20:10 -07:00
teknium1
333598cb0e fix(gateway): cap cached session sources with LRU eviction
Follow-up on top of Zyproth's session-source cache: swap the unbounded
dict for an OrderedDict with a 512-entry LRU cap so long-running
gateways can't accumulate stale entries for dead sessions forever.

- self._session_sources is now an OrderedDict
- _cache_session_source() move_to_end + popitem(last=False) above cap
- _get_cached_session_source() move_to_end on hit (LRU read bump)
- restart_test_helpers.py wires OrderedDict + _session_sources_max
2026-05-07 05:16:38 -07:00
Zyproth
176b93575a fix(gateway): preserve thread routing from cached live session sources 2026-05-07 05:16:38 -07:00
Teknium
ecaafe5f22 test(weixin): update timeout assertion for asyncio.wait_for migration 2026-05-07 05:10:04 -07:00
Zyproth
6e8f1e09a9 fix(gateway): use monotonic deadlines in QR onboarding flows 2026-05-07 05:09:39 -07:00
thelumiereguy
8a96fa48c1 fix(gateway): avoid duplicated responses history 2026-05-07 05:07:59 -07:00
Teknium
38b1c7dce5 refactor(gateway): simplify auto-resume + extend to crash recovery
Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape,
wider coverage.

Changes:
- Drop the synthetic '[System note: ...]' in the internal MessageEvent.
  The existing _is_resume_pending branch in _handle_message_with_agent
  (run.py ~L13738) already injects a reason-aware recovery system note
  on the next turn.  With kyan's text in place the model saw two stacked
  system notes.  Now the event text is empty and the existing injection
  path owns the wording.
- Drop SessionStore.list_resume_pending() as a new public method.  The
  filter is 8 lines inline in _schedule_resume_pending_sessions() —
  one caller, no other pluggability need.
- Add 'restart_interrupted' to the auto-resume reason set.  That's the
  reason SessionStore.suspend_recently_active() stamps on sessions
  recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker).
  Previously those sessions had to wait for a real user message to
  auto-resume; now they continue automatically at startup like
  drain-timeout interruptions do.
- Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so
  future reasons (e.g. 'manual_resume_request') can be opted in with
  one line.

Test coverage added:
- drain-timeout + crash-recovery both scheduled
- stale entries skipped (outside freshness window)
- suspended entries skipped (suspended > resume_pending)
- originless entries skipped (no routing target)
- disallowed reasons skipped (graceful forward-compat)

E2E verified end-to-end with a real on-disk SessionStore: 2 eligible
sessions scheduled, 2 ineligible skipped, empty-text internal events
delivered to the adapter.

Co-authored-by: Kevin Yan <kevyan1998@gmail.com>
2026-05-07 05:05:34 -07:00
Kevin Yan
961a3535fa fix(gateway): preserve resume marker on interrupted restart 2026-05-07 05:05:34 -07:00
Kevin Yan
fad684b1f3 feat(gateway): auto-resume interrupted sessions after restart 2026-05-07 05:05:34 -07:00
Teknium
bf843adf05
feat(gateway): opt-in cleanup of temporary progress bubbles (#21186)
When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running ' Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Telegram); silently no-ops elsewhere. Off by default.
Failed runs skip cleanup so bubbles stay as breadcrumbs.

Minimal plumbing: base.py's existing post_delivery_callback slot now chains
new registrations onto any existing callback (with per-callback exception
isolation) rather than clobbering. Stale-generation registrations are
rejected so they can't step on a fresher run's callbacks. This lets the
cleanup callback coexist with the background-review release hook already
registered on the same slot.

Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>
2026-05-07 05:04:37 -07:00
mrcoferland
bd0c54d171 fix: route Telegram image documents through photo handling 2026-05-07 04:51:46 -07:00
Teknium
5a3cadf6eb fix(discord): narrow rate-limit catch and move sync state under gateway/
Two follow-ups on top of helix4u's slash-command sync hardening:

- Only suppress exceptions that are actually Discord 429 rate limits
  (discord.RateLimited, HTTPException with status 429, or a clearly
  rate-limit-named duck type). Arbitrary failures that happen to expose
  a retry_after attribute now re-raise to the outer handler instead of
  silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
  stops collecting ad-hoc runtime files.

Added a test verifying unrelated exceptions don't get misclassified as
rate limits.
2026-05-06 18:12:35 -07:00
helix4u
d797755a1c fix(gateway): wait for systemd restart readiness 2026-05-06 18:12:35 -07:00
Guillaume Meyer
7df6115199 feat(gateway): also gate pre-restart "Gateway restarting" notification
Extend the gateway_restart_notification flag to cover
_notify_active_sessions_of_shutdown — the message that fires just
before drain ("⚠️ Gateway restarting — Your current task will be
interrupted. Send any message after restart and I'll try to resume
where you left off.") sent to active sessions and home channels.

Same operator/end-user reasoning: on a Slack workspace shared with
end users, "Gateway restarting" reads as "the bot is broken" — the
operator should be able to suppress it consistently with the other
two lifecycle pings rather than having a partial opt-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:39:43 -07:00
Guillaume Meyer
b71f80e6ce feat(gateway): per-platform gateway_restart_notification flag
Adds an opt-out toggle on PlatformConfig that gates both restart
lifecycle pings: the "♻ Gateway restarted" message sent to the chat
that issued /restart, and the "♻️ Gateway online" home-channel
startup notification. Defaults to True so existing deployments are
unaffected.

The motivating split is operator vs. end-user surfaces: a back-channel
like Telegram should keep these pings, while a Slack workspace shared
with end users should not surface gateway lifecycle noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:39:43 -07:00
Yuqian
441ef75d15 fix(feishu): keep topic replies in threads
Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.
2026-05-06 10:52:51 -07:00
Teknium
d5357f816d refactor(telegram): make typing thread-id resolver symmetric with send
Mirror _message_thread_id_for_typing() with _message_thread_id_for_send():
both now map the General forum topic (thread id "1") to None upfront.

That removes the need for the retry-without-thread fallback in send_typing()
entirely — if _message_thread_id_for_typing() returns a non-None value, it's
a real user-created topic and falling back to the root chat is never correct.
If Telegram rejects the typing action (e.g. topic deleted mid-session), we
swallow it at debug level instead of bleeding the indicator into All Messages.

Updates the General-topic typing regression test to assert the new single-call
contract.
2026-05-05 13:28:08 -07:00
helix4u
41545f7ec5 fix(telegram): keep DM topic typing scoped 2026-05-05 13:28:08 -07:00
Es1la
a877c3f6d9 fix(feishu): tolerate malformed dedup timestamps
Salvages @Es1la's PR #13632 — a non-numeric timestamp in the persisted
feishu dedup state crashed adapter startup with ValueError/TypeError
from the unguarded float() call. Wrap the float() conversion in
try/except; skip the bad key and keep loading the rest.

The original PR also restructured existing TestDedupTTL tests to use
tempfile.TemporaryDirectory + HERMES_HOME patching — that was
test-hygiene scope creep unrelated to the bug. Kept only the
malformed-timestamp fix and added a focused regression test.
2026-05-05 10:15:09 -07:00
hharry11
247c9d468c fix(gateway): ensure deterministic thread eviction in helpers 2026-05-05 10:13:55 -07:00
Teknium
285c208cf7 fix(gateway): also tolerate malformed env vars in custom human-delay mode
Widens @Krionex's PR #16933 fix to cover the second bug class at the sibling
site. natural mode used to pass env values through int() before the PR
caught mis-typed values crashing the gateway; custom mode had the exact
same bug one branch away (HERMES_HUMAN_DELAY_MIN_MS=oops in custom mode
still crashed). Same try/except/fallback pattern, scoped to the two
int() calls that feed random.uniform().
2026-05-05 06:11:38 -07:00
Krionex
3b16c590e0 fix(gateway): ignore malformed custom delay env vars in natural mode 2026-05-05 06:11:38 -07:00
JC的AI分身
80b386a472 fix(feishu): refresh bot identity during hydration 2026-05-05 06:04:20 -07:00
Teknium
314361733f test(api_server): _run_agent result now carries session_id for #16938 2026-05-05 06:01:03 -07:00
Teknium
fe8560fc12
feat(api-server): X-Hermes-Session-Key header for long-term memory scoping (#20199)
* feat(api-server): X-Hermes-Session-Key header for long-term memory scoping

API Server integrations (Open WebUI, custom web UIs) can now pass a stable
per-channel identifier via X-Hermes-Session-Key that scopes long-term memory
(Honcho, etc.) independently of the transcript-scoped X-Hermes-Session-Id.
This matches the native gateway's session_key / session_id split: one stable
key per assistant channel, many independent transcripts that rotate on /new.

- _create_agent and _run_agent accept gateway_session_key and pass it to
  AIAgent(gateway_session_key=...), which is already honored by the Honcho
  memory provider (plugins/memory/honcho/client.py resolve_session_name).
- New shared helper _parse_session_key_header applies the same API-key
  gate, control-character sanitization, and a 256-char length cap as the
  existing session-id header.
- All three agent endpoints honor the header: /v1/chat/completions,
  /v1/responses, /v1/runs. JSON and SSE responses echo it back.
- /v1/capabilities advertises session_key_header so clients can
  feature-detect.

Closes #20060.

Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>

* chore: AUTHOR_MAP entry for manateelazycat

---------

Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>
2026-05-05 05:34:47 -07:00
Asher Morse
6b76ea4707 fix(gateway): load reply_to_mode from config.yaml for Discord and Telegram
The YAML-to-env-var bridge in load_gateway_config() mapped every Discord
and Telegram config key (require_mention, auto_thread, reactions, etc.)
except reply_to_mode. Users setting discord.reply_to_mode or
telegram.reply_to_mode in ~/.hermes/config.yaml got no effect — the
adapter only read the env var, which nothing populated from YAML.

Add the missing bridge for both platforms, following the existing pattern.
Top-level <platform>.reply_to_mode preferred, falls back to
<platform>.extra.reply_to_mode, env var never overwritten. Handles YAML
1.1 bare `off` → Python False coercion.

This is a re-submission of the work from #9837 and #13930, which both
implemented the same fix but neither landed (see co-authors below).

Co-authored-by: Matteo De Agazio <hypnosis.mda@gmail.com>
Co-authored-by: ishardo <239075732+ishardo@users.noreply.github.com>
2026-05-05 04:58:23 -07:00
0xsir0000
f6b68f0f50 fix(gateway): keep DoH-confirmed Telegram IPs that match system DNS (#14520)
discover_fallback_ips() filtered out any DoH-resolved IP that also appeared
in the system resolver's answer set, on the assumption that the system IP
was unreachable. When DoH and system DNS agreed (a common case), the
function returned the hardcoded _SEED_FALLBACK_IPS list instead — and on
networks where those seed addresses are not routable, the Telegram fallback
transport had nothing usable to retry against and polling failed.

Drop the system_ips exclusion so DoH-confirmed IPs are preserved regardless
of system DNS overlap. The TelegramFallbackTransport already tries the
primary path first via system DNS, then falls through to the IP-rewrite
path on connect failure; including the same IP in both lanes lets a
transient primary failure recover via the explicit IP route instead of
escalating to seed addresses.

Update the two tests that codified the old exclusion to reflect the new,
inclusion-by-default behaviour.

Fixes #14520
2026-05-05 04:42:59 -07:00
Teknium
2a285d5ec2
fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184)
* revert(gateway): remove stale-code self-check and auto-restart

Removes the _detect_stale_code / _trigger_stale_code_restart mechanism
introduced in #17648 and iterated in #19740. On every incoming message
the gateway compared the boot-time git HEAD SHA to the current SHA on
disk, and if they differed it would reply with

    Gateway code was updated in the background --
    restarting this gateway so your next message runs
    on the new code. Please retry in a moment.

and then kick off a graceful restart. This is unwanted behaviour:
users who run a long-lived gateway and do their own ad-hoc git
operations on the checkout end up with their chat interrupted and
the current message dropped every time HEAD moves, with no way to
opt out.

If an operator really needs the old protection against stale
sys.modules after "hermes update", the SIGKILL-survivor sweep in
hermes update (hermes_cli/main.py, also tagged #17648) already
handles the supervisor-respawn case on its own.

Removed:
  gateway/run.py:
    - _STALE_CODE_SENTINELS, _GIT_SHA_CACHE_TTL_SECS
    - _read_git_head_sha(), _compute_repo_mtime() module helpers
    - class-level _boot_wall_time / _boot_repo_mtime / _boot_git_sha /
      _stale_code_restart_triggered defaults
    - __init__ boot-snapshot block (_boot_*, _cached_current_sha*,
      _repo_root_for_staleness, _stale_code_notified)
    - _current_git_sha_cached(), _detect_stale_code(),
      _trigger_stale_code_restart() methods
    - stale-code check + user-facing restart notice at the top of
      _handle_message()
  tests/gateway/test_stale_code_self_check.py (deleted, 412 lines)

No new logic added. Zero remaining references to any removed
symbol. Gateway test suite passes the same 4589 tests it passed
before; the 3 pre-existing unrelated failures (discord free-channel,
feishu bot admission, teams typing) are unchanged by this commit.

* fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924)

Per-delta _strip_think_blocks ran at _fire_stream_delta and destroyed
downstream state. When MiniMax-M2.7 / DeepSeek / Qwen3 streamed a tag
split across deltas (delta1='<think>', delta2='Let me check'), the
regex case-2 match erased delta1 entirely, so CLI/gateway state
machines never learned a block was open and leaked delta2 as content.
Raw consumers (ACP, api_server, TTS) had no downstream defense at all.

Replace the per-delta regex with a stateful StreamingThinkScrubber
that survives delta boundaries:
  - Closed <tag>X</tag> pairs always stripped (matches _strip_think_blocks
    case 1).
  - Unterminated open at block boundary enters a block; content
    discarded until close tag arrives.  At end-of-stream, held
    content is dropped.
  - Orphan close tags stripped without boundary gating.
  - Partial tags at delta boundaries held back until resolved.
  - Block-boundary rule (start-of-stream, after \n, or
    whitespace-only since last \n) preserves prose that mentions
    tag names.

Reset at turn start alongside the existing context scrubber; flush at
turn end so a benign '<' held back at end-of-stream reaches the UI.

E2E-verified on live OpenRouter->MiniMax-m2 streams: closed pairs
strip cleanly, first word of post-block content is preserved, pure
content passes through unchanged.  Stefan's screenshot case (#17924)
— 'Let me check' getting chopped to ' me check' — no longer happens.

Final _strip_think_blocks calls on completed strings (final_response,
replay, compression) are preserved; only the streaming per-delta call
site switched to the scrubber.
2026-05-05 04:33:38 -07:00
simbam99
8ad5e98f8d fix(gateway): preserve pending update prompts across restarts 2026-05-05 03:59:39 -07:00
Aamir Jawaid
2333b7a7ec fix(tests): patch TypingActivityInput after mock on Python <3.12
The SDK requires Python >=3.12 so CI (3.11) falls to the except
ImportError branch, leaving TypingActivityInput=None. After loading
the adapter module, explicitly restore it from the mock so
test_send_typing doesn't silently no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 20:59:18 -07:00
Stephen Schoettler
f7918c9349 test(teams): mock ClientOptions in adapter tests 2026-05-04 20:50:24 -07:00
helix4u
b632290166 fix(gateway): handle planned service stops 2026-05-04 16:00:49 -07:00
briandevans
9e2628ee7c test(discord): annotate make_attachment content_type as Optional[str]
Copilot review: the helper accepted None in one test but was annotated str.
Matches actual usage where no-content-type attachments are a tested scenario.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:36:47 -07:00
Albert.Zhou
fd9c32c0f2 fix(email): drop non-allowlisted senders before dispatch to prevent mail loops
Add EMAIL_ALLOWED_USERS check in EmailAdapter._dispatch_message()
to silently discard emails from senders not in the allowlist.  This
prevents the adapter from creating thread context and dispatching a
MessageEvent for unauthorized senders, which could race with the
gateway authorization check and result in SMTP replies being sent
despite the handler returning None.

Test: tests/gateway/test_email.py::TestDispatchMessage::test_non_allowlisted_sender_dropped
Test: tests/gateway/test_email.py::TestDispatchMessage::test_allowlisted_sender_proceeds
Test: tests/gateway/test_email.py::TestDispatchMessage::test_empty_allowlist_allows_all
2026-05-04 12:35:22 -07:00
Teknium
d90f73bcec
fix(gateway): use git HEAD SHA, not file mtimes, for stale-code check (#19740)
The stale-code self-check (Issue #17648) used sentinel-file mtimes to
decide whether the gateway survived a `hermes update` with stale
`sys.modules`. That signal false-positives on any write to the
sentinel files — including agent-driven edits during Hermes-on-Hermes
dev sessions. Telling the agent to patch `run_agent.py` would flip
the check to True on the next user message and force a gateway
restart even though no update happened.

Switch the signal to `git rev-parse HEAD`. Agent file edits don't
move HEAD; `hermes update` (git pull) always does. Reading .git/HEAD
directly (no subprocess) with a 5s cache keeps the overhead negligible
on bursty chats. Non-git installs short-circuit to False — the
stale-modules class can't occur without a git-backed update path, so
there's nothing to detect.

The legacy `_compute_repo_mtime` helper is kept but unused by
detection, reserved as a fallback hook for future pip-install update
paths.

- _read_git_head_sha(): resolves HEAD across main checkout, worktree
  (follows `gitdir:` + `commondir` pointers), and packed-refs layouts.
- _current_git_sha_cached(): per-runner 5s SHA cache.
- _detect_stale_code(): boot SHA vs current SHA, returns False when
  either is unavailable.
- Tests cover all four layouts, the agent-edits-don't-trigger
  regression, and cache behavior.

Refs #17648.
2026-05-04 12:33:21 -07:00
teknium1
d35efb9898 feat(telegram): /topic off + help + auth gate + screenshot debounce
Four production-readiness additions to topic mode:

1. /topic off — clean disable path. Flips telegram_dm_topic_mode.enabled
   to 0 and clears telegram_dm_topic_bindings for this chat. Previously
   users had to edit state.db with sqlite3 to turn the feature off.
   Idempotent: calling /topic off when the chat was never enabled
   returns a friendly no-op message.

2. /topic help — inline usage printed in the DM so users don't have to
   visit docs to discover /topic off, /topic <session-id>, etc.

3. Authorization gate. /topic mutates SQLite side tables and flips the
   root DM into a lobby, so the action must be authorized. Now calls
   self._is_user_authorized(source); unauthorized DMs get a refusal
   instead of activation. Defense in depth on top of the gateway's
   existing pre-route auth.

4. BotFather screenshot debounce. A user repeatedly running /topic
   while Threads Settings is still disabled would previously re-upload
   the same screenshot every time. Now rate-limited to one send per
   5 minutes per chat. /topic off resets the counter so re-enabling
   starts fresh.

Command-def args hint updated: /topic [off|help|session-id].

Docs:
- New /topic subcommands table at the top of the multi-session section
- Disable instructions updated to recommend /topic off first, with the
  raw SQL fallback kept for bulk cleanup
- Under-the-hood list extended with the capability-hint debounce and
  the authorization gate

Tests (6 new):
- /topic help returns usage and doesn't create topic tables
- /topic off disables mode AND clears bindings
- /topic off is idempotent when never enabled
- Unauthorized users get refusal, no tables created
- Capability-hint debounce is per-chat
- /topic off resets both lobby and capability debounce counters

All 402 targeted tests pass. Full gateway sweep: 4809/4810
(pre-existing test_teams::test_send_typing unrelated).
2026-05-04 12:07:17 -07:00
teknium1
1381c89e56 fix(telegram): polish topic mode — CASCADE, General-topic handling, rename guard, debounce
Five follow-ups to topic mode based on integration audit:

1. ON DELETE CASCADE on telegram_dm_topic_bindings.session_id. Session
   pruning (manual /delete, auto-cleanup, any future prune job) would
   have thrown 'FOREIGN KEY constraint failed' for sessions bound to a
   topic. Migration bumped to v2, rebuilds the bindings table in place
   if FK lacks CASCADE. Idempotent; only runs once per DB.

2. Never auto-rename operator-declared topics. If an operator has
   extra.dm_topics configured AND a user runs /topic, messages in those
   pre-declared topics would previously trigger auto-rename and silently
   mutate operator config. _rename_telegram_topic_for_session_title now
   early-returns when _get_dm_topic_info returns a dict for this
   (chat_id, thread_id). Uses class-based lookup (not hasattr) so
   MagicMock test fixtures don't accidentally trip the guard.

3. General topic handling. Telegram's General (pinned top) topic in a
   forum-enabled private chat may send messages with message_thread_id=1
   or omit thread_id entirely depending on client. Both are now treated
   as the root lobby, not a topic lane. Prevents users from
   accidentally burning a session on the General topic.

4. Debounce the root-lobby reminder. 30-second cooldown per chat so a
   user who forgets topic mode is enabled and types ten messages in the
   root gets one reminder, not ten. Explicit command replies
   (/new-in-lobby, /topic <session-id>) still land every time.

5. Docs: added under-the-hood invariants for the above, plus a
   Downgrade section explaining that rolling back to a pre-/topic
   Hermes build leaves the DB tables orphaned but harmless — DMs just
   revert to native per-thread isolation.

Tests:
- test_operator_declared_topic_is_not_auto_renamed
- test_general_topic_is_treated_as_root_lobby
- test_lobby_reminder_is_debounced_per_chat
- test_binding_survives_session_deletion_via_cascade
- test_migration_rebuilds_v1_binding_table_with_cascade_fk

Validated: 4803/4804 tests pass (tests/gateway/ + tests/test_hermes_state.py).
Sole failure is a pre-existing test_teams::test_send_typing flake
unrelated to this PR.
2026-05-04 12:07:17 -07:00
teknium1
a7683d04a9 fix(telegram): harden DM topic binding — persist through switch_session, rebind on /new
Follow-up on @EmelyanenkoK's feat: add Telegram DM topic-mode sessions.

Three issues:

1. Split-brain session state. After get_or_create_session() returned a
   SessionEntry for a topic lane, the handler was mutating
   .session_id in place to the binding's target, but never persisting
   the switch through SessionStore. The sessions.json session_key →
   session_id map kept pointing at the lane's natural id; any reader
   that reloaded from disk saw the wrong id. Fixed by routing through
   SessionStore.switch_session(), which _save()s the mapping and ends
   the old session in SQLite like /resume does.

2. /new inside a topic was a one-message no-op. Reset created a new
   session but left the telegram_dm_topic_bindings row pointing at the
   old session_id, so the next message's binding lookup switched right
   back. Now _handle_reset_command rebinds the topic to the new
   session_id after reset.

3. is_telegram_session_linked_to_topic and
   list_unlinked_telegram_sessions_for_user both called
   apply_telegram_topic_migration() on read, contradicting the PR's
   own invariant that migration only runs on explicit /topic opt-in.
   They now tolerate missing topic tables and return empty/False.

Also: _telegram_topic_mode_enabled() now only treats True as enabled
(not any truthy return), so test fixtures with MagicMock session_db
don't accidentally flip every DM into lobby mode — this was breaking
4 pre-existing test_status_command tests.

Tests:
- New regression: /new inside a topic must update the binding row
  (test_new_inside_telegram_topic_rewrites_binding_to_new_session).
- _make_runner now stubs switch_session so existing restore tests
  still exercise the new code path.

Validated end-to-end with real SessionDB + SessionStore:
readers on fresh DB don't create topic tables; enable creates them;
binding override persists across SessionStore restart; /new rebinds
and the new id survives a restart.

Co-authored-by: EmelyanenkoK <emelyanenko.kirill@gmail.com>
2026-05-04 12:07:17 -07:00
EmelyanenkoK
25065283b3 fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
EmelyanenkoK
d6615d8ec7 feat: add Telegram DM topic-mode sessions 2026-05-04 12:07:17 -07:00
kshitijk4poor
54e78cadb2 test: add regression test for Teams interactive_setup import fix
Adapted from PR #19188 by @LeonSGP43 — mocks cli_output helpers and
verifies interactive_setup persists credentials to .env without
crashing. Also adds megastary to AUTHOR_MAP.
2026-05-04 06:54:27 -07:00
briandevans
ce22301dc6 test(sms): use clear=True in test_missing_phone_number_is_non_retryable
Prevents pre-existing TWILIO_PHONE_NUMBER or SMS_WEBHOOK_URL values in
the outer test environment from leaking into the assertion context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:25:09 -07:00
Pratik Rai
7a8ee8b29d fix(gateway): deduplicate Weixin messages by content fingerprint 2026-05-04 05:20:13 -07:00
OpenClaw Bot
0443484115 fix(qqbot): honor proxy env vars for websocket 2026-05-04 05:06:09 -07:00
Teknium
5b6d413476 fix(cli,gateway): surface title errors from /new <name>
The contributor's PR silently swallowed ValueError from
SessionDB.set_session_title() with bare except Exception: pass.
Users typing /new <title> with an already-in-use title got an
untitled session and no feedback.

Changes:
- cli.py: catch ValueError from both sanitize_title() and
  set_session_title(); print the error and mark the session
  untitled in the banner (never echo the rejected title back).
- gateway/run.py: append a warning note to the reset reply on
  title rejection; reflect the accepted title in the header.
- Add regression tests for the duplicate-title path in CLI and
  gateway.

Also map exx@example.com -> @exxmen in scripts/release.py.
2026-05-04 03:14:50 -07:00