hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Author	SHA1	Message	Date
paul-tian	4d4807585a	fix(gateway): honor configured goal turn budget	2026-05-07 06:31:08 -07:00
Teknium	0efc547962	fix(gateway): consolidate runtime-status writes + rate-limit failure logs Extracts the three try/write_runtime_status/except-log blocks into a shared _write_runtime_status_safe() helper. On failure, logs the first occurrence per (platform, context) at warning level and downgrades subsequent failures to debug — so a persistently broken status dir (permissions, ENOSPC) doesn't spam the log on every Telegram reconnect. Uses getattr for the _status_write_logged set so test harnesses that skip __init__ (object.__new__(Adapter)) don't break. Follow-up to the salvaged #21158.	2026-05-07 06:30:26 -07:00
wabrent	5d9061148f	fix(gateway): log platform status write failures instead of silently swallowing	2026-05-07 06:30:26 -07:00
Luciano Pacheco	f7b71aa0da	fix: use configured model for gateway auth fallback	2026-05-07 06:29:27 -07:00
Hedirman	a9ebee5f02	Fix WhatsApp long message splitting	2026-05-07 06:27:47 -07:00
Teknium	4d32f40306	fix(gateway): include exception detail in bootstrap warning output Follow-up to the salvaged warning. Without the exception string, operators see "config validation failed" with no hint why.	2026-05-07 06:26:45 -07:00
wabrent	926402dd13	fix(gateway): surface bootstrap failures to stderr instead of silently swallowing	2026-05-07 06:26:45 -07:00
wabrent	98ca0694d6	fix(gateway): log agent task failures instead of silently losing usage data	2026-05-07 06:25:03 -07:00
LeonSGP43	7244a1f0d3	fix(weixin): wrap long copy-unfriendly lines	2026-05-07 06:08:06 -07:00
LeonSGP43	31f22890ea	fix(matrix): defer reaction cleanup redactions	2026-05-07 06:05:44 -07:00
Teknium	80717a157f	fix(discord): route DM role-auth opt-in through config.yaml (not env var) Per repo policy, ~/.hermes/.env is for secrets only. Guild IDs are behavioral configuration, not secrets. Replacing the DISCORD_DM_ROLE_AUTH_GUILD env var from the original fix with discord.dm_role_auth_guild in config.yaml. - New module-level _read_dm_role_auth_guild() helper reads hermes_cli.config.read_raw_config()['discord']['dm_role_auth_guild']. Fails closed on any parse error (safe default = DM role-auth off). - DEFAULT_CONFIG['discord'] gains dm_role_auth_guild: '' with a comment documenting the opt-in. - Tests patch hermes_cli.config.read_raw_config directly (via the _set_dm_role_auth_guild helper) instead of setenv/delenv. 12 tests in test_discord_roles_dm_scope pass; no env var involvement. - Docstring + module docstring + comments updated to reference discord.dm_role_auth_guild. - E2E verified with real imports across 6 scenarios: unset, int, string, garbage, zero, and (crucially) env-var-only-no-config all return None except the valid int/string cases. Env var has zero effect — policy compliance confirmed.	2026-05-07 05:51:56 -07:00
Teknium	5c045b8f6c	fix(discord): extend role-scope fix to slash surface + fixture update Sibling-site fix: _evaluate_slash_authorization was the fourth _is_allowed_user caller and didn't pass guild/is_dm through, so slash interactions would take the DM branch regardless of whether they came from a guild channel. Now reads interaction.guild + in_dm and forwards. Also updates test_discord_slash_auth fixture (_make_interaction) so the SimpleNamespace guild mock has a get_member(uid)->None method — required by the new guild-scoped fallback path in _is_allowed_user. Tests exercising positive role paths still work via user.roles. Three new regression tests in test_discord_roles_dm_scope: - Slash DM + role in mutual public guild → rejected - Slash in guild B + role only in guild A → rejected - Slash in guild B + role in guild B → allowed (positive control) 368 Discord tests pass. test_discord_free_channel_skips_auto_thread also fails on clean main (pre-existing, unrelated to this fix).	2026-05-07 05:51:56 -07:00
0xyg3n	ef1e565570	fix(discord): scope DISCORD_ALLOWED_ROLES to originating guild (CVSS 8.1) The initial DISCORD_ALLOWED_ROLES implementation (#11608, merged from #9873) scans every mutual guild when resolving a user's roles. This allows a cross-guild DM bypass: 1. Bot is in both public server A and private server B. 2. User holds the allowed role in server A only. 3. User DMs the bot. The role check finds the role in A and authorizes the DM, granting access as if the user were trusted in server B. Fix: - DMs (no guild context) disable role-based auth by default. Opt-in via DISCORD_DM_ROLE_AUTH_GUILD=<guild_id> restricts role lookup to one explicitly-trusted guild. - Guild messages check roles only in the originating guild (message.guild), never in other mutual guilds. - Reject cached author.roles when the Member came from a different guild than the current message. Backwards compatibility: - DISCORD_ALLOWED_USERS behavior is unchanged (still works in both DMs and guild messages). - Deployments that rely on roles in guild channels continue to work; role checks are now strictly scoped to that guild. - Deployments that intentionally want role-based DM auth can opt into a single trusted guild via DISCORD_DM_ROLE_AUTH_GUILD. Tests: 9 new regression guards in tests/gateway/test_discord_roles_dm_scope.py covering the bypass path, the opt-in path, cross-guild guild-message bypass, and backwards-compat user-ID paths. 47/47 discord-auth tests pass. Refs: #11608 (initial implementation), #7871 (feature request), #9873 (PR author credit @0xyg3n)	2026-05-07 05:51:56 -07:00
altmazza0-star	8308d18339	fix(gateway): preserve max turns after env reload	2026-05-07 05:49:16 -07:00
liuhao1024	0d3593e514	fix: WhatsApp bridge process leak and disable config asymmetry - Add PID file mechanism to track bridge processes and kill stale ones on startup - Improve _kill_port_process() with lsof fallback when fuser is not available - Support explicit WhatsApp disable via config.yaml (whatsapp.enabled: false) - Respect WHATSAPP_ENABLED=false env var to disable WhatsApp Fixes #19124	2026-05-07 05:38:08 -07:00
pingchesu	43a6645718	docs: clarify API server tool execution locality	2026-05-07 05:30:37 -07:00
Teknium	fdb9e0f6a6	fix(kanban): auto-block workers that exit without completing (#20894) (#21214) When a kanban worker subprocess exits rc=0 but its task is still in status='running', the agent almost certainly answered the task conversationally without calling kanban_complete or kanban_block. The dispatcher used to classify this as a generic crash and respawn, which loops forever on small local models (gemma4-e2b q4 etc.) that keep returning clean but unproductive output. Dispatcher changes: - The waitpid reap loop at the top of dispatch_once now records each reaped child's raw exit status in a bounded module registry (_recent_worker_exits, TTL 600s, size cap 4096). - _classify_worker_exit distinguishes clean_exit / nonzero_exit / signaled / unknown using os.WIFEXITED / WIFSIGNALED. - detect_crashed_workers consults the classification when a worker is found dead. clean_exit → protocol_violation event + immediate circuit-breaker trip (failure_limit=1). Everything else keeps the existing crashed-event + counter behavior. - DispatchResult.auto_blocked now includes protocol-violation trips. Gateway fix (Bug A in #20894): - gateway.run._notify_active_sessions_of_shutdown snapshots self.adapters with list(...) before iterating. adapter.send() can hit a fatal-error path that pops the adapter from the dict, which was raising 'RuntimeError: dictionary changed size during iteration' during shutdown. Regression tests: - test_detect_crashed_workers_protocol_violation_auto_blocks verifies rc=0 + still-running → status=blocked on first occurrence with protocol_violation + gave_up events and NO crashed event. - test_detect_crashed_workers_nonzero_exit_uses_default_limit verifies non-zero exits keep the existing 2-strike behavior. Closes #20894.	2026-05-07 05:24:16 -07:00
leon7609	d34f03c32a	feat(gateway): support [[as_document]] directive for skill media routing Skills that produce large/lossless images (e.g. info-graph, where a rendered JPG is 1-2 MB) currently lose quality in Telegram delivery because `_IMAGE_EXTS` membership routes the file through `send_multiple_images` → `sendMediaGroup`, which Telegram's server re-encodes to JPEG @ 1280px max edge. The original bytes only survive when the file goes through `send_document`, which the dispatch tables in three places (`_process_message_background`, `_deliver_media_from_response`, and the `send_message` tool's telegram path) only reach for files whose extension is NOT in `_IMAGE_EXTS`. This commit adds an `[[as_document]]` directive that mirrors the existing `[[audio_as_voice]]` shape: a skill emits the directive once in its response, and every image-extension MEDIA: file in that response is delivered via `send_document` instead of `send_multiple_images` / `sendPhoto`. The directive is detected at the dispatch sites (which see the raw response) and the directive string is stripped from the user-visible cleaned text in `extract_media` so it never leaks. Granularity is intentionally all-or-nothing per response, matching [[audio_as_voice]]'s scope. Skills that need fine control can split into two responses. Verified the targeted use case: info-graph emits 信息图已生成（...） [[as_document]] MEDIA:/tmp/info-graph-x/infographic.jpg → Telegram receives `infographic.jpg` via sendDocument, original 1MB JPEG bytes preserved, no recompression. Forwarding and download filenames stay clean (`infographic.jpg`). Tests: +3 cases in TestExtractMedia covering directive strip, isolation from voice flag, and coexistence with [[audio_as_voice]]. All 113 pre-existing media/extract/send tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 05:20:10 -07:00
teknium1	333598cb0e	fix(gateway): cap cached session sources with LRU eviction Follow-up on top of Zyproth's session-source cache: swap the unbounded dict for an OrderedDict with a 512-entry LRU cap so long-running gateways can't accumulate stale entries for dead sessions forever. - self._session_sources is now an OrderedDict - _cache_session_source() move_to_end + popitem(last=False) above cap - _get_cached_session_source() move_to_end on hit (LRU read bump) - restart_test_helpers.py wires OrderedDict + _session_sources_max	2026-05-07 05:16:38 -07:00
Zyproth	176b93575a	fix(gateway): preserve thread routing from cached live session sources	2026-05-07 05:16:38 -07:00
Teknium	fb1ce793e6	feat(security): enable secret redaction by default (#17691, #20785) (#21193) Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor already wired into send_message_tool, logs, and tool output actually runs on a fresh install. - agent/redact.py: env-var default "" → "true" - hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True; two config-template comments rewritten - gateway/run.py + cli.py: startup log / banner warning when the user has explicitly opted out, so the downgrade is visible in agent.log and at CLI banner time - docs/reference/environment-variables.md: description reconciled - tests: flipped the default-pin, restructured the force=True regression test to explicit-false instead of unset Users who need raw credential values (redactor development) can still opt out via security.redact_secrets: false in config.yaml or HERMES_REDACT_SECRETS=false in .env. Closes #17691. Addresses #20785 (short-term output-pipeline recommendation).	2026-05-07 05:10:33 -07:00
chenlinfeng	3a0d52d579	fix(weixin): replace all aiohttp ClientTimeout with asyncio.wait_for() aiohttp ClientTimeout uses BaseTimerContext which calls loop.call_later() internally. When invoked via asyncio.run_coroutine_threadsafe() from cron jobs, this triggers "Timeout context manager should be used inside a task" errors, causing message delivery failures. Replace all direct ClientTimeout usage with asyncio.wait_for(): - _upload_ciphertext: CDN upload (120s timeout) - _download_bytes: CDN download (configurable timeout) - _download_remote_media: remote media fetch (30s timeout) Also set total=None on _send_session to disable aiohttp built-in timeout, and change trust_env=True to False to bypass proxy for WeChat CDN connections.	2026-05-07 05:10:04 -07:00
Zyproth	6e8f1e09a9	fix(gateway): use monotonic deadlines in QR onboarding flows	2026-05-07 05:09:39 -07:00
thelumiereguy	8a96fa48c1	fix(gateway): avoid duplicated responses history	2026-05-07 05:07:59 -07:00
Teknium	38b1c7dce5	refactor(gateway): simplify auto-resume + extend to crash recovery Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape, wider coverage. Changes: - Drop the synthetic '[System note: ...]' in the internal MessageEvent. The existing _is_resume_pending branch in _handle_message_with_agent (run.py ~L13738) already injects a reason-aware recovery system note on the next turn. With kyan's text in place the model saw two stacked system notes. Now the event text is empty and the existing injection path owns the wording. - Drop SessionStore.list_resume_pending() as a new public method. The filter is 8 lines inline in _schedule_resume_pending_sessions() — one caller, no other pluggability need. - Add 'restart_interrupted' to the auto-resume reason set. That's the reason SessionStore.suspend_recently_active() stamps on sessions recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker). Previously those sessions had to wait for a real user message to auto-resume; now they continue automatically at startup like drain-timeout interruptions do. - Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so future reasons (e.g. 'manual_resume_request') can be opted in with one line. Test coverage added: - drain-timeout + crash-recovery both scheduled - stale entries skipped (outside freshness window) - suspended entries skipped (suspended > resume_pending) - originless entries skipped (no routing target) - disallowed reasons skipped (graceful forward-compat) E2E verified end-to-end with a real on-disk SessionStore: 2 eligible sessions scheduled, 2 ineligible skipped, empty-text internal events delivered to the adapter. Co-authored-by: Kevin Yan <kevyan1998@gmail.com>	2026-05-07 05:05:34 -07:00
Kevin Yan	961a3535fa	fix(gateway): preserve resume marker on interrupted restart	2026-05-07 05:05:34 -07:00
Kevin Yan	fad684b1f3	feat(gateway): auto-resume interrupted sessions after restart	2026-05-07 05:05:34 -07:00
mwnickerson	411cfa26e3	fix: auto-block repeated kanban retries	2026-05-07 05:05:20 -07:00
Teknium	bf843adf05	feat(gateway): opt-in cleanup of temporary progress bubbles (#21186) When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress) is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still working...' notices, and status-callback messages after the final response is delivered successfully. Currently effective on adapters that implement delete_message (Telegram); silently no-ops elsewhere. Off by default. Failed runs skip cleanup so bubbles stay as breadcrumbs. Minimal plumbing: base.py's existing post_delivery_callback slot now chains new registrations onto any existing callback (with per-callback exception isolation) rather than clobbering. Stale-generation registrations are rejected so they can't step on a fresher run's callbacks. This lets the cleanup callback coexist with the background-review release hook already registered on the same slot. Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>	2026-05-07 05:04:37 -07:00
ambition0802	7c0766e06a	fix(gateway): translate inbound document host paths to container paths for Docker backend When terminal.backend is docker, inbound documents uploaded via messaging platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host path under ~/.hermes/cache/documents, but the container sandbox only sees them at the auto-mounted /root/.hermes/cache/documents path. This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the natural sibling to get_cache_directory_mounts()) and calls it at the document-context-injection site in gateway/run.py so the agent always receives a path it can open directly, matching the mount layout already established by get_cache_directory_mounts() (#4846). Scope: only Docker backend for now; other backends use different mount semantics and are left unchanged until verified. Fixes #18787	2026-05-07 05:02:26 -07:00
mrcoferland	bd0c54d171	fix: route Telegram image documents through photo handling	2026-05-07 04:51:46 -07:00
Teknium	5a3cadf6eb	fix(discord): narrow rate-limit catch and move sync state under gateway/ Two follow-ups on top of helix4u's slash-command sync hardening: - Only suppress exceptions that are actually Discord 429 rate limits (discord.RateLimited, HTTPException with status 429, or a clearly rate-limit-named duck type). Arbitrary failures that happen to expose a retry_after attribute now re-raise to the outer handler instead of silently swallowing a cooldown. - Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root stops collecting ad-hoc runtime files. Added a test verifying unrelated exceptions don't get misclassified as rate limits.	2026-05-06 18:12:35 -07:00
helix4u	d797755a1c	fix(gateway): wait for systemd restart readiness	2026-05-06 18:12:35 -07:00
Guillaume Meyer	7df6115199	feat(gateway): also gate pre-restart "Gateway restarting" notification Extend the gateway_restart_notification flag to cover _notify_active_sessions_of_shutdown — the message that fires just before drain ("⚠️ Gateway restarting — Your current task will be interrupted. Send any message after restart and I'll try to resume where you left off.") sent to active sessions and home channels. Same operator/end-user reasoning: on a Slack workspace shared with end users, "Gateway restarting" reads as "the bot is broken" — the operator should be able to suppress it consistently with the other two lifecycle pings rather than having a partial opt-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 13:39:43 -07:00
Guillaume Meyer	b71f80e6ce	feat(gateway): per-platform gateway_restart_notification flag Adds an opt-out toggle on PlatformConfig that gates both restart lifecycle pings: the "♻ Gateway restarted" message sent to the chat that issued /restart, and the "♻️ Gateway online" home-channel startup notification. Defaults to True so existing deployments are unaffected. The motivating split is operator vs. end-user surfaces: a back-channel like Telegram should keep these pings, while a Slack workspace shared with end users should not surface gateway lifecycle noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 13:39:43 -07:00
kshitijk4poor	28299afc21	chore: follow-up cleanup for Feishu topic thread fix - Remove dead metadata.get('reply_to') fallback in _send_raw_message; nothing in the codebase ever sets 'reply_to' inside a metadata dict — the key only appears as a top-level send_voice() keyword argument - Simplify _status_thread_metadata construction in run.py to use a single dict literal instead of create-then-mutate pattern; the or-{} guard was dead since source.thread_id implies _progress_thread_id is also set for Feishu - Add yuqian@zmetasoft.com to AUTHOR_MAP for contributor attribution	2026-05-06 10:52:51 -07:00
Yuqian	441ef75d15	fix(feishu): keep topic replies in threads Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.	2026-05-06 10:52:51 -07:00
Teknium	a0fedfbb1b	feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709) Replaces the per-directory shadow-repo design with a single shared shadow git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated across every working directory the agent has ever touched; a dozen worktrees of the same project cost near-zero in additional disk. Why --- Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/ grow to multi-GB on active machines: 1. Each working directory got its own full shadow git repo — no object dedup across projects or across worktrees of the same project. 2. _prune() was a documented no-op: max_snapshots only limited the /rollback listing. Loose objects accumulated forever. 3. Defaults: enabled=True, auto_prune=False — users paid the disk cost without ever asking for /rollback. Field report on a single workstation: 847 MB across 47 shadow repos, mostly redundant clones of the hermes-agent source tree. Changes ------- - tools/checkpoint_manager.py: full rewrite. Single bare store, per-project refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>), per-project metadata (store/projects/<hash>.json with workdir + created_at + last_touch). On first v2 init, any pre-v2 per-directory shadow repos are auto-migrated into legacy-<timestamp>/ so the new store starts clean. _prune() now actually rewrites the per-project ref to the last max_snapshots commits and runs git gc --prune=now. New _enforce_size_cap() drops oldest commits round-robin across projects when the store exceeds max_total_size_mb. _drop_oversize_from_index() filters any single file larger than max_file_size_mb out of the snapshot. - hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI (status / list / prune / clear / clear-legacy) for managing the store outside a session. - hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20, auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10. Tightened DEFAULT_EXCLUDES (added target/, .so/.dylib/.dll, .mp4/.mov, .zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.). - run_agent.py / cli.py / gateway/run.py: thread the new kwargs through AIAgent and the startup auto_prune hooks. - Tests rewritten to match v2 storage while keeping backwards-compat coverage for the pre-v2 prune path (per-directory shadow repos under base/ are still swept correctly for anyone mid-migration). - Docs updated: user-guide/checkpoints-and-rollback.md explains the shared store, new defaults, migration, and the new CLI; reference/cli-commands.md documents 'hermes checkpoints'. E2E validated ------------- - Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/. - Object dedup: two projects with an identical shared.py blob resolve to 7 total objects in the store (v1 would have stored the blob twice). - max_snapshots=3 actually enforced: after 6 commits, list shows 3. - Orphan prune: deleting a project's workdir + 'hermes checkpoints prune --retention-days 0' removes its ref, index, and metadata; GC reclaims the objects. - max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the tracked source code files. - hermes checkpoints {status,prune,clear,clear-legacy} all work from the CLI without an agent running. Breaking / migration -------------------- No in-place data migration — legacy per-directory shadow repos are moved into legacy-<timestamp>/ on first run. Old /rollback history is still accessible by inspecting the archive with git; run 'hermes checkpoints clear-legacy' to reclaim the space when ready. Users relying on /rollback must now set checkpoints.enabled=true (or pass --checkpoints) explicitly.	2026-05-06 05:44:35 -07:00
kshitijk4poor	aa88dcc57b	fix: salvage batch — compaction guidance, memory authority, cache eviction after compression - Fix /compact → /compress in context-overflow tips (closes #20020) - Evict cached agent after session hygiene and /compress so system prompt refreshes with current SOUL.md, memory, and skills - Restore memory authority across compaction: change 'informational background data' to 'authoritative reference data' in memory block and SUMMARY_PREFIX, with backward-compatible regex Based on: - PR #20027 by @LeonSGP43 - PR #18767 by @MacroAnarchy - PR #17380 by @vominh1919 PR #17121 boundary marker fix already merged to main (`2eef395e1`). PR #9262 user-message anchoring already on main via _ensure_last_user_message_in_tail().	2026-05-05 22:33:45 -07:00
bogerman1	3188e63b05	fix(api_server): SSE token batching + error handling for Open WebUI performance Reduces SSE event rate ~500/turn → ~20/turn via 50ms text-delta batching in _dispatch(), which eliminates markdown re-render storms on Open WebUI. Also: - Trim tool_call.arguments in the response.completed event to 100KB (prevents silent hangs on 848KB+ single-line SSE events). - Catch-all exception handlers in _write_sse_responses() + _write_sse_chat_completion() emit a proper error chunk instead of TransferEncodingError from incomplete chunked encoding when the agent crashes mid-stream. - MAX_REQUEST_BYTES 1MB → 10MB; pass client_max_size to aiohttp Application to avoid silent 400s on truncated request bodies for long conversations. Salvage of #17552 (api_server portion only). The contrib/openwebui-filter/ payload from that PR — Open WebUI Filter Function + benchmark writeup — is a client-side user-installable add-on and doesn't need to live in the repo; dropped here. Closes #17537. Co-authored-by: bogerman1 <93757150+bogerman1@users.noreply.github.com>	2026-05-05 15:13:36 -07:00
Moonyeah	f0d278412f	feat(gateway): respect kanban.max_spawn config to limit concurrent tasks The dispatch_once function already accepts a max_spawn parameter but the gateway was calling it without passing any value, effectively ignoring the configuration. This change reads kanban.max_spawn from config.yaml and passes it through, allowing users to limit concurrent kanban tasks. This prevents resource exhaustion scenarios where kanban dispatcher spawns too many parallel workers on constrained hardware.	2026-05-05 15:09:28 -07:00
Michel Belleau	5f8e59b0f1	docs(discord): fix Server Members Intent + SSRC-mapping drift; add /voice join slash Choice Salvage of #11350. Kept: - Code: add an explicit /voice join Choice in the slash UI (runner accepts both 'join' and 'channel' but only 'channel' was in autocomplete). - Docs: Server Members Intent is conditional (only needed if DISCORD_ALLOWED_USERS contains usernames); SSRC → user_id mapping uses the voice websocket SPEAKING opcode, not the Members intent. Dropped from the original PR: - HERMES_DISCORD_VOICE_PACKET_DUMP — this env var doesn't exist on main (it was in a different PR that isn't merged). - DISCORD_PROXY docs — already documented on current main. - DISCORD_ALLOW_MENTION_* docs — already on main. - "barge-in mode" rewrite — current main actually does pause the listener during TTS (VoiceReceiver.pause() at discord.py:192); there is no barge_in_guard/barge_in_rms on main. Co-authored-by: Michel Belleau <michel.belleau@malaiwah.com>	2026-05-05 13:50:43 -07:00
Teknium	d5357f816d	refactor(telegram): make typing thread-id resolver symmetric with send Mirror _message_thread_id_for_typing() with _message_thread_id_for_send(): both now map the General forum topic (thread id "1") to None upfront. That removes the need for the retry-without-thread fallback in send_typing() entirely — if _message_thread_id_for_typing() returns a non-None value, it's a real user-created topic and falling back to the root chat is never correct. If Telegram rejects the typing action (e.g. topic deleted mid-session), we swallow it at debug level instead of bleeding the indicator into All Messages. Updates the General-topic typing regression test to assert the new single-call contract.	2026-05-05 13:28:08 -07:00
helix4u	41545f7ec5	fix(telegram): keep DM topic typing scoped	2026-05-05 13:28:08 -07:00
Siddharth Balyan	3b750715a3	fix: resolve lazy session creation regressions (#18370 fallout) (#20363) Fix three regressions introduced by PR #18370 (lazy session creation): 1. _finalize_session() uses stale session_key after compression (#20001) 2. session_key not synced after auto-compression in run_conversation (#20001) 3. pending_title ValueError leaves title wedged forever (#19029) 4. Gateway silently swallows null responses when agent did work (#18765) 5. One-time cleanup for accumulated ghost compression continuations (#20001) Changes: - tui_gateway/server.py: _finalize_session() now uses agent.session_id (falls back to session_key when agent is None). Refactor _sync_session_key_after_compress() with clear_pending_title and restart_slash_worker policy flags. Call it post-run_conversation() to sync session_key after auto-compression. Add ValueError handler to pending_title flush. - gateway/run.py: Extract _normalize_empty_agent_response() helper that consolidates failed/partial/null response handling. Surfaces user-facing error when agent did work (api_calls > 0) but returned no text. - hermes_state.py: Add finalize_orphaned_compression_sessions() — marks ghost continuation sessions as ended (non-destructive, preserves data). - cli.py: One-time startup migration for orphaned compression sessions. Test changes: - tests/test_tui_gateway_server.py: Update pending_title ValueError test for post-#18370 architecture (title applied post-message, not at create). - tests/test_lazy_session_regressions.py: 14 new regression tests covering all fixed paths.	2026-05-06 01:11:49 +05:30
Traemond Anderson	60235dba5e	feat(cli): add list_picker_providers for credential-filtered picker The Telegram/Discord /model pickers currently call list_authenticated_providers(), which returns every provider whose credentials resolve locally and every model in its curated snapshot. Two failure modes fall out: - OpenRouter rows can include IDs the live catalog no longer carries. - Provider rows can surface with zero callable models (e.g. a slug whose credential pool entry exists but has nothing behind it). list_picker_providers() wraps the base function and post-processes the result so the interactive picker only shows models the user can actually select: - OpenRouter's models come from fetch_openrouter_models() (live-catalog filtered against the curated OPENROUTER_MODELS snapshot). - Rows with an empty models list are dropped, except custom endpoints (is_user_defined=True with an api_url) where the user may enter model ids manually. - All other fields pass through unchanged. The gateway /model handler switches to the new helper for the interactive picker payload only. Typed /model <name> and the text fallback list stay on list_authenticated_providers() so nothing is hidden from power users or platforms without a picker. Covered by nine focused unit tests in tests/hermes_cli/test_list_picker_providers.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 10:18:58 -07:00
Es1la	a877c3f6d9	fix(feishu): tolerate malformed dedup timestamps Salvages @Es1la's PR #13632 — a non-numeric timestamp in the persisted feishu dedup state crashed adapter startup with ValueError/TypeError from the unguarded float() call. Wrap the float() conversion in try/except; skip the bad key and keep loading the rest. The original PR also restructured existing TestDedupTTL tests to use tempfile.TemporaryDirectory + HERMES_HOME patching — that was test-hygiene scope creep unrelated to the bug. Kept only the malformed-timestamp fix and added a focused regression test.	2026-05-05 10:15:09 -07:00
hharry11	247c9d468c	fix(gateway): ensure deterministic thread eviction in helpers	2026-05-05 10:13:55 -07:00
sprmn24	ecc909de38	fix(session): serialize JSONL transcript appends under existing lock	2026-05-05 09:57:31 -07:00
WuTianyi	8e18d10318	fix(feishu): force text mode for markdown tables Feishu post-type 'md' elements do not render markdown tables. When table content is sent as post (triggered by bold matching _MARKDOWN_HINT_RE), the message appears blank on the client. Add _MARKDOWN_TABLE_RE to detect markdown table syntax and force text mode for table content, ensuring it is visible as plain text.	2026-05-05 09:57:14 -07:00
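
Commit 333598cb0e above describes capping the session-source cache with an OrderedDict: move_to_end on write and on read hit, and popitem(last=False) once the cap is exceeded. The following is a minimal sketch of that pattern, not the gateway's actual code. The class name SessionSourceCache and its put/get methods are hypothetical stand-ins for _cache_session_source() and _get_cached_session_source(); only the 512-entry default comes from the commit message.

```python
from collections import OrderedDict
from typing import Any, Optional


class SessionSourceCache:
    """Hypothetical LRU-capped cache; names are illustrative only."""

    def __init__(self, max_entries: int = 512) -> None:
        # Insertion order doubles as recency order: oldest entry first.
        self._entries: "OrderedDict[str, Any]" = OrderedDict()
        self._max_entries = max_entries

    def put(self, session_key: str, source: Any) -> None:
        self._entries[session_key] = source
        # Mark the key as most recently used, even if it already existed.
        self._entries.move_to_end(session_key)
        # Evict least recently used entries until we are back under the cap.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def get(self, session_key: str) -> Optional[Any]:
        if session_key not in self._entries:
            return None
        # A read hit also counts as a use (the "LRU read bump").
        self._entries.move_to_end(session_key)
        return self._entries[session_key]

    def __len__(self) -> int:
        return len(self._entries)
```

With max_entries=2, writing keys a and b, reading a, then writing c evicts b rather than a, because the read refreshed a's position.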


1479 commits
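
Commit fdb9e0f6a6 above says _classify_worker_exit distinguishes clean_exit / nonzero_exit / signaled / unknown using os.WIFEXITED / WIFSIGNALED. A rough sketch of such a classifier follows, assuming a raw wait status as returned by os.waitpid on a POSIX system. The function name classify_worker_exit and its exact return strings are illustrative; the real helper's signature is not shown in the log.

```python
import os


def classify_worker_exit(raw_status: int) -> str:
    """Classify a raw wait status (POSIX only); a sketch, not the real helper."""
    if os.WIFEXITED(raw_status):
        # The process called exit(); separate rc=0 from any other code.
        if os.WEXITSTATUS(raw_status) == 0:
            return "clean_exit"
        return "nonzero_exit"
    if os.WIFSIGNALED(raw_status):
        # The process was terminated by a signal such as SIGKILL.
        return "signaled"
    # Stopped/continued or otherwise unrecognized status.
    return "unknown"


if __name__ == "__main__":
    # Spawn a child that exits cleanly and classify its real wait status.
    pid = os.fork()
    if pid == 0:
        os._exit(0)
    _, status = os.waitpid(pid, 0)
    print(classify_worker_exit(status))
```

Per the commit message, the dispatcher treats a clean exit with the task still in status='running' as a protocol violation and blocks immediately, while the other classes keep the existing crash counter.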