hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Author	SHA1	Message	Date
0xyg3n	ef1e565570	fix(discord): scope DISCORD_ALLOWED_ROLES to originating guild (CVSS 8.1) The initial DISCORD_ALLOWED_ROLES implementation (#11608, merged from #9873) scans every mutual guild when resolving a user's roles. This allows a cross-guild DM bypass: 1. Bot is in both public server A and private server B. 2. User holds the allowed role in server A only. 3. User DMs the bot. The role check finds the role in A and authorizes the DM, granting access as if the user were trusted in server B. Fix: - DMs (no guild context) disable role-based auth by default. Opt-in via DISCORD_DM_ROLE_AUTH_GUILD=<guild_id> restricts role lookup to one explicitly-trusted guild. - Guild messages check roles only in the originating guild (message.guild), never in other mutual guilds. - Reject cached author.roles when the Member came from a different guild than the current message. Backwards compatibility: - DISCORD_ALLOWED_USERS behavior is unchanged (still works in both DMs and guild messages). - Deployments that rely on roles in guild channels continue to work; role checks are now strictly scoped to that guild. - Deployments that intentionally want role-based DM auth can opt into a single trusted guild via DISCORD_DM_ROLE_AUTH_GUILD. Tests: 9 new regression guards in tests/gateway/test_discord_roles_dm_scope.py covering the bypass path, the opt-in path, cross-guild guild-message bypass, and backwards-compat user-ID paths. 47/47 discord-auth tests pass. Refs: #11608 (initial implementation), #7871 (feature request), #9873 (PR author credit @0xyg3n)	2026-05-07 05:51:56 -07:00
altmazza0-star	8308d18339	fix(gateway): preserve max turns after env reload	2026-05-07 05:49:16 -07:00
liuhao1024	0d3593e514	fix: WhatsApp bridge process leak and disable config asymmetry - Add PID file mechanism to track bridge processes and kill stale ones on startup - Improve _kill_port_process() with lsof fallback when fuser is not available - Support explicit WhatsApp disable via config.yaml (whatsapp.enabled: false) - Respect WHATSAPP_ENABLED=false env var to disable WhatsApp Fixes #19124	2026-05-07 05:38:08 -07:00
pingchesu	43a6645718	docs: clarify API server tool execution locality	2026-05-07 05:30:37 -07:00
Teknium	fdb9e0f6a6	fix(kanban): auto-block workers that exit without completing (#20894 ) (#21214 ) When a kanban worker subprocess exits rc=0 but its task is still in status='running', the agent almost certainly answered the task conversationally without calling kanban_complete or kanban_block. The dispatcher used to classify this as a generic crash and respawn, which loops forever on small local models (gemma4-e2b q4 etc.) that keep returning clean but unproductive output. Dispatcher changes: - The waitpid reap loop at the top of dispatch_once now records each reaped child's raw exit status in a bounded module registry (_recent_worker_exits, TTL 600s, size cap 4096). - _classify_worker_exit distinguishes clean_exit / nonzero_exit / signaled / unknown using os.WIFEXITED / WIFSIGNALED. - detect_crashed_workers consults the classification when a worker is found dead. clean_exit → protocol_violation event + immediate circuit-breaker trip (failure_limit=1). Everything else keeps the existing crashed-event + counter behavior. - DispatchResult.auto_blocked now includes protocol-violation trips. Gateway fix (Bug A in #20894): - gateway.run._notify_active_sessions_of_shutdown snapshots self.adapters with list(...) before iterating. adapter.send() can hit a fatal-error path that pops the adapter from the dict, which was raising 'RuntimeError: dictionary changed size during iteration' during shutdown. Regression tests: - test_detect_crashed_workers_protocol_violation_auto_blocks verifies rc=0 + still-running → status=blocked on first occurrence with protocol_violation + gave_up events and NO crashed event. - test_detect_crashed_workers_nonzero_exit_uses_default_limit verifies non-zero exits keep the existing 2-strike behavior. Closes #20894.	2026-05-07 05:24:16 -07:00
leon7609	d34f03c32a	feat(gateway): support [[as_document]] directive for skill media routing Skills that produce large/lossless images (e.g. info-graph, where a rendered JPG is 1-2 MB) currently lose quality in Telegram delivery because `_IMAGE_EXTS` membership routes the file through `send_multiple_images` → `sendMediaGroup`, which Telegram's server re-encodes to JPEG @ 1280px max edge. The original bytes only survive when the file goes through `send_document`, which the dispatch tables in three places (`_process_message_background`, `_deliver_media_from_response`, and the `send_message` tool's telegram path) only reach for files whose extension is NOT in `_IMAGE_EXTS`. This commit adds an `[[as_document]]` directive that mirrors the existing `[[audio_as_voice]]` shape: a skill emits the directive once in its response, and every image-extension MEDIA: file in that response is delivered via `send_document` instead of `send_multiple_images` / `sendPhoto`. The directive is detected at the dispatch sites (which see the raw response) and the directive string is stripped from the user-visible cleaned text in `extract_media` so it never leaks. Granularity is intentionally all-or-nothing per response, matching [[audio_as_voice]]'s scope. Skills that need fine control can split into two responses. Verified the targeted use case: info-graph emits 信息图已生成（...） [[as_document]] MEDIA:/tmp/info-graph-x/infographic.jpg → Telegram receives `infographic.jpg` via sendDocument, original 1MB JPEG bytes preserved, no recompression. Forwarding and download filenames stay clean (`infographic.jpg`). Tests: +3 cases in TestExtractMedia covering directive strip, isolation from voice flag, and coexistence with [[audio_as_voice]]. All 113 pre-existing media/extract/send tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 05:20:10 -07:00
teknium1	333598cb0e	fix(gateway): cap cached session sources with LRU eviction Follow-up on top of Zyproth's session-source cache: swap the unbounded dict for an OrderedDict with a 512-entry LRU cap so long-running gateways can't accumulate stale entries for dead sessions forever. - self._session_sources is now an OrderedDict - _cache_session_source() move_to_end + popitem(last=False) above cap - _get_cached_session_source() move_to_end on hit (LRU read bump) - restart_test_helpers.py wires OrderedDict + _session_sources_max	2026-05-07 05:16:38 -07:00
Zyproth	176b93575a	fix(gateway): preserve thread routing from cached live session sources	2026-05-07 05:16:38 -07:00
Teknium	fb1ce793e6	feat(security): enable secret redaction by default (#17691 , #20785 ) (#21193 ) Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor already wired into send_message_tool, logs, and tool output actually runs on a fresh install. - agent/redact.py: env-var default "" → "true" - hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True; two config-template comments rewritten - gateway/run.py + cli.py: startup log / banner warning when the user has explicitly opted out, so the downgrade is visible in agent.log and at CLI banner time - docs/reference/environment-variables.md: description reconciled - tests: flipped the default-pin, restructured the force=True regression test to explicit-false instead of unset Users who need raw credential values (redactor development) can still opt out via security.redact_secrets: false in config.yaml or HERMES_REDACT_SECRETS=false in .env. Closes #17691. Addresses #20785 (short-term output-pipeline recommendation).	2026-05-07 05:10:33 -07:00
chenlinfeng	3a0d52d579	fix(weixin): replace all aiohttp ClientTimeout with asyncio.wait_for() aiohttp ClientTimeout uses BaseTimerContext which calls loop.call_later() internally. When invoked via asyncio.run_coroutine_threadsafe() from cron jobs, this triggers "Timeout context manager should be used inside a task" errors, causing message delivery failures. Replace all direct ClientTimeout usage with asyncio.wait_for(): - _upload_ciphertext: CDN upload (120s timeout) - _download_bytes: CDN download (configurable timeout) - _download_remote_media: remote media fetch (30s timeout) Also set total=None on _send_session to disable aiohttp built-in timeout, and change trust_env=True to False to bypass proxy for WeChat CDN connections.	2026-05-07 05:10:04 -07:00
Zyproth	6e8f1e09a9	fix(gateway): use monotonic deadlines in QR onboarding flows	2026-05-07 05:09:39 -07:00
thelumiereguy	8a96fa48c1	fix(gateway): avoid duplicated responses history	2026-05-07 05:07:59 -07:00
Teknium	38b1c7dce5	refactor(gateway): simplify auto-resume + extend to crash recovery Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape, wider coverage. Changes: - Drop the synthetic '[System note: ...]' in the internal MessageEvent. The existing _is_resume_pending branch in _handle_message_with_agent (run.py ~L13738) already injects a reason-aware recovery system note on the next turn. With kyan's text in place the model saw two stacked system notes. Now the event text is empty and the existing injection path owns the wording. - Drop SessionStore.list_resume_pending() as a new public method. The filter is 8 lines inline in _schedule_resume_pending_sessions() — one caller, no other pluggability need. - Add 'restart_interrupted' to the auto-resume reason set. That's the reason SessionStore.suspend_recently_active() stamps on sessions recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker). Previously those sessions had to wait for a real user message to auto-resume; now they continue automatically at startup like drain-timeout interruptions do. - Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so future reasons (e.g. 'manual_resume_request') can be opted in with one line. Test coverage added: - drain-timeout + crash-recovery both scheduled - stale entries skipped (outside freshness window) - suspended entries skipped (suspended > resume_pending) - originless entries skipped (no routing target) - disallowed reasons skipped (graceful forward-compat) E2E verified end-to-end with a real on-disk SessionStore: 2 eligible sessions scheduled, 2 ineligible skipped, empty-text internal events delivered to the adapter. Co-authored-by: Kevin Yan <kevyan1998@gmail.com>	2026-05-07 05:05:34 -07:00
Kevin Yan	961a3535fa	fix(gateway): preserve resume marker on interrupted restart	2026-05-07 05:05:34 -07:00
Kevin Yan	fad684b1f3	feat(gateway): auto-resume interrupted sessions after restart	2026-05-07 05:05:34 -07:00
mwnickerson	411cfa26e3	fix: auto-block repeated kanban retries	2026-05-07 05:05:20 -07:00
Teknium	bf843adf05	feat(gateway): opt-in cleanup of temporary progress bubbles (#21186 ) When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress) is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still working...' notices, and status-callback messages after the final response is delivered successfully. Currently effective on adapters that implement delete_message (Telegram); silently no-ops elsewhere. Off by default. Failed runs skip cleanup so bubbles stay as breadcrumbs. Minimal plumbing: base.py's existing post_delivery_callback slot now chains new registrations onto any existing callback (with per-callback exception isolation) rather than clobbering. Stale-generation registrations are rejected so they can't step on a fresher run's callbacks. This lets the cleanup callback coexist with the background-review release hook already registered on the same slot. Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>	2026-05-07 05:04:37 -07:00
ambition0802	7c0766e06a	fix(gateway): translate inbound document host paths to container paths for Docker backend When terminal.backend is docker, inbound documents uploaded via messaging platforms (Telegram, Slack, Discord, Feishu, Email, etc.) are cached at a host path under ~/.hermes/cache/documents, but the container sandbox only sees them at the auto-mounted /root/.hermes/cache/documents path. This PR adds to_agent_visible_cache_path() in tools/credential_files.py (the natural sibling to get_cache_directory_mounts()) and calls it at the document-context-injection site in gateway/run.py so the agent always receives a path it can open directly, matching the mount layout already established by get_cache_directory_mounts() (#4846). Scope: only Docker backend for now; other backends use different mount semantics and are left unchanged until verified. Fixes #18787	2026-05-07 05:02:26 -07:00
mrcoferland	bd0c54d171	fix: route Telegram image documents through photo handling	2026-05-07 04:51:46 -07:00
Teknium	5a3cadf6eb	fix(discord): narrow rate-limit catch and move sync state under gateway/ Two follow-ups on top of helix4u's slash-command sync hardening: - Only suppress exceptions that are actually Discord 429 rate limits (discord.RateLimited, HTTPException with status 429, or a clearly rate-limit-named duck type). Arbitrary failures that happen to expose a retry_after attribute now re-raise to the outer handler instead of silently swallowing a cooldown. - Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root stops collecting ad-hoc runtime files. Added a test verifying unrelated exceptions don't get misclassified as rate limits.	2026-05-06 18:12:35 -07:00
helix4u	d797755a1c	fix(gateway): wait for systemd restart readiness	2026-05-06 18:12:35 -07:00
Guillaume Meyer	7df6115199	feat(gateway): also gate pre-restart "Gateway restarting" notification Extend the gateway_restart_notification flag to cover _notify_active_sessions_of_shutdown — the message that fires just before drain ("⚠️ Gateway restarting — Your current task will be interrupted. Send any message after restart and I'll try to resume where you left off.") sent to active sessions and home channels. Same operator/end-user reasoning: on a Slack workspace shared with end users, "Gateway restarting" reads as "the bot is broken" — the operator should be able to suppress it consistently with the other two lifecycle pings rather than having a partial opt-out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 13:39:43 -07:00
Guillaume Meyer	b71f80e6ce	feat(gateway): per-platform gateway_restart_notification flag Adds an opt-out toggle on PlatformConfig that gates both restart lifecycle pings: the "♻ Gateway restarted" message sent to the chat that issued /restart, and the "♻️ Gateway online" home-channel startup notification. Defaults to True so existing deployments are unaffected. The motivating split is operator vs. end-user surfaces: a back-channel like Telegram should keep these pings, while a Slack workspace shared with end users should not surface gateway lifecycle noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 13:39:43 -07:00
kshitijk4poor	28299afc21	chore: follow-up cleanup for Feishu topic thread fix - Remove dead metadata.get('reply_to') fallback in _send_raw_message; nothing in the codebase ever sets 'reply_to' inside a metadata dict — the key only appears as a top-level send_voice() keyword argument - Simplify _status_thread_metadata construction in run.py to use a single dict literal instead of create-then-mutate pattern; the or-{} guard was dead since source.thread_id implies _progress_thread_id is also set for Feishu - Add yuqian@zmetasoft.com to AUTHOR_MAP for contributor attribution	2026-05-06 10:52:51 -07:00
Yuqian	441ef75d15	fix(feishu): keep topic replies in threads Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.	2026-05-06 10:52:51 -07:00
Teknium	a0fedfbb1b	feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709 ) Replaces the per-directory shadow-repo design with a single shared shadow git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated across every working directory the agent has ever touched; a dozen worktrees of the same project cost near-zero in additional disk. Why --- Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/ grow to multi-GB on active machines: 1. Each working directory got its own full shadow git repo — no object dedup across projects or across worktrees of the same project. 2. _prune() was a documented no-op: max_snapshots only limited the /rollback listing. Loose objects accumulated forever. 3. Defaults: enabled=True, auto_prune=False — users paid the disk cost without ever asking for /rollback. Field report on a single workstation: 847 MB across 47 shadow repos, mostly redundant clones of the hermes-agent source tree. Changes ------- - tools/checkpoint_manager.py: full rewrite. Single bare store, per-project refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>), per-project metadata (store/projects/<hash>.json with workdir + created_at + last_touch). On first v2 init, any pre-v2 per-directory shadow repos are auto-migrated into legacy-<timestamp>/ so the new store starts clean. _prune() now actually rewrites the per-project ref to the last max_snapshots commits and runs git gc --prune=now. New _enforce_size_cap() drops oldest commits round-robin across projects when the store exceeds max_total_size_mb. _drop_oversize_from_index() filters any single file larger than max_file_size_mb out of the snapshot. - hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI (status / list / prune / clear / clear-legacy) for managing the store outside a session. - hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20, auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10. Tightened DEFAULT_EXCLUDES (added target/, .so/.dylib/.dll, .mp4/.mov, .zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.). - run_agent.py / cli.py / gateway/run.py: thread the new kwargs through AIAgent and the startup auto_prune hooks. - Tests rewritten to match v2 storage while keeping backwards-compat coverage for the pre-v2 prune path (per-directory shadow repos under base/ are still swept correctly for anyone mid-migration). - Docs updated: user-guide/checkpoints-and-rollback.md explains the shared store, new defaults, migration, and the new CLI; reference/cli-commands.md documents 'hermes checkpoints'. E2E validated ------------- - Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/. - Object dedup: two projects with an identical shared.py blob resolve to 7 total objects in the store (v1 would have stored the blob twice). - max_snapshots=3 actually enforced: after 6 commits, list shows 3. - Orphan prune: deleting a project's workdir + 'hermes checkpoints prune --retention-days 0' removes its ref, index, and metadata; GC reclaims the objects. - max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the tracked source code files. - hermes checkpoints {status,prune,clear,clear-legacy} all work from the CLI without an agent running. Breaking / migration -------------------- No in-place data migration — legacy per-directory shadow repos are moved into legacy-<timestamp>/ on first run. Old /rollback history is still accessible by inspecting the archive with git; run 'hermes checkpoints clear-legacy' to reclaim the space when ready. Users relying on /rollback must now set checkpoints.enabled=true (or pass --checkpoints) explicitly.	2026-05-06 05:44:35 -07:00
kshitijk4poor	aa88dcc57b	fix: salvage batch — compaction guidance, memory authority, cache eviction after compression - Fix /compact → /compress in context-overflow tips (closes #20020) - Evict cached agent after session hygiene and /compress so system prompt refreshes with current SOUL.md, memory, and skills - Restore memory authority across compaction: change 'informational background data' to 'authoritative reference data' in memory block and SUMMARY_PREFIX, with backward-compatible regex Based on: - PR #20027 by @LeonSGP43 - PR #18767 by @MacroAnarchy - PR #17380 by @vominh1919 PR #17121 boundary marker fix already merged to main (`2eef395e1`). PR #9262 user-message anchoring already on main via _ensure_last_user_message_in_tail().	2026-05-05 22:33:45 -07:00
bogerman1	3188e63b05	fix(api_server): SSE token batching + error handling for Open WebUI performance Reduces SSE event rate ~500/turn → ~20/turn via 50ms text-delta batching in _dispatch(), which eliminates markdown re-render storms on Open WebUI. Also: - Trim tool_call.arguments in the response.completed event to 100KB (prevents silent hangs on 848KB+ single-line SSE events). - Catch-all exception handlers in _write_sse_responses() + _write_sse_chat_completion() emit a proper error chunk instead of TransferEncodingError from incomplete chunked encoding when the agent crashes mid-stream. - MAX_REQUEST_BYTES 1MB → 10MB; pass client_max_size to aiohttp Application to avoid silent 400s on truncated request bodies for long conversations. Salvage of #17552 (api_server portion only). The contrib/openwebui-filter/ payload from that PR — Open WebUI Filter Function + benchmark writeup — is a client-side user-installable add-on and doesn't need to live in the repo; dropped here. Closes #17537. Co-authored-by: bogerman1 <93757150+bogerman1@users.noreply.github.com>	2026-05-05 15:13:36 -07:00
Moonyeah	f0d278412f	feat(gateway): respect kanban.max_spawn config to limit concurrent tasks The dispatch_once function already accepts a max_spawn parameter but the gateway was calling it without passing any value, effectively ignoring the configuration. This change reads kanban.max_spawn from config.yaml and passes it through, allowing users to limit concurrent kanban tasks. This prevents resource exhaustion scenarios where kanban dispatcher spawns too many parallel workers on constrained hardware.	2026-05-05 15:09:28 -07:00
Michel Belleau	5f8e59b0f1	docs(discord): fix Server Members Intent + SSRC-mapping drift; add /voice join slash Choice Salvage of #11350. Kept: - Code: add an explicit /voice join Choice in the slash UI (runner accepts both 'join' and 'channel' but only 'channel' was in autocomplete). - Docs: Server Members Intent is conditional (only needed if DISCORD_ALLOWED_USERS contains usernames); SSRC → user_id mapping uses the voice websocket SPEAKING opcode, not the Members intent. Dropped from the original PR: - HERMES_DISCORD_VOICE_PACKET_DUMP — this env var doesn't exist on main (it was in a different PR that isn't merged). - DISCORD_PROXY docs — already documented on current main. - DISCORD_ALLOW_MENTION_* docs — already on main. - "barge-in mode" rewrite — current main actually does pause the listener during TTS (VoiceReceiver.pause() at discord.py:192); there is no barge_in_guard/barge_in_rms on main. Co-authored-by: Michel Belleau <michel.belleau@malaiwah.com>	2026-05-05 13:50:43 -07:00
Teknium	d5357f816d	refactor(telegram): make typing thread-id resolver symmetric with send Mirror _message_thread_id_for_typing() with _message_thread_id_for_send(): both now map the General forum topic (thread id "1") to None upfront. That removes the need for the retry-without-thread fallback in send_typing() entirely — if _message_thread_id_for_typing() returns a non-None value, it's a real user-created topic and falling back to the root chat is never correct. If Telegram rejects the typing action (e.g. topic deleted mid-session), we swallow it at debug level instead of bleeding the indicator into All Messages. Updates the General-topic typing regression test to assert the new single-call contract.	2026-05-05 13:28:08 -07:00
helix4u	41545f7ec5	fix(telegram): keep DM topic typing scoped	2026-05-05 13:28:08 -07:00
Siddharth Balyan	3b750715a3	fix: resolve lazy session creation regressions (#18370 fallout) (#20363 ) Fix three regressions introduced by PR #18370 (lazy session creation): 1. _finalize_session() uses stale session_key after compression (#20001) 2. session_key not synced after auto-compression in run_conversation (#20001) 3. pending_title ValueError leaves title wedged forever (#19029) 4. Gateway silently swallows null responses when agent did work (#18765) 5. One-time cleanup for accumulated ghost compression continuations (#20001) Changes: - tui_gateway/server.py: _finalize_session() now uses agent.session_id (falls back to session_key when agent is None). Refactor _sync_session_key_after_compress() with clear_pending_title and restart_slash_worker policy flags. Call it post-run_conversation() to sync session_key after auto-compression. Add ValueError handler to pending_title flush. - gateway/run.py: Extract _normalize_empty_agent_response() helper that consolidates failed/partial/null response handling. Surfaces user-facing error when agent did work (api_calls > 0) but returned no text. - hermes_state.py: Add finalize_orphaned_compression_sessions() — marks ghost continuation sessions as ended (non-destructive, preserves data). - cli.py: One-time startup migration for orphaned compression sessions. Test changes: - tests/test_tui_gateway_server.py: Update pending_title ValueError test for post-#18370 architecture (title applied post-message, not at create). - tests/test_lazy_session_regressions.py: 14 new regression tests covering all fixed paths.	2026-05-06 01:11:49 +05:30
Traemond Anderson	60235dba5e	feat(cli): add list_picker_providers for credential-filtered picker The Telegram/Discord /model pickers currently call list_authenticated_providers(), which returns every provider whose credentials resolve locally and every model in its curated snapshot. Two failure modes fall out: - OpenRouter rows can include IDs the live catalog no longer carries. - Provider rows can surface with zero callable models (e.g. a slug whose credential pool entry exists but has nothing behind it). list_picker_providers() wraps the base function and post-processes the result so the interactive picker only shows models the user can actually select: - OpenRouter's models come from fetch_openrouter_models() (live-catalog filtered against the curated OPENROUTER_MODELS snapshot). - Rows with an empty models list are dropped, except custom endpoints (is_user_defined=True with an api_url) where the user may enter model ids manually. - All other fields pass through unchanged. The gateway /model handler switches to the new helper for the interactive picker payload only. Typed /model <name> and the text fallback list stay on list_authenticated_providers() so nothing is hidden from power users or platforms without a picker. Covered by nine focused unit tests in tests/hermes_cli/test_list_picker_providers.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 10:18:58 -07:00
Es1la	a877c3f6d9	fix(feishu): tolerate malformed dedup timestamps Salvages @Es1la's PR #13632 — a non-numeric timestamp in the persisted feishu dedup state crashed adapter startup with ValueError/TypeError from the unguarded float() call. Wrap the float() conversion in try/except; skip the bad key and keep loading the rest. The original PR also restructured existing TestDedupTTL tests to use tempfile.TemporaryDirectory + HERMES_HOME patching — that was test-hygiene scope creep unrelated to the bug. Kept only the malformed-timestamp fix and added a focused regression test.	2026-05-05 10:15:09 -07:00
hharry11	247c9d468c	fix(gateway): ensure deterministic thread eviction in helpers	2026-05-05 10:13:55 -07:00
sprmn24	ecc909de38	fix(session): serialize JSONL transcript appends under existing lock	2026-05-05 09:57:31 -07:00
WuTianyi	8e18d10318	fix(feishu): force text mode for markdown tables Feishu post-type 'md' elements do not render markdown tables. When table content is sent as post (triggered by bold matching _MARKDOWN_HINT_RE), the message appears blank on the client. Add _MARKDOWN_TABLE_RE to detect markdown table syntax and force text mode for table content, ensuring it is visible as plain text.	2026-05-05 09:57:14 -07:00
Teknium	b014a3d315	test(cron): update _isolate_tick_lock fixture for _get_lock_paths After PR #13725 replaced the module-level _LOCK_DIR/_LOCK_FILE constants with a dynamic _get_lock_paths() helper, the xdist-isolation fixture needs to patch the function instead of the removed constants.	2026-05-05 09:57:06 -07:00
邓taoyuan	969bfff449	fix: merge _get_hermes_home() dynamic resolution and feishu receive_id_type detection - scheduler.py: Replace static _hermes_home with dynamic _get_hermes_home() function to support profile switching at runtime (HERMES_HOME override) - scheduler.py: Replace static _LOCK_DIR/_LOCK_FILE with _get_lock_paths() function for profile-aware lock path resolution - feishu.py: Add receive_id_type detection (oc_/ou_ -> open_id, else chat_id) to fix Feishu API '[230001] ext=invalid receive_id' error for user DMs	2026-05-05 09:57:06 -07:00
Teknium	7de3c86c5a	feat(i18n): add display.language for static message translation (zh/ja/de/es) (#20231 ) * revert(gateway): remove stale-code self-check and auto-restart Removes the _detect_stale_code / _trigger_stale_code_restart mechanism introduced in #17648 and iterated in #19740. On every incoming message the gateway compared the boot-time git HEAD SHA to the current SHA on disk, and if they differed it would reply with Gateway code was updated in the background -- restarting this gateway so your next message runs on the new code. Please retry in a moment. and then kick off a graceful restart. This is unwanted behaviour: users who run a long-lived gateway and do their own ad-hoc git operations on the checkout end up with their chat interrupted and the current message dropped every time HEAD moves, with no way to opt out. If an operator really needs the old protection against stale sys.modules after "hermes update", the SIGKILL-survivor sweep in hermes update (hermes_cli/main.py, also tagged #17648) already handles the supervisor-respawn case on its own. Removed: gateway/run.py: - _STALE_CODE_SENTINELS, _GIT_SHA_CACHE_TTL_SECS - _read_git_head_sha(), _compute_repo_mtime() module helpers - class-level _boot_wall_time / _boot_repo_mtime / _boot_git_sha / _stale_code_restart_triggered defaults - __init__ boot-snapshot block (_boot_, _cached_current_sha, _repo_root_for_staleness, _stale_code_notified) - _current_git_sha_cached(), _detect_stale_code(), _trigger_stale_code_restart() methods - stale-code check + user-facing restart notice at the top of _handle_message() tests/gateway/test_stale_code_self_check.py (deleted, 412 lines) No new logic added. Zero remaining references to any removed symbol. Gateway test suite passes the same 4589 tests it passed before; the 3 pre-existing unrelated failures (discord free-channel, feishu bot admission, teams typing) are unchanged by this commit. * feat(i18n): add display.language for static message translation (zh/ja/de/es) Adds a thin-slice i18n layer covering the highest-impact static user-facing messages: the CLI dangerous-command approval prompt and a handful of gateway slash-command replies (restart-drain, goal cleared, approval expired, config read/save errors). Out of scope (stays English): agent responses, log lines, tool outputs, slash-command descriptions, error tracebacks. Infrastructure: - agent/i18n.py: catalog loader, t() helper, language resolution (HERMES_LANGUAGE env var > display.language config > en) - locales/{en,zh,ja,de,es}.yaml: ~19 translated strings per language - display.language in DEFAULT_CONFIG (hermes_cli/config.py) Tests: - tests/agent/test_i18n.py: 21 tests covering catalog parity, placeholder parity across locales, fallback behavior, env-var override, alias normalization, missing-key graceful degradation. Docs: - website/docs/user-guide/configuration.md: display.language entry plus a short section explaining scope so users don't expect agent responses to translate via this knob.	2026-05-05 08:03:07 -07:00
Teknium	285c208cf7	fix(gateway): also tolerate malformed env vars in custom human-delay mode Widens @Krionex's PR #16933 fix to cover the second bug class at the sibling site. natural mode used to pass env values through int() before the PR caught mis-typed values crashing the gateway; custom mode had the exact same bug one branch away (HERMES_HUMAN_DELAY_MIN_MS=oops in custom mode still crashed). Same try/except/fallback pattern, scoped to the two int() calls that feed random.uniform().	2026-05-05 06:11:38 -07:00
Krionex	3b16c590e0	fix(gateway): ignore malformed custom delay env vars in natural mode	2026-05-05 06:11:38 -07:00
JC的AI分身	80b386a472	fix(feishu): refresh bot identity during hydration	2026-05-05 06:04:20 -07:00
Teknium	314361733f	test(api_server): _run_agent result now carries session_id for #16938	2026-05-05 06:01:03 -07:00
vominh1919	7f735b4db2	fix: return effective session_id after context compression (#16938 ) When context compression rotates the agent's session_id to a new child session, the API server was still returning the stale parent session_id in the X-Hermes-Session-Id response header. This caused external clients to keep sending the old session_id, loading uncompressed parent history instead of the compressed continuation. Fix: _run_agent() now includes the effective session_id in its result dict, and the response header uses it instead of the original provided session_id.	2026-05-05 06:01:03 -07:00
Teknium	fe8560fc12	feat(api-server): X-Hermes-Session-Key header for long-term memory scoping (#20199 ) * feat(api-server): X-Hermes-Session-Key header for long-term memory scoping API Server integrations (Open WebUI, custom web UIs) can now pass a stable per-channel identifier via X-Hermes-Session-Key that scopes long-term memory (Honcho, etc.) independently of the transcript-scoped X-Hermes-Session-Id. This matches the native gateway's session_key / session_id split: one stable key per assistant channel, many independent transcripts that rotate on /new. - _create_agent and _run_agent accept gateway_session_key and pass it to AIAgent(gateway_session_key=...), which is already honored by the Honcho memory provider (plugins/memory/honcho/client.py resolve_session_name). - New shared helper _parse_session_key_header applies the same API-key gate, control-character sanitization, and a 256-char length cap as the existing session-id header. - All three agent endpoints honor the header: /v1/chat/completions, /v1/responses, /v1/runs. JSON and SSE responses echo it back. - /v1/capabilities advertises session_key_header so clients can feature-detect. Closes #20060. Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com> * chore: AUTHOR_MAP entry for manateelazycat --------- Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>	2026-05-05 05:34:47 -07:00
chengoak	8f4c0bf088	fix(wecom): pad base64 AES key before decode WeCom doesn't pad base64 aeskey, causing Python strict mode decode failure on media/image/file messages. Add automatic padding before base64 decode: aes_key + '=' * ((4 - len(aes_key) % 4) % 4). Salvages the AES padding fix from @chengoak's PR #17040. The SSRF whitelist entry for a private COS bucket hostname was dropped as it belongs in user config, not the built-in trusted-private-IP-hosts list. The debug-level full-body info log was dropped to avoid logging potentially sensitive message content at INFO level.	2026-05-05 05:00:41 -07:00
Asher Morse	6b76ea4707	fix(gateway): load reply_to_mode from config.yaml for Discord and Telegram The YAML-to-env-var bridge in load_gateway_config() mapped every Discord and Telegram config key (require_mention, auto_thread, reactions, etc.) except reply_to_mode. Users setting discord.reply_to_mode or telegram.reply_to_mode in ~/.hermes/config.yaml got no effect — the adapter only read the env var, which nothing populated from YAML. Add the missing bridge for both platforms, following the existing pattern. Top-level <platform>.reply_to_mode preferred, falls back to <platform>.extra.reply_to_mode, env var never overwritten. Handles YAML 1.1 bare `off` → Python False coercion. This is a re-submission of the work from #9837 and #13930, which both implemented the same fix but neither landed (see co-authors below). Co-authored-by: Matteo De Agazio <hypnosis.mda@gmail.com> Co-authored-by: ishardo <239075732+ishardo@users.noreply.github.com>	2026-05-05 04:58:23 -07:00
0xsir0000	f6b68f0f50	fix(gateway): keep DoH-confirmed Telegram IPs that match system DNS (#14520 ) discover_fallback_ips() filtered out any DoH-resolved IP that also appeared in the system resolver's answer set, on the assumption that the system IP was unreachable. When DoH and system DNS agreed (a common case), the function returned the hardcoded _SEED_FALLBACK_IPS list instead — and on networks where those seed addresses are not routable, the Telegram fallback transport had nothing usable to retry against and polling failed. Drop the system_ips exclusion so DoH-confirmed IPs are preserved regardless of system DNS overlap. The TelegramFallbackTransport already tries the primary path first via system DNS, then falls through to the IP-rewrite path on connect failure; including the same IP in both lanes lets a transient primary failure recover via the explicit IP route instead of escalating to seed addresses. Update the two tests that codified the old exclusion to reflect the new, inclusion-by-default behaviour. Fixes #14520	2026-05-05 04:42:59 -07:00

1 2 3 4 5 ...

1467 commits