The Ink TUI (\`hermes --tui\` + dashboard \`/chat\`) had no wiring for the
background self-improvement review. When the review fired and patched
a skill or saved a memory entry, the change landed but the user had
no visual indication it happened — only the CLI had a print surface
for the '💾 Self-improvement review: …' line.
Changes:
- tui_gateway/server.py: in _init_session, attach
agent.background_review_callback to an _emit('review.summary',
sid, {text}) closure. Wrapped in try/except so agents with locked
attribute slots don't break session startup.
- ui-tui/src/app/createGatewayEventHandler.ts: handle 'review.summary'
by routing ev.payload.text through sys(…), matching the existing
'background.complete' pattern. Empty / whitespace payloads are
ignored so the transcript never gets a blank system line.
- ui-tui/src/gatewayTypes.ts: extend the GatewayEvent discriminated
union with { type: 'review.summary', payload?: { text?: string } }.
Gateway platforms (Telegram, Discord, Slack, …) already route the
review summary via background_review_callback → post-delivery queue
in gateway/run.py, so they pick up the new 'Self-improvement review:'
prefix from the companion run_agent change with no platform edits.
Tests:
- tests/tui_gateway/test_review_summary_callback.py (Python, 2 tests):
_init_session attaches a callback that emits the right event; the
callback path survives agents that can't accept the attribute.
- ui-tui/src/__tests__/createGatewayEventHandler.test.ts (vitest, 2
new cases): review.summary events feed sys(...) with the full text;
empty / missing payloads are no-ops.
- TypeScript type-check passes.
- tui_gateway suite: 64/64 pass.
When the self-improvement background review fires after a turn, it runs
in a bg thread and emits a ' 💾 <summary>' line to announce what it
saved to memory or skills. Two problems made this invisible to users
even when the review successfully modified a skill:
1. The print went through `_cprint` (prompt_toolkit's print_formatted_text)
on a bg thread while the CLI's PromptSession was live. Direct
print_formatted_text races with the input-area redraw and the line
can land behind/above the prompt, scrolled off without the user
seeing it.
2. The message said only '💾 Skill created.' / '💾 Memory updated'
with no indication that the self-improvement loop was the one doing
this. Users who did catch the line couldn't tell the background
review from some other agent action.
Fixes:
- `_cprint` now detects when it's called from a non-app thread with a
running prompt_toolkit Application, and routes through
`run_in_terminal` via `loop.call_soon_threadsafe`. That pauses the
input, prints the line above the prompt, and redraws — the normal
prompt_toolkit contract for bg-thread output. Direct-print fallback
preserved for the no-app / same-thread / import-error paths. Affects
every bg-thread emission, not just the review summary (curator
summaries and auxiliary failure prints benefit too).
- The summary now reads ' 💾 Self-improvement review: <summary>' in
both the CLI and the gateway `background_review_callback` path, so
the origin is unambiguous.
Tests:
- New `tests/cli/test_cprint_bg_thread.py` covers all five routing
branches (no app, app-not-running, cross-thread schedule, same-thread
direct, app-loop-attribute-error, import-error).
- New case in `tests/run_agent/test_background_review.py` asserts the
attributed prefix shows up in both `_safe_print` and
`background_review_callback`.
Live E2E: exercised _cprint from a bg thread inside a real Application
event loop; confirmed get_app_or_none() sees the app, call_soon_threadsafe
schedules run_in_terminal, and the inner _pt_print runs.
Builds on #16855 (@lsdsjy) which fixed DeepSeek v4 reasoning_content
replay via model_extra fallback + capturing tool_calls at method entry.
Kimi / Moonshot thinking mode enforces the same echo-back contract and
hits the same 400 when a tool-call turn is persisted without
reasoning_content.
- _build_assistant_message: pad branch now uses _needs_thinking_reasoning_pad()
(DeepSeek OR Kimi) instead of _needs_deepseek_tool_reasoning() alone.
- Extract _needs_thinking_reasoning_pad() and reuse it in
_copy_reasoning_content_for_api so both sites share one predicate.
- tests/run_agent/test_deepseek_reasoning_content_echo.py: add
TestBuildAssistantMessagePadsStrictProviders parametrized over DeepSeek
(attr=None, attr-absent), Kimi (attr=None), Moonshot (via base_url),
and an OpenRouter negative control that must NOT pad. Proven to fail
2/5 cases on Kimi/Moonshot without this change.
- scripts/release.py: add AUTHOR_MAP entries for lsdsjy and season179.
Refs #17400.
Co-authored-by: season179 <season.saw@gmail.com>
Alongside the existing 'least recently used' section, surface two more
rankings so users can see which of their agent-created skills actually
get exercised:
- 'most used (top 5)' — sorted by use_count descending. Hidden when every
skill has use_count=0 (noise suppression on fresh installs).
- 'least used (top 5)' — sorted by use_count ascending. Always shown
when the catalog is non-empty.
use_count started tracking real agent skill activation in PR #17932
(bump_use wired into skill_view tool + slash invocation + --skill
preload), so these rankings are now meaningful.
Tests: 3 new in tests/hermes_cli/test_curator_status.py — happy path
with mixed use_counts, zero-use suppression of the most-used section,
and the no-skills clean-empty case.
Treat skill views and edits as activity when curator reports and applies lifecycle transitions, so recently loaded or patched skills are not displayed or transitioned as never used.\n\nAdds regression tests for activity derivation, automatic transitions, and CLI status output.
restore_skill() in tools/skill_usage.py used archive_root.iterdir(), which
only walked the top level of .archive/. Skills archived under nested layouts
(e.g. .archive/openclaw-imports/<skill>/ from older archive paths or
external imports) were invisible to both the exact-match and prefix-match
candidate scans, surfacing as a misleading "skill '<name>' not found in
archive" error even though the directory existed on disk.
Switch both candidate scans to archive_root.rglob('*') so the lookup
descends into category subdirectories.
Fixes#17942
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission — consolidations the model
enumerated tools for but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
* change(nix): dedupe nix lockfile checking scripts in ci
* feat(nix): make .#fix-lockfiles run --apply if no args passed
* fix(nix): use same nodejs version everywhere & small lints
- prevent lockfile thrashing while using nix :3
- use lib.getExe instead of raw /bin/ paths
- use inputs'.self instead of passing system in manually
* fix(nix): update lock files yet again (hopefully for the last time)
* fix(nix): align indentation of collision check echo
---------
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Fixes HTTP 404 errors when using Anthropic-compatible providers (Kimi Coding, MiniMax, MiniMax-CN) for auxiliary tasks.
Root cause: `_to_openai_base_url()` rewrites `/anthropic` → `/v1` so the OpenAI SDK hits the right endpoint. But the rewritten URL was then passed to `_maybe_wrap_anthropic`, whose `_endpoint_speaks_anthropic_messages` detector only fires on `/anthropic` or `api.kimi.com/coding`. Detector saw `/v1` → returned False → no Anthropic wrap → 404 on every aux call.
Fix: preserve the raw base_url before rewriting and pass it to `_maybe_wrap_anthropic` for transport detection, while still giving the rewritten URL to the OpenAI client constructor.
Closes#17705, #17413, #17086, #10469.
Co-authored-by: oak <chengoak@users.noreply.github.com>
* fix(nix): replace magic-nix-cache with Cachix
magic-nix-cache caused recurring CI failures (TwirpErrorResponse
ResourceExhausted) by hitting GitHub Actions Cache's 10 GB limit and
200 req/min rate limit. This was flagged as 'unfixable infra flake' in
#17836 but is actually a fixable architecture choice.
Switch to Cachix (dedicated binary cache, no GHA quota dependency):
- Replace DeterminateSystems/magic-nix-cache-action with cachix/cachix-action
- Add cachix-auth-token input to nix-setup composite action
- Pass CACHIX_AUTH_TOKEN secret through all three nix workflows
- continue-on-error: true so cache failures never block CI
Cache 'hermes-agent' is public at hermes-agent.cachix.org.
Devs can pull locally with: cachix use hermes-agent
* fix: correct cachix-action commit SHA pin
---------
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Widen #17818 to cover the dominant 'agent actively used this skill' path:
when the model calls the skill_view tool, bump use_count alongside view_count.
The slash-command and --skill preload paths (covered by the cherry-picked
commit) only catch user-initiated invocation; most skill activation happens
via the agent calling skill_view to consume an indexed skill.
Curator's stale-timer keys off last_used_at (agent/curator.py:233), so
without this wire-up agent-created skills would transition to stale
simultaneously regardless of actual use.
bump_use() existed and was tested but had zero production call sites —
use_count stayed 0 for all skills, breaking Curator's stale-detection
logic which relies on last_used_at.
Wire bump_use() into:
1. build_skill_invocation_message() — when a user invokes /skill-name
2. build_preloaded_skills_prompt() — when a skill is preloaded at session start
Both are the canonical 'a skill is actively being used' moments, distinct
from 'browsing' (bump_view in skill_view tool call).
Closes#17782
Belt-and-suspenders on top of @briandevans' #17758 fix. The in-band
drain hand-off (await->create_task + session-guard preservation)
changed cleanup semantics in three places that the original PR
reasoned about but didn't test directly. Pin each invariant so a
future refactor can't silently regress them:
1. Normal single-message path still releases _active_sessions[sk] and
_session_tasks[sk] through end-of-finally. The #17758 follow-up
moved _release_session_guard under
if current_task is self._session_tasks.get(session_key)
For the 99%-common case current_task IS the stored task, so the
guard must still fire. Test would fail if the conditional were
ever tightened in a way that dropped the normal path.
2. Drain-task cancellation releases the session. If the drain task
spawned by the in-band hand-off is cancelled mid-handler (e.g.
/stop fired while draining a follow-up), its own finally must
fire _release_session_guard. Without this a cancel would leave
the session permanently pinned busy.
3. Late-arrival drain still spawns when no in-band drain preceded
it. Pre-existing path, but the #17758 follow-up added a
re-queue branch that only fires when ownership was already
handed off. When no handoff happened the else branch must still
spawn a fresh drain task — otherwise a message arriving during
stop_typing gets silently dropped.
All three tests pass against current main. Zero production code
changes.
Widen #17639 to the fourth sibling site (tools/skills_tool.py _EXCLUDED_SKILL_DIRS)
and register leoneparise in scripts/release.py AUTHOR_MAP so CI release script
resolves the contributor.
Archived skills (moved to ~/.hermes/skills/.archive/ by the curator)
were still surfaced in the <available_skills> system prompt under a
fake '.archive' category, causing the agent to load and try to use
deprecated skills. The os.walk in iter_skill_index_files() only
excluded .git/.github/.hub.
Add '.archive' to EXCLUDED_SKILL_DIRS, and to the two other places
that hardcode the same exclusion tuple (gateway/run.py and
agent/skill_commands.py).
Three fixes bundled for curator reliability on existing installs and
broken/partial installs:
1. run_agent.py: defer `import fire` into the __main__ block. `fire` is
only used by `fire.Fire(main)` when running run_agent.py directly as
a CLI — it is NOT needed for library usage. Importing it at module
top made `from run_agent import AIAgent` from a daemon thread (e.g.
the curator's forked review agent) crash with ModuleNotFoundError
on broken/partial installs where `fire` isn't present.
2. hermes_cli/config.py: add version 22 → 23 migration that writes the
`curator` + `auxiliary.curator` sections to config.yaml with their
defaults, only filling keys the user hasn't overridden. Existing
configs from before PR #16049 / the April 2026 `auxiliary.curator`
unification had neither section on disk, so users couldn't see or
edit the settings in their config.yaml (runtime deep-merge papered
over it at read time, but the file never reflected reality).
3. hermes_cli/config.py: `ensure_hermes_home()` now pre-creates
`~/.hermes/logs/curator/` alongside cron/sessions/logs/memories on
every CLI launch. Managed-mode (NixOS) variant mkdir's it
defensively after the activation-script existence checks, since the
activation script may not know about this subpath.
4. agent/curator.py: `_reports_root()` mkdir's the dir at call time as
belt-and-suspenders for entry paths that bypass both
ensure_hermes_home() and the v23 migration (gateway-only installs,
bare library use).
E2E validated in isolated HERMES_HOME: fresh install gets full defaults
seeded; partial-override config keeps user's `enabled: false` and
custom `interval_hours` while filling the missing keys; re-running the
migration is a no-op.
The #1630 fix introduced a blanket ``agent_failed_early`` transcript skip
to prevent context-overflow sessions from looping. That guard also
triggers for unrelated transient failures (429 rate limits, read
timeouts, connection resets, provider 5xx) which have nothing to do with
session size — and it silently drops the user's message, so the agent
has no memory of the last turn on retry.
Split the failure classification in ``GatewayRunner._run_agent``:
* Context-overflow (``compression_exhausted`` flag, explicit
context-length phrases, or generic 400 with a long history) → keep
the existing skip, preserving the #1630/#9893 fix.
* Anything else that failed → persist just the user message so the
conversation survives a retry.
Use specific multi-word phrases (``context length``, ``token limit``,
``prompt is too long``, etc.) to match ``run_agent.py``'s own
classifier; bare ``exceed`` false-positively flagged "rate limit
exceeded" as context overflow.
Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py``
and the existing #1630 suite still passes.
Existing test_tar_pipe_commands asserted the literal substring
'tar xf - -C /' in ssh_str, which is no longer present after the
#17767 fix adds --no-overwrite-dir between 'tar xf -' and '-C /'.
Split the one substring check into three independent assertions for
the tar stdin mode, the new --no-overwrite-dir flag (regression guard
for #17767), and the extract target.
_set_nested unconditionally replaced any non-dict value with an empty
dict when walking the dotted path, which silently destroyed list-typed
config nodes the moment someone set a value with a numeric index
(e.g. 'hermes config set custom_providers.0.api_key NEW'). Any sibling
entries and any fields inside the targeted entry that the user didn't
write were lost.
Fix:
- _set_nested now detects list nodes and navigates by numeric index,
and preserves both dicts AND lists at intermediate positions (scalars
are still replaced so bare-scalar -> nested overrides keep working).
- set_config_value drops its duplicated navigation logic and calls
_set_nested instead -- single source of truth for the rules.
Regression tests (tests/hermes_cli/test_set_config_value.py):
- test_indexed_set_preserves_sibling_list_entries -- exact #17876 repro
- test_indexed_set_preserves_non_targeted_fields -- inner-dict fields survive
- test_deeper_nesting_through_list -- dict -> list -> dict -> scalar path
35/35 existing + new tests pass.
E2E-verified with the issue's repro against a real on-disk config.yaml --
list stays a list, entry 0 updated, entry 1 intact.
Closes#17876
When hermes model picker switches to a custom_providers entry, the slug
assignment can write the literal string 'custom' to model.provider if a
prior failed switch already left that value in config.yaml.
Two fixes:
1. model_switch.py: filter out bare 'custom' in slug assignment, always
resolve to canonical custom:<name> form
2. providers.py: resolve_custom_provider() self-heals bare 'custom' by
falling back to the first valid custom_providers entry
Closes#17478
Long-lived Gateway processes were sending duplicate tool names to
providers that enforce uniqueness:
- DeepSeek: 'Tool names must be unique.'
- Xiaomi MiMo: 'tools contains duplicate names: lcm_expand'
- Moonshot/Kimi: 'function name lcm_grep is duplicated'
TUI was unaffected because TUI runs with quiet_mode=False and skips the
cache entirely.
Root cause (two layered bugs)
- model_tools.get_tool_definitions(quiet_mode=True) memoizes its result
in _tool_defs_cache. The cache-hit path returned list(cached) (safe),
but the FIRST uncached call stored and returned the SAME object.
run_agent.py mutates self.tools (memory + LCM context-engine schemas)
in-place, so the very first agent init in a Gateway process
poisoned the cache, and every subsequent init appended LCM schemas
again on top of the already-polluted list.
- run_agent.py's context-engine injection (lcm_grep / lcm_describe /
lcm_expand) had no dedup, unlike the memory-tools injection right
above it which already skips already-present names.
Fix (defense in depth, per the issue's suggested fix)
- model_tools.get_tool_definitions: on the uncached branch, cache the
computed list but return list(result) to the caller. Same pattern as
the cache-hit path.
- run_agent.py: build _existing_tool_names from self.tools and skip
schemas whose names are already present, mirroring the memory-tools
block. This also defends against plugin paths that may register the
same schemas via ctx.register_tool().
Tests (tests/test_get_tool_definitions_cache_isolation.py)
- test_first_uncached_call_returns_fresh_list \u2014 pins the fix; without
it, first-call alias caused all the symptoms.
- test_cache_hit_returns_fresh_list \u2014 pre-existing behavior stays.
- test_caller_mutation_does_not_poison_cache \u2014 simulates run_agent
appending lcm_grep / lcm_expand to the returned list and asserts the
next call doesn't see them.
- test_repeated_caller_mutation_does_not_accumulate \u2014 reproduces the
long-lived Gateway accumulation pattern across 5 agent inits.
- test_non_quiet_mode_does_not_use_cache \u2014 sanity, explains why TUI
was fine.
5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).
Root cause: two separate context values existed independently:
- context_compressor.context_length = config value (e.g. 65536) ✓
- _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config
Changes:
1. Cap Ollama num_ctx to config context_length (run_agent.py)
When model.context_length is explicitly set and no explicit
ollama_num_ctx override exists, cap the auto-detected GGUF value
to the user's context_length. This is the core fix — it prevents
Ollama from allocating more VRAM than the user budgeted.
2. Pass config_context_length through all secondary call sites
Several paths called get_model_context_length() without the config
override, falling through to the 256K default fallback:
- cli.py: @-reference expansion and /model switch display
- gateway/run.py: @-reference expansion and /model switch display
- tui_gateway/server.py: @-reference expansion
- hermes_cli/model_switch.py: resolve_display_context_length()
3. Normalize root-level context_length in config (hermes_cli/config.py)
_normalize_root_model_keys() now migrates root-level context_length
into the model section, matching existing behavior for provider and
base_url. Users who wrote `context_length: 65536` at the YAML root
instead of under `model:` had it silently ignored.
4. Fix misleading comments (agent/model_metadata.py)
DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
as two comments stated.
Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
The busy-session handler (_handle_active_session_busy_message) bypassed the
authorization gate that the cold path enforces via _is_user_authorized(). In
shared-thread contexts (Slack threads, Telegram forum topics, Discord threads)
where thread_sessions_per_user=False (the default), all participants share one
session_key. An unauthorized user posting in the same thread as an authorized
user would hit the active-session branch, skip the auth check, and have their
text merged into _pending_messages or injected via agent.interrupt().
This commit adds the same _is_user_authorized() check at the top of the busy
handler, before any message queuing, steering, or interrupt logic. Unauthorized
messages are silently dropped (return True) with a warning log — matching the
cold-path behavior.
Affected platforms: Slack, Telegram, Discord, any adapter with shared-session
thread contexts.
Closes#17775
The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and
historically other Google models like PaLM. Those reject
`extra_body.thinking_config` with HTTP 400:
Unknown name "thinking_config": Cannot find field
`_build_gemini_thinking_config()` was unconditionally producing a
config dict for any model on the `gemini` / `google-gemini-cli`
provider, which `ChatCompletionsTransport.build_kwargs` then dropped
into `extra_body["thinking_config"]`. The result: every chat turn for
Gemma users on the gemini provider blew up at the API edge.
The fix is the same shape Hermes already uses for the Gemini-2.5 vs
Gemini-3 family clamping: normalise the model id, strip an
`OpenRouter`-style `google/` prefix, and short-circuit early when the
result doesn't start with `gemini`. We return `None` rather than
`{"includeThoughts": False}`, because the API rejects the field name
itself — even the polite "off" form trips the same 400.
Three regression tests cover Gemma with reasoning enabled, Gemma with
reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the
existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing
because the Gemini guard fires after the prefix strip.
Fixes#17426
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.
Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
GIFs peeled off and routed through send_animation (albums don't support
animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
over); URL images downloaded into BytesIO so they render inline; forum
channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
SMTP size governs); remote URLs remain linked in body (parity with
existing send_image)
All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.
Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.
Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.
Co-authored-by: Maxence Groine <maxence@groine.fr>