When a Telegram /restart fires and PTB's graceful-shutdown `get_updates`
ACK call times out ("When polling for updates is restarted, updates may
be received twice" in gateway.log), the new gateway receives the same
/restart again and restarts a second time — a self-perpetuating loop.
Record the triggering update_id in `.restart_last_processed.json` when
handling /restart. On the next process, reject a /restart whose
update_id <= the recorded one as a stale redelivery. 5-minute staleness
guard so an orphaned marker can't block a legitimately new /restart.
- gateway/platforms/base.py: add `platform_update_id` to MessageEvent
- gateway/platforms/telegram.py: propagate `update.update_id` through
_build_message_event for text/command/location/media handlers
- gateway/run.py: write dedup marker in _handle_restart_command;
_is_stale_restart_redelivery checks it before processing /restart
- tests/gateway/test_restart_redelivery_dedup.py: 9 new tests covering
fresh restart, redelivery, staleness window, cross-platform,
malformed-marker resilience, and no-update_id (CLI) bypass
Only active for Telegram today (the one platform with monotonic
cross-session update ordering); other platforms return False from
_is_stale_restart_redelivery and proceed normally.
Error messages that tell users to install optional extras now use
{sys.executable} -m pip install ... instead of a bare 'pip install
hermes-agent[extra]' string. Under the curl installer, bare 'pip'
resolves to system pip, which either fails with PEP 668
externally-managed-environment or installs into the wrong Python.
Affects: hermes dashboard, hermes web server startup, mcp_serve,
hermes doctor Bedrock check, CLI voice mode, voice_mode tool runtime
error, Discord voice-channel join failure message.
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace
interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.
Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
fan interrupt()/clear_interrupt() out to them, and handle the
register-after-interrupt race at _run_tool entry. getattr fallback
for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
and asserts is_interrupted() flips to True within ~1s of interrupt().
Second new test guards clear_interrupt() clearing tracked worker bits.
Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.
* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log
AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.
Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.
* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses
Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).
Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
loop sees the per-thread interrupt flag and calls _kill_process
(os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
1.5s) gives the worker time to complete its SIGTERM+SIGKILL
escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
even on paths the signal handlers don't cover (direct sys.exit,
unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
_wait_for_process via PyThreadState_SetAsyncExc, verifies the
subprocess process group is dead within 3s of the exception and
that KeyboardInterrupt re-raises cleanly afterward.
Validation:
| Before | After |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |
Extend forum support from PR #10145:
- REST path (_send_discord): forum thread creation now uploads media
files as multipart attachments on the starter message in a single
call. Previously media files were silently dropped on the forum
path.
- Websocket media paths (_send_file_attachment, send_voice, send_image,
send_animation — covers send_image_file, send_video, send_document
transitively): forum channels now go through a new _forum_post_file
helper that creates a thread with the file as starter content,
instead of failing via channel.send(file=...) which forums reject.
- _send_to_forum chunk follow-up failures are collected into
raw_response['warnings'] so partial-send outcomes surface.
- Process-local probe cache (_DISCORD_CHANNEL_TYPE_PROBE_CACHE) avoids
GET /channels/{id} on every uncached send after the first.
- Dedup of TestSendDiscordMedia that the PR merge-resolution left
behind.
- Docs: Forum Channels section under website/docs/user-guide/messaging/discord.md.
Tests: 117 passed (22 new for forum+media, probe cache, warnings).
Follow-up to #11909: surface the legacy-unit warning where users are most
likely to see it. After a 'hermes update', if a pre-rename hermes.service
is still installed alongside the current hermes-gateway.service, print
the list of legacy units + the 'hermes gateway migrate-legacy' command.
Profile-safe: reuses _find_legacy_hermes_units() which is an explicit
allowlist of hermes.service only — profile units never match.
Platform-gated: only prints on systemd hosts (the rename is Linux-only).
Non-blocking: just prints, never prompts, so gateway-spawned
hermes update --gateway runs aren't affected.
* fix(gateway): detect legacy hermes.service units from pre-rename installs
Older Hermes installs used a different service name (hermes.service) before
the rename to hermes-gateway.service. When both units remain installed, they
fight over the same bot token — after PR #5646's signal-recovery change,
this manifests as a 30-second SIGTERM flap loop between the two services.
Detection is an explicit allowlist (no globbing) plus an ExecStart content
check, so profile units (hermes-gateway-<profile>.service) and unrelated
third-party services named 'hermes' are never matched.
Wired into systemd_install, systemd_status, gateway_setup wizard, and the
main hermes setup flow — anywhere we already warn about scope conflicts now
also warns about legacy units.
* feat(gateway): add migrate-legacy command + install-time removal prompt
- New hermes_cli.gateway.remove_legacy_hermes_units() removes legacy
unit files with stop → disable → unlink → daemon-reload. Handles user
and system scopes separately; system scope returns path list when not
running as root so the caller can tell the user to re-run with sudo.
- New 'hermes gateway migrate-legacy' subcommand (with --dry-run and -y)
routes to remove_legacy_hermes_units via gateway_command dispatch.
- systemd_install now offers to remove legacy units BEFORE installing
the new hermes-gateway.service, preventing the SIGTERM flap loop that
hits users who still have pre-rename hermes.service around.
Profile units (hermes-gateway-<profile>.service) remain untouched in
all paths — the legacy allowlist is explicit (_LEGACY_SERVICE_NAMES)
and the ExecStart content check further narrows matches.
* fix(gateway): mark --replace SIGTERM as planned so target exits 0
PR #5646 made SIGTERM exit the gateway with code 1 so systemd's
Restart=on-failure revives it after unexpected kills. But when a user has
two gateway units fighting for the same bot token (e.g. legacy
hermes.service + hermes-gateway.service from a pre-rename install), the
--replace takeover itself becomes the 'unexpected' SIGTERM — the loser
exits 1, systemd revives it 30s later, and the cycle flaps indefinitely.
Before calling terminate_pid(), --replace now writes a short-lived marker
file naming the target PID + start_time. The target's shutdown_signal_handler
consumes the marker and, when it names this process, leaves
_signal_initiated_shutdown=False so the final exit code stays 0.
Staleness defences:
- PID + start_time combo prevents PID reuse matching an old marker
- Marker older than 60s is treated as stale and discarded
- Marker is unlinked on first read even if it doesn't match this process
- Replacer clears the marker post-loop + on permission-denied give-up
- AI Cards: how to configure ``card_template_id`` for streaming rich replies
- Emoji reactions: 🤔Thinking → 🥳Done lifecycle
- Per-platform display settings (streaming, tool_progress, reasoning, etc.)
- Installation: switch to the ``hermes-agent[dingtalk]`` extra (adds
alibabacloud-dingtalk alongside dingtalk-stream)
- Messaging capability matrix updated to reflect images, audio, video,
and threading support
Cherry-picked from #10985 by pedh, adapted to current main:
* Keeps main's full group-chat gating (require_mention + allowed_users +
free_response_chats + mention_patterns) — PR's simpler subset dropped.
* Keeps main's fire-and-forget process() dispatch + session_webhook
fallback for SDK >= 0.24.
* Picks up PR's REQUIRES_EDIT_FINALIZE capability flag on
BasePlatformAdapter + finalize kwarg on edit_message(), plumbed through
stream_consumer. Default False so Telegram/Slack/Discord/Matrix stay
on the zero-overhead fast path.
* DingTalk AI Card lifecycle: per-chat _message_contexts, two-card flow
(tool-progress + final response) with sibling auto-close driven by
reply_to, idempotent 🤔Thinking → 🥳Done swap, $alibabacloud-dingtalk$
for media URL resolution (replaces raw HTTP that was 403-ing).
* pyproject: dingtalk extra now dingtalk-stream>=0.20,<1 +
alibabacloud-dingtalk>=2.0.0 + qrcode.
Closes#10991
Co-authored-by: pedh
ShellFileOperations captured the terminal env's cwd at __init__ time and
used that stale value for every subsequent _exec() call. When the user
ran `cd` via the terminal tool, `env.cwd` updated but `ops.cwd` did not.
Relative paths passed to patch_replace / read_file / write_file / search
then targeted the ORIGINAL directory instead of the current one.
Observed symptom in agent sessions:
terminal: cd .worktrees/my-branch
patch hermes_cli/main.py <old> <new>
→ returns {"success": true} with a plausible unified diff
→ but `git diff` in the worktree shows nothing
→ the patch landed in the main repo's checkout of main.py instead
The diff looked legitimate because patch_replace computes it from the
IN-MEMORY content vs new_content, not by re-reading the file. The
write itself DID succeed — it just wrote to the wrong directory's copy
of the same-named file.
Fix: _exec() now resolves cwd from live sources in this order:
1. Explicit `cwd` arg (if provided by the caller)
2. Live `self.env.cwd` (tracks `cd` commands run via terminal)
3. Init-time `self.cwd` (fallback when env has no cwd attribute)
Includes a 5-test regression suite covering:
- cd followed by relative read follows live cwd
- the exact reported bug: patch_replace with relative path after cd
- explicit cwd= arg still wins over env.cwd
- env without cwd attribute falls back to init-time cwd
- patch_replace success reflects real file state (safety rail)
Co-authored-by: teknium1 <teknium@nousresearch.com>
persist_nous_credentials() now accepts an optional label kwarg which
gets embedded in providers.nous under the 'label' key.
_seed_from_singletons() prefers the embedded label over the
auto-derived label_from_token() fingerprint when materialising the
pool entry, so re-seeding on every load_pool('nous') preserves the
user's chosen label.
auth_commands.py threads --label through to the helper, restoring
parity with how other OAuth providers (anthropic, codex, google,
qwen) honor the flag.
Tests: 4 new (embed, reseed-survives, no-label fallback, end-to-end
through auth_add_command). All 390 nous/auth/credential_pool tests
pass.
Review feedback on the original commit: the helper wrote a pool entry
with source `manual:device_code` while `_seed_from_singletons()` upserts
with `device_code` (no `manual:` prefix), so the pool grew a duplicate
row on every `load_pool()` after login.
Normalise: the helper now writes `providers.nous` and delegates the pool
write entirely to `_seed_from_singletons()` via a follow-up
`load_pool()` call. The canonical source is `device_code`; the helper
never materialises a parallel `manual:device_code` entry.
- `persist_nous_credentials()` loses its `label` and `source` kwargs —
both are now derived by the seed path from the singleton state.
- CLI and web dashboard call sites simplified accordingly.
- New test `test_persist_nous_credentials_idempotent_no_duplicate_pool_entries`
asserts that two consecutive persists leave exactly one pool row and
no stray `manual:` entries.
- Existing `test_auth_add_nous_oauth_persists_pool_entry` updated to
assert the canonical source and single-entry invariant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`hermes auth add nous --type oauth` only wrote credential_pool.nous,
leaving providers.nous empty. When the Nous agent_key's 24h TTL expired,
run_agent.py's 401-recovery path called resolve_nous_runtime_credentials
(which reads providers.nous), got AuthError "Hermes is not logged into
Nous Portal", caught it as logger.debug (suppressed at INFO level), and
the agent died with "Non-retryable client error" — no signal to the
user that recovery even tried.
Introduce persist_nous_credentials() as the single source of truth for
Nous device-code login persistence. Both auth_commands (CLI) and
web_server (dashboard) now route through it, so pool and providers
stay in sync at write time.
Why: CLI-provisioned profiles couldn't recover from agent_key expiry,
producing silent daily outages 24h after first login. PR #6856/#6869
addressed adjacent issues but assumed providers.nous was populated;
this one wasn't being written.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash). Non-aggregator users already got their main
model. Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.
After: 'auto' means "use my main chat model" for every user,
regardless of provider type. Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers). Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.
Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model). Aggregator strict vision
backends are fallbacks, not the primary path.
Changes:
- agent/auxiliary_client.py: _resolve_auto() drops the
`_AGGREGATOR_PROVIDERS` guard. resolve_vision_provider_client()
auto branch unifies aggregator and exotic-provider paths —
everyone goes through resolve_provider_client() with main_model.
Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
the guard we just removed).
- hermes_cli/main.py: aux config menu copy updated to reflect
the new semantics ("'auto' means 'use my main model'").
- tests/agent/test_auxiliary_main_first.py: 12 regression tests
covering OpenRouter/Nous/DeepSeek main paths, runtime-override
wins, explicit-config wins, vision override preservation for
exotic providers, and fallback-chain activation when the main
provider has no working client.
Co-authored-by: teknium1 <teknium@nousresearch.com>
Follow-up polish on top of the cherry-picked #11023 commit.
- feishu_comment_rules.py: replace import-time "~/.hermes" expanduser fallback
with get_hermes_home() from hermes_constants (canonical, profile-safe).
- tools/feishu_doc_tool.py, tools/feishu_drive_tool.py: drop the
asyncio.get_event_loop().run_until_complete(asyncio.to_thread(...)) dance.
Tool handlers run synchronously in a worker thread with no running loop, so
the RuntimeError branch was always the one that executed. Calls client.request
directly now. Unused asyncio import removed.
- tests/gateway/test_feishu.py: add register_p2_customized_event to the mock
EventDispatcher builder so the existing adapter test matches the new handler
registration for drive.notice.comment_add_v1.
- scripts/release.py: map liujinkun@bytedance.com -> liujinkun2025 for
contributor attribution on release notes.
- Full comment handler: parse drive.notice.comment_add_v1 events, build
timeline, run agent, deliver reply with chunking support.
- 5 tools: feishu_doc_read, feishu_drive_list_comments,
feishu_drive_list_comment_replies, feishu_drive_reply_comment,
feishu_drive_add_comment.
- 3-tier access control rules (exact doc > wildcard "*" > top-level >
defaults) with per-field fallback. Config via
~/.hermes/feishu_comment_rules.json, mtime-cached hot-reload.
- Self-reply filter using generalized self_open_id (supports future
user-identity subscriptions). Receiver check: only process events
where the bot is the @mentioned target.
- Smart timeline selection, long text chunking, semantic text extraction,
session sharing per document, wiki link resolution.
Change-Id: I31e82fd6355173dbcc400b8934b6d9799e3137b9
Follow-up to the cherry-picked contributor fix:
- Extract `_remember_chat_req_id()` and bound it at DEDUP_MAX_SIZE like
`_reply_req_ids` — the unbounded dict would grow forever on a long-
running gateway with many chats.
- Move the cache write to AFTER the group/DM policy check so we don't
cache req_ids from blocked senders.
- Revert the undocumented `is_group` change: the contributor flipped
`chattype == 'group'` to `bool(chatid)`, which wasn't mentioned in
the PR description and weakens the signal (chattype is the explicit
hint; relying on chatid presence assumes DMs never carry it). Keep
the original check.
- Drop the defensive `getattr(self, '_last_chat_req_ids', {})` reads
at both send sites — the attribute is initialized in __init__.
- Update `test_send_uses_passive_reply_stream_...` → `_markdown_...`
to match the new msgtype, and add a new TestWeComZombieSessionFix
class covering device_id presence in subscribe, per-chat req_id
caching + bounding, blocked-sender cache exclusion, and the group
APP_CMD_RESPONSE fallback path.
Previously users had to hand-edit config.yaml to route individual auxiliary
tasks (vision, compression, web_extract, etc.) to a specific provider+model.
Add a first-class picker reachable from the bottom of the existing `hermes
model` provider list.
Flow:
hermes model
→ Configure auxiliary models...
→ <task picker: 9 tasks, shows current setting inline>
→ <provider picker: authenticated providers + auto + custom>
→ <model picker: curated list + live pricing>
The aux picker does NOT re-run credential/OAuth setup; users authenticate
providers through the normal `hermes model` flow, then route aux tasks to
them here. `list_authenticated_providers()` gates the list to providers
the user has configured.
Also:
- 'Cancel' entry relabeled 'Leave unchanged' (sentinel still 'cancel'
internally, so dispatch logic is unchanged)
- 'Reset all to auto' entry to bulk-clear aux overrides; preserves
user-tuned timeout / download_timeout values
- Adds `title_generation` task to DEFAULT_CONFIG.auxiliary — the task
was called from agent/title_generator.py but was missing from defaults,
so config-backed timeout overrides never worked for it
Co-authored-by: teknium1 <teknium@nousresearch.com>
build_skills_system_prompt() was using the skill directory name (skill_name)
when appending to skills_by_category in all three code paths (snapshot cache,
cold filesystem scan, external dirs). This meant any skill whose directory name
differed from its frontmatter `name` field would appear under the wrong name in
the system prompt, causing LLM routing failures.
The snapshot entry already stores both skill_name (dir) and frontmatter_name
(declared); switch the three tuple appends to use frontmatter_name. Also fix
the external-dir dedup set (seen_skill_names) to track frontmatter names for
consistency with the local-skill tuples now stored under frontmatter_name.
Fixes#11777
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
Two accretion-over-time leaks that compound over long CLI / gateway
lifetimes. Both were flagged in the memory-leak audit.
## file_tools._read_tracker
_read_tracker[task_id] holds three sub-containers that grew unbounded:
read_history set of (path, offset, limit) tuples — 1 per unique read
dedup dict of (path, offset, limit) → mtime — same growth pattern
read_timestamps dict of resolved_path → mtime — 1 per unique path
A CLI session uses one stable task_id for its lifetime, so these were
uncapped. A 10k-read session accumulated ~1.5MB of tracker state that
the tool no longer needed (only the most recent reads are relevant for
dedup, consecutive-loop detection, and write/patch external-edit
warnings).
Fix: _cap_read_tracker_data() enforces hard caps on each container
after every add. Defaults: read_history=500, dedup=1000,
read_timestamps=1000. Eviction is insertion-order (Python 3.7+ dict
guarantee) for the dicts; arbitrary for the set (which only feeds
diagnostic summaries).
## process_registry._completion_consumed
Module-level set that recorded every session_id ever polled / waited /
logged. No pruning. Each entry is ~20 bytes, so the absolute leak is
small, but on a gateway processing thousands of background commands
per day the set grows until process exit.
Fix: _prune_if_needed() now discards _completion_consumed entries
alongside the session dict evictions it already performs (both the
TTL-based prune and the LRU-over-cap prune). Adds a final
belt-and-suspenders pass that drops any dangling entries whose
session_id no longer appears in _running or _finished.
Tests: tests/tools/test_accretion_caps.py — 9 cases
* Each container bound respected, oldest evicted
* No-op when under cap (no unnecessary work)
* Handles missing sub-containers without crashing
* Live read_file_tool path enforces caps end-to-end
* _completion_consumed pruned on TTL expiry
* _completion_consumed pruned on LRU eviction
* Dangling entries (no backing session) cleared
Broader suite: 3486 tests/tools + tests/cli pass. The single flake
(test_alias_command_passes_args) reproduces on unchanged main — known
cross-test pollution under suite-order load.
Replace the hardcoded 'kimi-for-coding' string check with the helper
from auxiliary_client so there is one source of truth for the list of
models with fixed-temperature contracts. Adding a new entry to
_FIXED_TEMPERATURE_MODELS now automatically covers flush_memories too.
Google-side 429 Code Assist errors now flow through Hermes' normal rate-limit
path (status_code on the exception, Retry-After preserved via error.response)
instead of being opaque RuntimeErrors. User sees a one-line capacity message
instead of a 500-char JSON dump.
Changes
- CodeAssistError grows status_code / response / retry_after / details attrs.
_extract_status_code in error_classifier picks up status_code and classifies
429 as FailoverReason.rate_limit, so fallback_providers triggers the same
way it does for SDK errors. run_agent.py line ~10428 already walks
error.response.headers for Retry-After — preserving the response means that
path just works.
- _gemini_http_error parses the Google error envelope (error.status +
error.details[].reason from google.rpc.ErrorInfo, retryDelay from
google.rpc.RetryInfo). MODEL_CAPACITY_EXHAUSTED / RESOURCE_EXHAUSTED / 404
model-not-found each produce a human-readable message; unknown shapes fall
back to the previous raw-body format.
- Drop gemma-4-26b-it from hermes_cli/models.py, hermes_cli/setup.py, and
agent/model_metadata.py — Google returned 404 for it today in local repro.
Kept gemma-4-31b-it (capacity-constrained but not retired).
Validation
| | Before | After |
|---------------------------|--------------------------------|-------------------------------------------|
| Error message | 'Code Assist returned HTTP 429: {500 chars JSON}' | 'Gemini capacity exhausted for gemini-2.5-pro (Google-side throttle...)' |
| status_code on error | None (opaque RuntimeError) | 429 |
| Classifier reason | unknown (string-match fallback) | FailoverReason.rate_limit |
| Retry-After honored | ignored | extracted from RetryInfo or header |
| gemma-4-26b-it picker | advertised (404s on Google) | removed |
Unit + E2E tests cover non-streaming 429, streaming 429, 404 model-not-found,
Retry-After header fallback, malformed body, and classifier integration.
Targeted suites: tests/agent/test_gemini_cloudcode.py (81 tests), full
tests/hermes_cli (2203 tests) green.
Co-authored-by: teknium1 <teknium@nousresearch.com>
Follow-up to WideLee's salvaged PR #11582.
Back-compat for QQ_HOME_CHANNEL → QQBOT_HOME_CHANNEL rename:
- gateway/config.py reads QQBOT_HOME_CHANNEL, falls back to QQ_HOME_CHANNEL
with a one-shot deprecation warning so users on the old name aren't
silently broken.
- cron/scheduler.py: _HOME_TARGET_ENV_VARS['qqbot'] now maps to the new
name; _get_home_target_chat_id falls back to the legacy name via a
_LEGACY_HOME_TARGET_ENV_VARS table.
- hermes_cli/status.py + hermes_cli/setup.py: honor both names when
displaying or checking for missing home channels.
- hermes_cli/config.py: keep legacy QQ_HOME_CHANNEL[_NAME] in
_EXTRA_ENV_KEYS so .env sanitization still recognizes them.
Scope cleanup:
- Drop qrcode from core dependencies and requirements.txt (remains in
messaging/dingtalk/feishu extras). _qqbot_render_qr already degrades
gracefully when qrcode is missing, printing a 'pip install qrcode' tip
and falling back to URL-only display.
- Restore @staticmethod on QQAdapter._detect_message_type (it doesn't
use self). Revert the test change that was only needed when it was
converted to an instance method.
- Reset uv.lock to origin/main; the PR's stale lock also included
unrelated changes (atroposlib source URL, hermes-agent version bump,
fastapi additions) that don't belong.
Verified E2E:
- Existing user (QQ_HOME_CHANNEL set): gateway + cron both pick up the
legacy name; deprecation warning logs once.
- Fresh user (QQBOT_HOME_CHANNEL set): gateway + cron use new name,
no warning.
- Both set: new name wins on both surfaces.
Targeted tests: 296 passed, 4 skipped (qqbot + cron + hermes_cli).
- Re-export _ssrf_redirect_guard from __init__.py
- Fix _parse_json @staticmethod using self._log_tag
- Update test_detect_message_type to call as instance method
- Fix mock.patch path for httpx.AsyncClient in adapter submodule
- Remove @staticmethod from _detect_message_type, _convert_silk_to_wav,
_convert_raw_to_wav, _convert_ffmpeg_to_wav so they can use self._log_tag
- Replace all remaining hardcoded "QQBot" log args with self._log_tag
- Downgrade STT routine flow logs (download, convert, success) from info to debug
- Keep warning level for actual failures (STT failed, ffmpeg error, empty transcript)
Three closely-related fixes for shutdown / lifecycle hygiene.
1. _release_running_agent_state(session_key) helper
----------------------------------------------------
Per-running-agent state lived in three dicts that drifted out of sync
across cleanup sites:
self._running_agents — AIAgent per session_key
self._running_agents_ts — start timestamp per session_key
self._busy_ack_ts — last busy-ack timestamp per session_key
Inventory before this PR:
8 sites: del self._running_agents[key]
— only 1 (stale-eviction) cleaned all three
— 1 cleaned _running_agents + _running_agents_ts only
— 6 cleaned _running_agents only
Each missed entry was a (str, float) tuple per session per gateway
lifetime — small, persistent, accumulates across thousands of
sessions over months. Per-platform leaks compounded.
This change adds a single helper that pops all three dicts in
lockstep, and replaces every bare 'del self._running_agents[key]'
site with it. Per-session state that PERSISTS across turns
(_session_model_overrides, _voice_mode, _pending_approvals,
_update_prompt_pending) is intentionally NOT touched here — those
have their own lifecycles tied to user actions, not turn boundaries.
2. _running_agents_ts cleared in _stop_impl
----------------------------------------
Was being missed alongside _running_agents.clear(); now included.
3. SessionDB close() in _stop_impl
---------------------------------
The SQLite WAL write lock stayed held by the old gateway connection
until Python actually exited — causing 'database is locked' errors
when --replace launched a new gateway against the same file. We
now explicitly close both self._db and self.session_store._db
inside _stop_impl, with try/except so a flaky close on one doesn't
block the other.
Tests
-----
tests/gateway/test_session_state_cleanup.py — 10 cases covering:
* helper pops all three dicts atomically
* idempotent on missing/empty keys
* preserves other sessions
* tolerates older runners without _busy_ack_ts attribute
* thread-safe under concurrent release
* regression guard: scans gateway/run.py and fails if a future
contributor reintroduces 'del self._running_agents[...]'
outside docstrings
* SessionDB close called on both holders during shutdown
* shutdown tolerates missing session_store
* shutdown tolerates close() raising on one db (other still closes)
Broader gateway suite: 3108 passed (vs 3100 on baseline) — failure
delta is +8 net passes; the 10 remaining failures are pre-existing
cross-test pollution / missing optional deps (matrix needs olm,
signal/telegram approval flake, dingtalk Mock wiring), all reproduce
on stashed baseline.