hermes-agent/hermes_cli
Teknium 0109547fa2
fix(update): handle conflicted git index during hermes update (#4735)
* fix(gateway): race condition, photo media loss, and flood control in Telegram

Three bugs causing intermittent silent drops, partial responses, and
flood control delays on the Telegram platform:

1. Race condition in handle_message() — _active_sessions was set inside
   the background task, not before create_task(). Two rapid messages
   could both pass the guard and spawn duplicate processing tasks.
   Fix: set _active_sessions synchronously before spawning the task
   (grammY sequentialize / aiogram EventIsolation pattern).

2. Photo media loss on dequeue — when a photo (no caption) was queued
   during active processing and later dequeued, only .text was
   extracted. Empty text → message silently dropped.
   Fix: _build_media_placeholder() creates text context for media-only
   events so they survive the dequeue path.

3. Progress message edits triggered Telegram flood control — rapid tool
   calls edited the progress message every 0.3s, hitting Telegram's
   rate limit (23s+ waits). This blocked progress updates and could
   cause stream consumer timeouts.
   Fix: throttle edits to 1.5s minimum interval, detect flood control
   errors and gracefully degrade to new messages. edit_message() now
   returns failure for flood waits >5s instead of blocking.

* fix(gateway): downgrade empty/None response log from WARNING to DEBUG

This warning fires on every successful streamed response (streaming
delivers the text, handler returns None via already_sent=True) and
on every queued message during active processing. Both are expected
behavior, not error conditions. Downgrade to DEBUG to reduce log noise.

* fix(gateway): prevent stuck sessions with agent timeout and staleness eviction

Three changes to prevent sessions from getting permanently locked:

1. Agent execution timeout (HERMES_AGENT_TIMEOUT, default 10min):
   Wraps run_in_executor with asyncio.wait_for so a hung API call or
   runaway tool can't lock a session indefinitely. On timeout, the
   agent is interrupted and the user gets an actionable error message.

2. Staleness eviction for _running_agents:
   Tracks start timestamps for each session entry. When a new message
   arrives and the entry is older than timeout + 1min grace, it's
   evicted as a leaked lock. Safety net for any cleanup path that
   fails to remove the entry.

3. Cron job timeout (HERMES_CRON_TIMEOUT, default 10min):
   Wraps run_conversation in a ThreadPoolExecutor with timeout so a
   hung cron job doesn't block the ticker thread (and all subsequent
   cron jobs) indefinitely.

Follows grammY runner's per-update timeout pattern and aiogram's
asyncio.wait_for approach for handler deadlines.

* fix(gateway): STT config resolution, stream consumer flood control fallback

Three targeted fixes from user-reported issues:

1. STT config resolution (transcription_tools.py):
   _has_openai_audio_backend() and _resolve_openai_audio_client_config()
   now check stt.openai.api_key/base_url in config.yaml FIRST, before
   falling back to env vars. Fixes voice transcription breaking when
   using a custom OpenAI-compatible endpoint via config.yaml.

2. Stream consumer flood control fallback (stream_consumer.py):
   When an edit fails mid-stream (e.g., Telegram flood control returns
   failure for waits >5s), reset _already_sent to False so the normal
   final send path delivers the complete response. Previously, a
   truncated partial was left as the final message.

3. Telegram edit_message comment alignment (telegram.py):
   Clarify that long flood waits return failure so streaming can fall
   back to a normal final send.

* refactor: simplify and harden PR fixes after review

- Fix cron ThreadPoolExecutor blocking on timeout: use shutdown(wait=False,
  cancel_futures=True) instead of context manager that waits indefinitely
- Extract _dequeue_pending_text() to deduplicate media-placeholder logic
  in interrupt and normal-completion dequeue paths
- Remove hasattr guards for _running_agents_ts: add class-level default
  so partial test construction works without scattered defensive checks
- Move `import concurrent.futures` to top of cron/scheduler.py
- Progress throttle: sleep remaining interval instead of busy-looping
  0.1s (~15 wakeups per 1.5s window → 1 wakeup)
- Deduplicate _load_stt_config() in transcription_tools.py:
  _has_openai_audio_backend() now delegates to _resolve_openai_audio_client_config()

* fix: move class-level attribute after docstring, clarify throttle comment

Follow-up nits for salvaged PR #4577:
- Move _running_agents_ts class attribute below the docstring so
  GatewayRunner.__doc__ is preserved.
- Add clarifying comment explaining the throttle continue behavior
  (batches queued messages during the throttle interval).

* fix(update): handle conflicted git index during hermes update

When the git index has unmerged entries (e.g. from an interrupted
merge or rebase), git stash fails with 'needs merge / could not
write index'. Detect this with git ls-files --unmerged and clear
the conflict state with git reset before attempting the stash.
Working-tree changes are preserved.

Reported by @LLMJunky — package-lock.json conflict from a prior
merge left the index dirty, blocking hermes update entirely.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-04-03 01:17:12 -07:00
..
__init__.py chore: release v0.6.0 (2026.3.30) (#3985) 2026-03-30 08:29:38 -07:00
auth.py fix: repair OpenCode model routing and selection (#4508) 2026-04-02 09:36:24 -07:00
auth_commands.py feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647) 2026-03-31 03:10:01 -07:00
banner.py chore: prepare Hermes for Homebrew packaging (#4099) 2026-03-30 17:34:43 -07:00
callbacks.py feat(approvals): make dangerous command approval timeout configurable (#3886) 2026-03-30 00:02:02 -07:00
checklist.py fix: add TTY guard to interactive CLI commands to prevent CPU spin (#3933) 2026-03-30 08:10:23 -07:00
claw.py feat: add post-migration cleanup for OpenClaw directories (#4100) 2026-03-30 17:39:08 -07:00
clipboard.py fix: clean up empty file after failed wl-paste clipboard extraction 2026-03-11 02:56:19 -07:00
codex_models.py fix: add gpt-5.4-mini to Codex fallback catalog (#3855) 2026-03-29 20:10:00 -07:00
colors.py feat: respect NO_COLOR env var and TERM=dumb (#4079) 2026-03-30 17:07:21 -07:00
commands.py fix(telegram): enforce 32-char limit on command names with collision avoidance (#4211) 2026-03-31 02:41:50 -07:00
config.py feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) 2026-04-02 15:33:51 -07:00
copilot_auth.py chore: remove ~100 unused imports across 55 files (#3016) 2026-03-25 15:02:03 -07:00
cron.py fix(cron): stop truncating job IDs in list view (#4132) 2026-03-30 19:05:34 -07:00
curses_ui.py fix: add TTY guard to interactive CLI commands to prevent CPU spin (#3933) 2026-03-30 08:10:23 -07:00
default_soul.py fix: reset default SOUL.md to baseline identity text (#3159) 2026-03-26 01:34:27 -07:00
doctor.py feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) 2026-04-02 15:33:51 -07:00
env_loader.py chore: remove ~100 unused imports across 55 files (#3016) 2026-03-25 15:02:03 -07:00
gateway.py fix: allow running gateway service as root for LXC/container environments (#4732) 2026-04-03 01:14:21 -07:00
main.py fix(update): handle conflicted git index during hermes update (#4735) 2026-04-03 01:17:12 -07:00
mcp_config.py fix: add TTY guard to interactive CLI commands to prevent CPU spin (#3933) 2026-03-30 08:10:23 -07:00
memory_setup.py feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) 2026-04-02 15:33:51 -07:00
model_switch.py fix: repair OpenCode model routing and selection (#4508) 2026-04-02 09:36:24 -07:00
models.py fix: repair OpenCode model routing and selection (#4508) 2026-04-02 09:36:24 -07:00
nous_subscription.py Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-03-31 09:29:43 +09:00
pairing.py chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119) 2026-03-25 19:47:58 -07:00
plugins.py Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-03-31 09:29:43 +09:00
plugins_cmd.py chore: prepare Hermes for Homebrew packaging (#4099) 2026-03-30 17:34:43 -07:00
profiles.py fix: also exclude .env from default profile exports 2026-04-01 11:20:33 -07:00
runtime_provider.py fix: repair OpenCode model routing and selection (#4508) 2026-04-02 09:36:24 -07:00
setup.py fix: repair OpenCode model routing and selection (#4508) 2026-04-02 09:36:24 -07:00
skills_config.py fix: webhook platform support — skip home channel prompt, disable tool progress (salvage #4363) (#4660) 2026-04-02 14:00:22 -07:00
skills_hub.py fix(skills): validate hub bundle paths before install (#3986) 2026-03-30 08:37:19 -07:00
skin_engine.py refactor: consolidate get_hermes_home() and parse_reasoning_effort() (#3062) 2026-03-25 15:54:28 -07:00
status.py Merge branch 'main' into rewbs/tool-use-charge-to-subscription 2026-03-31 09:29:43 +09:00
tools_config.py fix: webhook platform support — skip home channel prompt, disable tool progress (salvage #4363) (#4660) 2026-04-02 14:00:22 -07:00
uninstall.py chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119) 2026-03-25 19:47:58 -07:00
webhook.py fix: replace user-facing hardcoded ~/.hermes paths with display_hermes_home() 2026-03-28 23:47:21 -07:00