Commit graph

4388 commits

Author SHA1 Message Date
Teknium
19142810ed
fix: /debug privacy — auto-delete pastes after 1 hour, add privacy notices (#10510)
- Pastes uploaded by /debug now auto-delete after 1 hour via a detached
  background process that sends DELETE to paste.rs
- CLI: shows privacy notice listing what data will be uploaded
- Gateway: only uploads summary report (system info + log tails), NOT
  full log files containing conversation content
- Added 'hermes debug delete <url>' for immediate manual deletion
- 16 new tests covering auto-delete scheduling, paste deletion, privacy
  notices, and the delete subcommand

Addresses user privacy concern where /debug uploaded full conversation
logs to a public paste service with no warning or expiry.
2026-04-15 13:40:27 -07:00
Teknium
2edbf15560
fix: enforce TTL in MessageDeduplicator + use yaml for gateway --config (#10306, #10216) (#10509)
Two gateway fixes:

1. MessageDeduplicator.is_duplicate() now checks TTL at query time (#10306)

   Previously, is_duplicate() returned True for any previously seen ID
   without checking its age — expired entries were only purged when cache
   size exceeded max_size.  On normal workloads that never overflow, message
   IDs stayed deduplicated forever instead of expiring after the TTL.

   Fix: check `now - timestamp < ttl` before returning True.  Expired
   entries are removed and treated as new messages.

2. Gateway --config flag now uses yaml.safe_load() (#10216)

   The --config CLI flag in gateway/run.py main() used json.load() to
   parse config files.  YAML is the only documented config format and
   every other config loader uses yaml.safe_load().  A YAML config file
   passed via --config would crash with json.JSONDecodeError.

Closes #10306
Closes #10216
2026-04-15 13:35:40 -07:00
Teknium
af4bf505b3
fix: add on_memory_write bridge to sequential tool execution path (#10174) (#10507)
The on_memory_write bridge that notifies external memory providers
(ClawMem, retaindb, supermemory, etc.) of built-in memory writes was
only present in the concurrent tool execution path (_invoke_tool).
The sequential path (_execute_tool_calls_sequential) — which handles
all single tool calls, the common case — was missing it entirely.

This meant external memory providers silently missed every single-call
memory write, which is the vast majority of memory operations.

Fix: add the identical bridge block to the sequential path, right
after the memory_tool call returns.

Closes #10174
2026-04-15 13:32:59 -07:00
helix4u
93f6f66872 fix(interrupt): preserve pre-start terminal interrupts 2026-04-15 13:29:57 -07:00
Teknium
a418ddbd8b
fix: add activity heartbeats to prevent false gateway inactivity timeouts (#10501)
Multiple gaps in activity tracking could cause the gateway's inactivity
timeout to fire while the agent is actively working:

1. Streaming wait loop had no periodic heartbeat — the outer thread only
   touched activity when the stale-stream detector fired (180-300s), and
   for local providers (Ollama) the stale timeout was infinity, meaning
   zero heartbeats. Now touches activity every 30s.

2. Concurrent tool execution never set the activity callback on worker
   threads (threading.local invisible across threads) and never set
   _current_tool. Workers now set the callback, and the concurrent wait
   uses a polling loop with 30s heartbeats.

3. Modal backend's execute() override had its own polling loop without
   any activity callback. Now matches _wait_for_process cadence (10s).
2026-04-15 13:29:05 -07:00
Teknium
0d25e1c146
fix: prevent premature loop exit when weak models return empty after substantive tool calls (#10472)
The _last_content_with_tools fallback was firing indiscriminately for ALL
content+tool turns, including mid-task narration alongside substantive
tools (terminal, search_files, etc.).  This caused the agent to exit
the loop with 'I'll scan the directory...' as the final answer instead
of nudging the model to continue processing tool results.

The fix restricts the fallback to housekeeping-only turns (memory, todo,
skill_manage, session_search) where the content genuinely IS the final
answer.  When substantive tools are present, the existing post-tool
nudge mechanism now fires instead, prompting the model to continue.

Affected models: xiaomi/mimo-v2-pro, GLM-5, and other weaker models
that intermittently return empty after tool results.

Reported by user Renaissance on Discord.
2026-04-15 13:28:09 -07:00
Teknium
6391b46779
fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)
The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
2026-04-15 13:16:28 -07:00
Teknium
d1d425e9d0 chore: add ZaynJarvis bytedance email to AUTHOR_MAP 2026-04-15 11:28:45 -07:00
zhiheng.liu
7cb06e3bb3 refactor(memory): drop on_session_reset — commit-only is enough
OV transparently handles message history across /new and /compress: old
messages stay in the same session and extraction is idempotent, so there's
no need to rebind providers to a new session_id. The only thing the
session boundary actually needs is to trigger extraction.

- MemoryProvider / MemoryManager: remove on_session_reset hook
- OpenViking: remove on_session_reset override (nothing to do)
- AIAgent: replace rotate_memory_session with commit_memory_session
  (just calls on_session_end, no rebind)
- cli.py / run_agent.py: single commit_memory_session call at the
  session boundary before session_id rotates
- tests: replace on_session_reset coverage with routing tests for
  MemoryManager.on_session_end

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
zhiheng.liu
8275fa597a refactor(memory): promote on_session_reset to base provider hook
Replace hasattr-forked OpenViking-specific paths with a proper base-class
hook. Collapse the two agent wrappers into a single rotate_memory_session
so callers don't orchestrate commit + rebind themselves.

- MemoryProvider: add on_session_reset(new_session_id) as a default no-op
- MemoryManager: on_session_reset fans out unconditionally (no hasattr,
  no builtin skip — base no-op covers it)
- OpenViking: rename reset_session -> on_session_reset; drop the explicit
  POST /api/v1/sessions (OV auto-creates on first message) and the two
  debug raise_for_status wrappers
- AIAgent: collapse commit_memory_session + reinitialize_memory_session
  into rotate_memory_session(new_sid, messages)
- cli.py / run_agent.py: replace hasattr blocks and the split calls with
  a single unconditional rotate_memory_session call; compression path
  now passes the real messages list instead of []
- tests: align with on_session_reset, assert reset does NOT POST /sessions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
zhiheng.liu
7856d304f2 fix(openviking): commit session on /new and context compression
The OpenViking memory provider extracts memories when its session is
committed (POST /api/v1/sessions/{id}/commit).  Before this fix, the
CLI had two code paths that changed the active session_id without ever
committing the outgoing OpenViking session:

1. /new (new_session() in cli.py) — called flush_memories() to write
   MEMORY.md, then immediately discarded the old session_id.  The
   accumulated OpenViking session was never committed, so all context
   from that session was lost before extraction could run.

2. /compress and auto-compress (_compress_context() in run_agent.py) —
   split the SQLite session (new session_id) but left the OpenViking
   provider pointing at the old session_id with no commit, meaning all
   messages synced to OpenViking were silently orphaned.

The gateway already handles session commit on /new and /reset via
shutdown_memory_provider() on the cached agent; the CLI path did not.

Fix: introduce a lightweight session-transition lifecycle alongside
the existing full shutdown path:

- OpenVikingMemoryProvider.reset_session(new_session_id): waits for
  in-flight background threads, resets per-session counters, and
  creates the new OV session via POST /api/v1/sessions — without
  tearing down the HTTP client (avoids connection overhead on /new).

- MemoryManager.restart_session(new_session_id): calls reset_session()
  on providers that implement it; falls back to initialize() for
  providers that do not.  Skips the builtin provider (no per-session
  state).

- AIAgent.commit_memory_session(messages): wraps
  memory_manager.on_session_end() without shutdown — commits OV session
  for extraction but leaves the provider alive for the next session.

- AIAgent.reinitialize_memory_session(new_session_id): wraps
  memory_manager.restart_session() — transitions all external providers
  to the new session after session_id has been assigned.

Call sites:
- cli.py new_session(): commit BEFORE session_id changes, reinitialize
  AFTER — ensuring OV extraction runs on the correct session and the
  new session is immediately ready for the next turn.
- run_agent._compress_context(): same pattern, inside the
  if self._session_db: block where the session_id split happens.

/compress and auto-compress are functionally identical at this layer:
both call _compress_context(), so both are fixed by the same change.

Tests added to tests/agent/test_memory_provider.py:
- TestMemoryManagerRestartSession: reset_session() routing, builtin
  skip, initialize() fallback, failure tolerance, empty-manager noop.
- TestOpenVikingResetSession: session_id update, per-session state
  clear, POST /api/v1/sessions call, API failure tolerance, no-client
  noop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
zhiheng.liu
f3ec4b3a16 Fix OpenViking integration issues: explicit session creation, better error logging 2026-04-15 11:28:45 -07:00
ZaynJarvis
5082a9f66c fix: wire agent/account/user params through _VikingClient
- Fix copy-paste bug: `self._agent = user` → `self._agent = agent`
  with new `agent` parameter in `_VikingClient.__init__`
- Read account/user/agent env vars in `initialize()` and pass them
  to all 4 `_VikingClient` instantiations so identity headers are
  consistently applied across health check, prefetch, sync, and
  memory write paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 11:28:45 -07:00
Zayn Jarvis
0c30385be2 chore: update doc 2026-04-15 11:28:45 -07:00
Zayn Jarvis
8b167af66b feat: add ov agent header 2026-04-15 11:28:45 -07:00
ZaynJarvis
990030c26e feat: add contrib map 2026-04-15 11:28:45 -07:00
ZaynJarvis
d2f85383e8 fix: change default OPENVIKING_ACCOUNT from root to default
- Change default OPENVIKING_ACCOUNT from 'root' to 'default'
- Add account and user config options to get_config_schema()
- Add session creation in initialize()
- Add reset_session() method
- Update docstring to reflect new default

This is a breaking change: existing users who relied on the 'root' account will need to either:
1. Set OPENVIKING_ACCOUNT=root in their environment, or
2. Migrate their data to the 'default' account

Future release will add support for OPENVIKING_ACCOUNT and OPENVIKING_USER in setup when API key is provided.

update desc for key setup
2026-04-15 11:28:45 -07:00
Teknium
2dc5f9d2d3
fix: light mode link/primary colors unreadable on white background (#10457)
Gold #FFD700 has 1.4:1 contrast ratio on white — barely visible.
Replace with dark amber palette (#8B6508 primary, #7A5800 links)
that passes WCAG AA (5.3:1 and 6.5:1 respectively).

Changes:
- :root primary palette → dark amber tones for light mode
- Explicit light mode link colors (#7A5800 / #5A4100 hover)
- Light mode sidebar active state with amber accent
- Light mode table header/border styling
- Footer hover color split by theme (gold for dark, amber for light)

Dark mode is completely unchanged.

Reported by @AbrahamMat7632
2026-04-15 11:17:44 -07:00
Teknium
f61cc464f0 fix: include thread_id in _parse_session_key and fix stale parts reference
_parse_session_key() now extracts the optional 6th part (thread_id) from
session keys, and _notify_active_sessions_of_shutdown uses _parsed.get()
instead of the removed 'parts' variable. Without this, shutdown notifications
silently failed (NameError caught by try/except) and forum topic routing
was lost.
2026-04-15 11:16:01 -07:00
kshitijk4poor
2276b72141 fix: follow-up improvements for watch notification routing (#9537)
- Populate watcher_* routing fields for watch-only processes (not just
  notify_on_complete), so watch-pattern events carry direct metadata
  instead of relying solely on session_key parsing fallback
- Extract _parse_session_key() helper to dedupe session key parsing
  at two call sites in gateway/run.py
- Add negative test proving cross-thread leakage doesn't happen
- Add edge-case tests for _build_process_event_source returning None
  (empty evt, invalid platform, short session_key)
- Add unit tests for _parse_session_key helper
2026-04-15 11:16:01 -07:00
etcircle
dee592a0b1 fix(gateway): route synthetic background events by session 2026-04-15 11:16:01 -07:00
kshitij
da448d4fce
test(cron): add regression test for credential_files ContextVar propagation (#10462)
Follow-up to #10459 (salvage of #7527). The copy_context() fix propagates
ALL ContextVars into the cron worker thread, including credential_files.
This test verifies that skill-declared required_credential_files are
visible inside the worker thread, matching the existing env_passthrough
regression test.
2026-04-15 11:11:08 -07:00
helix4u
aa398ad655 fix(cron): preserve skill env passthrough in worker thread 2026-04-15 11:03:49 -07:00
WideLee
422f2866e6 docs: restore sidebar entries removed by PR #9931
Re-add 'qqbot' and 'automation-templates' doc indexes to sidebars.ts
that were accidentally dropped in https://github.com/NousResearch/hermes-agent/pull/9931.
2026-04-15 09:39:12 -07:00
Teknium
722331a57d
fix: replace hardcoded ~/.hermes with display_hermes_home() in agent-facing text (#10285)
Tool schema descriptions and tool return values contained hardcoded
~/.hermes paths that the model sees and uses. When HERMES_HOME is set
to a custom path (Docker containers, profiles), the agent would still
reference ~/.hermes — looking at the wrong directory.

Fixes 6 locations across 5 files:
- tools/tts_tool.py: output_path schema description
- tools/cronjob_tools.py: script path schema description
- tools/skill_manager_tool.py: skill_manage schema description
- tools/skills_tool.py: two tool return messages
- agent/skill_commands.py: skill config injection text

All now use display_hermes_home() which resolves to the actual
HERMES_HOME path (e.g. /opt/data for Docker, ~/.hermes/profiles/X
for profiles, ~/.hermes for default).

Reported by: Sandeep Narahari (PrithviDevs)
2026-04-15 04:57:55 -07:00
sprmn24
41e2d61b3f feat(discord): add native send_animation for inline GIF playback 2026-04-15 04:51:27 -07:00
Teknium
4da598b48a
docs: clarify hermes model vs /model — two commands, two purposes (#10276)
Users are confused about the difference between `hermes model` (terminal
command for full provider setup) and `/model` (session command for switching
between already-configured providers). This distinction was not documented
anywhere.

Changes across 4 doc pages:
- cli-commands.md: Added warning callout explaining the difference, added
  --global flag docs, added 'only see OpenRouter models?' info box
- slash-commands.md: Added notes on both TUI and messaging /model entries
  that /model only switches between configured providers
- providers.md: Added 'Two Commands for Model Management' comparison table
  near top of page, added warning callout in switching section
- faq.md: Added new FAQ entry '/model only shows one provider' with quick
  reference table

Prompted by user feedback in Discord — new users consistently hit this
confusion when trying to add providers from inside a session.
2026-04-15 04:39:34 -07:00
asheriif
33ae403890 fix(gateway): fix matrix lingering typing indicator 2026-04-15 04:16:16 -07:00
Teknium
47e6ea84bb fix: file handle bug, warning text, and tests for Discord media send
- Fix file handle closed before POST: nest session.post() inside
  the 'with open()' block so aiohttp can read the file during upload
- Update warning text to include weixin (also supports media delivery)
- Add 8 unit tests covering: text+media, media-only, missing files,
  upload failures, multiple files, and _send_to_platform routing
2026-04-15 04:16:06 -07:00
sprmn24
4bcb2f2d26 feat(send_message): add native media attachment support for Discord
Previously send_message only supported media delivery for Telegram.
Discord users received a warning that media was omitted.

- Add media_files parameter to _send_discord()
- Upload media via Discord multipart/form-data API (files[0] field)
- Handle Discord in _send_to_platform() same way as Telegram block
- Remove Discord from generic chunk loop (now handled above)
- Update error/warning strings to mention telegram and discord
2026-04-15 04:16:06 -07:00
Teknium
1c4d3216d3
fix(cron): include job_id in delivery and guide models on removal workflow (#10242)
* fix(gateway): suppress duplicate replies on interrupt and streaming flood control

Three fixes for the duplicate reply bug affecting all gateway platforms:

1. base.py: Suppress stale response when the session was interrupted by a
   new message that hasn't been consumed yet. Checks both interrupt_event
   and _pending_messages to avoid false positives. (#8221, #2483)

2. run.py (return path): Remove response_previewed guard from already_sent
   check. Stream consumer's already_sent alone is authoritative — if
   content was delivered via streaming, the duplicate send must be
   suppressed regardless of the agent's response_previewed flag. (#8375)

3. run.py (queued-message path): Same fix — already_sent without
   response_previewed now correctly marks the first response as already
   streamed, preventing re-send before processing the queued message.

The response_previewed field is still produced by the agent (run_agent.py)
but is no longer required as a gate for duplicate suppression. The stream
consumer's already_sent flag is the delivery-level truth about what the
user actually saw.

Concepts from PR #8380 (konsisumer). Closes #8375, #8221, #2483.

* fix(cron): include job_id in delivery and guide models on removal workflow

Users reported cron reminders keep firing after asking the agent to stop.
Root cause: the conversational agent didn't know the job_id (not in delivery)
and models don't reliably do the list→remove two-step without guidance.

1. Include job_id in the cron delivery wrapper so users and agents can
   reference it when requesting removal.

2. Replace confusing footer ('The agent cannot see this message') with
   actionable guidance ('To stop or manage this job, send me a new
   message').

3. Add explicit list→remove guidance in the cronjob tool schema so models
   know to list first and never guess job IDs.
2026-04-15 03:46:58 -07:00
Misturi
dedc4600dd fix(skills): handle missing fields in Google Workspace token file gracefully instead of crashing with KeyError 2026-04-15 03:45:09 -07:00
Misturi
8bc9b5a0b4 fix(skills): use is None check for coordinates in find-nearby to avoid dropping valid 0.0 values 2026-04-15 03:45:09 -07:00
Teknium
2546b7acea fix(gateway): suppress duplicate replies on interrupt and streaming flood control
Three fixes for the duplicate reply bug affecting all gateway platforms:

1. base.py: Suppress stale response when the session was interrupted by a
   new message that hasn't been consumed yet. Checks both interrupt_event
   and _pending_messages to avoid false positives. (#8221, #2483)

2. run.py (return path): Remove response_previewed guard from already_sent
   check. Stream consumer's already_sent alone is authoritative — if
   content was delivered via streaming, the duplicate send must be
   suppressed regardless of the agent's response_previewed flag. (#8375)

3. run.py (queued-message path): Same fix — already_sent without
   response_previewed now correctly marks the first response as already
   streamed, preventing re-send before processing the queued message.

The response_previewed field is still produced by the agent (run_agent.py)
but is no longer required as a gate for duplicate suppression. The stream
consumer's already_sent flag is the delivery-level truth about what the
user actually saw.

Concepts from PR #8380 (konsisumer). Closes #8375, #8221, #2483.
2026-04-15 03:42:24 -07:00
Teknium
7b2700c9af
fix(browser): use 127.0.0.1 instead of localhost for CDP default (#10231)
/browser connect set BROWSER_CDP_URL to http://localhost:9222, but
Chrome's --remote-debugging-port only binds to 127.0.0.1 (IPv4).
On macOS, 'localhost' can resolve to ::1 (IPv6) first, causing both
_resolve_cdp_override's /json/version fetch and agent-browser's
--cdp connection to fail when Chrome isn't listening on IPv6.

The socket check in the connect handler already used 127.0.0.1
explicitly and succeeded, masking the mismatch.

Use 127.0.0.1 in the default CDP URL to match what Chrome actually
binds to.
2026-04-15 03:29:37 -07:00
Teknium
a4e1842f12
fix: strip reasoning item IDs from Responses API input when store=False (#10217)
With store=False (our default for the Responses API), the API does not
persist response items.  When reasoning items with 'id' fields were
replayed on subsequent turns, the API attempted a server-side lookup
for those IDs and returned 404:

  Item with id 'rs_...' not found. Items are not persisted when store
  is set to false.

The encrypted_content blob is self-contained for reasoning chain
continuity — the id field is unnecessary and triggers the failed lookup.

Fix: strip 'id' from reasoning items in both _chat_messages_to_responses_input
(message conversion) and _preflight_codex_input_items (normalization layer).
The id is still used for local deduplication but never sent to the API.

Reported by @zuogl448 on GPT-5.4.
2026-04-15 03:19:43 -07:00
Teknium
e69526be79
fix(send_message): URL-encode Matrix room IDs and add Matrix to schema examples (#10151)
Matrix room IDs contain ! and : which must be percent-encoded in URI
path segments per the Matrix C-S spec. Without encoding, some
homeservers reject the PUT request.

Also adds 'matrix:!roomid:server.org' and 'matrix:@user:server.org'
to the tool schema examples so models know the correct target format.
2026-04-15 00:10:59 -07:00
Teknium
180b14442f test: add _parse_target_ref Matrix coverage for salvaged PR #6144 2026-04-15 00:08:14 -07:00
bkadish
03446e06bb fix(send_message): accept Matrix room IDs and user MXIDs as explicit targets
`_parse_target_ref` has explicit-reference branches for Telegram, Feishu,
and numeric IDs, but none for Matrix. As a result, callers of
`send_message(target="matrix:!roomid:server")` or
`send_message(target="matrix:@user:server")` fall through to
`(None, None, False)` and the tool errors out with a resolution failure —
even though a raw Matrix room ID or MXID is the most unambiguous possible
target.

Three-line fix: recognize `!…` as a room ID and `@…` as a user MXID when
platform is `matrix`, and return them as explicit targets. Alias-based
targets (`#…`) continue to go through the normal resolve path.
2026-04-15 00:08:14 -07:00
Teknium
df7be3d8ae
fix(cli): /model picker shows curated models instead of full catalog (#10146)
The /model picker called provider_model_ids() which fetches the FULL
live API catalog (hundreds of models for Anthropic, Copilot, etc.) and
only fell back to the curated list when the live fetch failed.

This flips the priority: use the curated model list from
list_authenticated_providers() (same lists as `hermes model` and
gateway pickers), falling back to provider_model_ids() only when the
curated list is empty (e.g. user-defined endpoints).
2026-04-15 00:07:50 -07:00
Ubuntu
da8bab77fb fix(cli): restore messaging toolset for gateway platforms 2026-04-14 23:13:35 -07:00
Teknium
9932366f3c feat(doctor): add Command Installation check for hermes bin symlink
hermes doctor now checks whether the ~/.local/bin/hermes symlink exists
and points to the correct venv entry point. With --fix, it creates or
repairs the symlink automatically.

Covers:
- Missing symlink at ~/.local/bin/hermes (or $PREFIX/bin on Termux)
- Symlink pointing to wrong target
- Missing venv entry point (venv/bin/hermes or .venv/bin/hermes)
- PATH warning when ~/.local/bin is not on PATH
- Skipped on Windows (different mechanism)

Addresses user report: 'python -m hermes_cli.main doesn't have an option
to fix the local bin/install'

10 new tests covering all scenarios.
2026-04-14 23:13:11 -07:00
Teknium
029938fbed
fix(cli): defensive subparser routing for argparse bpo-9338 (#10113)
On some Python versions, argparse fails to route subcommand tokens when
the parent parser has nargs='?' optional arguments (--continue).  The
symptom: 'hermes model' produces 'unrecognized arguments: model' even
though 'model' is a registered subcommand.

Fix: when argv contains a token matching a known subcommand, set
subparsers.required=True to force deterministic routing.  If that fails
(e.g. 'hermes -c model' where 'model' is consumed as the session name
for --continue), fall back to the default optional-subparsers behaviour.

Adds 13 tests covering all key argument combinations.

Reported via user screenshot showing the exact error on an installed
version with the model subcommand listed in usage but rejected at parse
time.
2026-04-14 23:13:02 -07:00
Teknium
772cfb6c4e
fix: stale agent timeout, uv venv detection, empty response after tools, compression model fallback (#9051, #8620, #9400) (#10093)
Four independent fixes:

1. Reset activity timestamp on cached agent reuse (#9051)
   When the gateway reuses a cached AIAgent for a new turn, the
   _last_activity_ts from the previous turn (possibly hours ago)
   carried over. The inactivity timeout handler immediately saw
   the agent as idle for hours and killed it.

   Fix: reset _last_activity_ts, _last_activity_desc, and
   _api_call_count when retrieving an agent from the cache.

2. Detect uv-managed virtual environments (#8620 sub-issue 1)
   The systemd unit generator fell back to sys.executable (uv's
   standalone Python) when running under 'uv run', because
   sys.prefix == sys.base_prefix. The generated ExecStart pointed
   to a Python binary without site-packages.

   Fix: check VIRTUAL_ENV env var before falling back to
   sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
   doesn't reflect the venv.

3. Nudge model to continue after empty post-tool response (#9400)
   Weaker models sometimes return empty after tool calls. The agent
   silently abandoned the remaining work.

   Fix: append assistant('(empty)') + user nudge message and retry
   once. Resets after each successful tool round.

4. Compression model fallback on permanent errors (#8620 sub-issue 4)
   When the default summary model (gemini-3-flash) returns 503
   'model_not_found' on custom proxies, the compressor entered a
   600s cooldown, leaving context growing unbounded.

   Fix: detect permanent model-not-found errors (503, 404,
   'model_not_found', 'no available channel') and fall back to
   using the main model for compression instead of entering
   cooldown. One-time fallback with immediate retry.

Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass
2026-04-14 22:38:17 -07:00
Teknium
5d5d21556e
fix: sync client.api_key during UnicodeEncodeError ASCII recovery (#10090)
The existing recovery block sanitized self.api_key and
self._client_kwargs['api_key'] but did not update self.client.api_key.
The OpenAI SDK stores its own copy of api_key and reads it dynamically
via the auth_headers property on every request. Without this fix, the
retry after sanitization would still send the corrupted key in the
Authorization header, causing the same UnicodeEncodeError.

The bug manifests when an API key contains Unicode lookalike characters
(e.g. ʋ U+028B instead of v) from copy-pasting out of PDFs, rich-text
editors, or web pages with decorative fonts. httpx hard-encodes all
HTTP headers as ASCII, so the non-ASCII char in the Authorization
header triggers the error.

Adds TestApiKeyClientSync with two tests verifying:
- All three key locations are synced after sanitization
- Recovery handles client=None (pre-init) without crashing
2026-04-14 22:37:45 -07:00
kshitijk4poor
9855190f23 feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.

- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown

Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)

Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:25 -07:00
Teknium
50c35dcabe fix: stale agent timeout, uv venv detection, empty response after tools (#9051, #8620, #9400)
Three independent fixes:

1. Reset activity timestamp on cached agent reuse (#9051)
   When the gateway reuses a cached AIAgent for a new turn, the
   _last_activity_ts from the previous turn (possibly hours ago)
   carried over. The inactivity timeout handler immediately saw
   the agent as idle for hours and killed it.

   Fix: reset _last_activity_ts, _last_activity_desc, and
   _api_call_count when retrieving an agent from the cache.

2. Detect uv-managed virtual environments (#8620 sub-issue 1)
   The systemd unit generator fell back to sys.executable (uv's
   standalone Python) when running under 'uv run', because
   sys.prefix == sys.base_prefix (uv doesn't set up traditional
   venv activation). The generated ExecStart pointed to a Python
   binary without site-packages, crashing the service on startup.

   Fix: check VIRTUAL_ENV env var before falling back to
   sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
   doesn't reflect the venv.

3. Nudge model to continue after empty post-tool response (#9400)
   Weaker models (GLM-5, mimo-v2-pro) sometimes return empty
   responses after tool calls instead of continuing to the next
   step. The agent silently abandoned the remaining work with
   '(empty)' or used prior-turn fallback text.

   Fix: when the model returns empty after tool calls AND there's
   no prior-turn content to fall back on, inject a one-time user
   nudge message telling the model to process the tool results and
   continue. The flag resets after each successful tool round so it
   can fire again on later rounds.

Test plan: 97 gateway + CLI tests pass, 9 venv detection tests pass
2026-04-14 22:16:02 -07:00
Teknium
93fe4ead83
fix: warn on invalid context_length format in config.yaml (#10067)
Previously, non-integer context_length values (e.g. '256K') in
config.yaml were silently ignored, causing the agent to fall back
to 128K auto-detection with no user feedback. This was confusing
for users with custom LiteLLM endpoints expecting larger context.

Now prints a clear stderr warning and logs at WARNING level when
model.context_length or custom_providers[].models.<model>.context_length
cannot be parsed as an integer, telling users to use plain integers
(e.g. 256000 instead of '256K').

Reported by community user ChFarhan via Discord.
2026-04-14 22:14:27 -07:00
Teknium
a8b7db35b2
fix: interrupt agent immediately when user messages during active run (#10068)
When a user sends a message while the agent is executing a task on the
gateway, the agent is now interrupted immediately — not silently queued.
Previously, messages were stored in _pending_messages with zero feedback
to the user, potentially leaving them waiting 1+ hours.

Root cause: Level 1 guard (base.py) intercepted all messages for active
sessions and returned with no response. Level 2 (gateway/run.py) which
calls agent.interrupt() was never reached.

Fix: Expand _handle_active_session_busy_message to handle the normal
(non-draining) case:
  1. Call running_agent.interrupt(text) to abort in-flight tool calls
     and exit the agent loop at the next check point
  2. Store the message as pending so it becomes the next turn once the
     interrupted run returns
  3. Send a brief ack: 'Interrupting current task (10 min elapsed,
     iteration 21/60, running: terminal). I'll respond shortly.'
  4. Debounce acks to once per 30s to avoid spam on rapid messages

Reported by @Lonely__MH.
2026-04-14 22:07:28 -07:00
Teknium
8548893d14
feat: entry-level Podman support — find_docker() + rootless entrypoint (#10066)
- find_docker() now checks HERMES_DOCKER_BINARY env var first, then
  docker on PATH, then podman on PATH, then macOS known locations
- Entrypoint respects HERMES_HOME env var (was hardcoded to /opt/data)
- Entrypoint uses groupmod -o to tolerate non-unique GIDs (fixes macOS
  GID 20 conflict with Debian's dialout group)
- Entrypoint makes chown best-effort so rootless Podman continues
  instead of failing with 'Operation not permitted'
- 5 new tests covering env var override, podman fallback, precedence

Based on work by alanjds (PR #3996) and malaiwah (PR #8115).
Closes #4084.
2026-04-14 21:20:37 -07:00