When the agent calls process(action='wait') or process(action='poll')
and gets the exited status, the completion_queue notification is
redundant — the agent already has the output from the tool return.
Previously, the drain loops in CLI and gateway would still inject
the [SYSTEM: Background process completed] message, causing the
agent to receive the same information twice.
Fix: track session IDs in _completion_consumed set when wait/poll/log
returns an exited process. Drain loops in cli.py and gateway watcher
skip completion events for consumed sessions. Watch pattern events
are never suppressed (they have independent semantics).
Adds 4 tests covering wait/poll/log marking and running-process
negative case.
* perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive
SSH: symlink-staging + tar -ch piped over SSH in a single TCP stream.
Eliminates per-file scp round-trips. Handles timeout (kills both
processes), SSH Popen failure (kills tar), and tar create failure.
Modal: in-memory gzipped tar archive, base64-encoded, decoded+extracted
in one exec call. Checks exit code and raises on failure.
Both backends use shared helpers extracted into file_sync.py:
- quoted_mkdir_command() — mirrors existing quoted_rm_command()
- unique_parent_dirs() — deduplicates parent dirs from file pairs
Migrates _ensure_remote_dirs to use the new helpers.
28 new tests (21 SSH + 7 Modal), all passing.
Closes#7465Closes#7467
* fix(modal): pipe stdin to avoid ARG_MAX, clean up review findings
- Modal bulk upload: stream base64 payload through proc.stdin in 1MB
chunks instead of embedding in command string (Modal SDK enforces
64KB ARG_MAX_BYTES — typical payloads are ~4.3MB)
- Modal single-file upload: same stdin fix, add exit code checking
- Remove what-narrating comments in ssh.py and modal.py (keep WHY
comments: symlink staging rationale, SIGPIPE, deadlock avoidance)
- Remove unnecessary `sandbox = self._sandbox` alias in modal bulk
- Daytona: use shared helpers (unique_parent_dirs, quoted_mkdir_command)
instead of inlined duplicates
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
The check_interval parameter on terminal_tool sent periodic output
updates to the gateway chat, but these were display-only — the agent
couldn't see or act on them. This added schema bloat and introduced
a bug where notify_on_complete=True was silently dropped when
check_interval was also set (the not-check_interval guard skipped
fast-watcher registration, and the check_interval watcher dict
was missing the notify_on_complete key).
Removing check_interval entirely:
- Eliminates the notify_on_complete interaction bug
- Reduces tool schema size (one fewer parameter for the model)
- Simplifies the watcher registration path
- notify_on_complete (agent wake-on-completion) still works
- watch_patterns (output alerting) still works
- process(action='poll') covers manual status checking
Closes#7947 (root cause eliminated rather than patched).
Deduplicate todo items by ID before writing to the store, keeping the
last occurrence. Prevents ghost entries when the model sends duplicate
IDs in a single write() call, which corrupts subsequent merge operations.
Co-authored-by: WAXLYY <WAXLYY@users.noreply.github.com>
Add display.interim_assistant_messages config (enabled by default) that
forwards completed assistant commentary between tool calls to the user
as separate chat messages. Models already emit useful status text like
'I'll inspect the repo first.' — this surfaces it on Telegram, Discord,
and other messaging platforms instead of swallowing it.
Independent from tool_progress and gateway streaming. Disabled for
webhooks. Uses GatewayStreamConsumer when available, falls back to
direct adapter send. Tracks response_previewed to prevent double-delivery
when interim message matches the final response.
Also fixes: cursor not stripped from fallback prefix in stream consumer
(affected continuation calculation on no-edit platforms like Signal).
Cherry-picked from PR #7885 by asheriif, default changed to enabled.
Fixes#5016
Adds _normalize_path() helper that calls expanduser().resolve() to
properly handle tilde paths (e.g. ~/.hermes, ~/.config). Previously
Path.resolve() alone treated ~ as a literal directory name, producing
invalid paths like /root/~/.hermes.
Also improves _run_git() error handling to distinguish missing working
directories from missing git executable, and adds pre-flight directory
validation.
Cherry-picked from PR #7898 by faishal882.
Fixes#7807
_write_to_sandbox interpolated storage_dir and remote_path directly into
a shell command passed to env.execute(). Paths containing shell
metacharacters (spaces, semicolons, $(), backticks) could trigger
arbitrary command execution inside the sandbox.
Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric +
slashes/hyphens/dots) are left unmodified by shlex.quote, so existing
behavior is unchanged. Paths with unsafe characters get single-quoted.
Tests added for spaces, $(command) substitution, and semicolon injection.
This commit addresses a security vulnerability where unsanitized user inputs for commit_hash and file_path were passed directly to git commands in CheckpointManager.restore() and diff(). It validates commit hashes to be strictly hexadecimal characters without leading dashes (preventing flag injection like '--patch') and enforces file paths to stay within the working directory via root resolution. Regression tests test_restore_rejects_argument_injection, test_restore_rejects_invalid_hex_chars, and test_restore_rejects_path_traversal were added.
The interrupt mechanism in tools/interrupt.py used a process-global
threading.Event. In the gateway, multiple agents run concurrently in
the same process via run_in_executor. When any agent was interrupted
(user sends a follow-up message), the global flag killed ALL agents'
running tools — terminal commands, browser ops, web requests — across
all sessions.
Changes:
- tools/interrupt.py: Replace single threading.Event with a set of
interrupted thread IDs. set_interrupt() targets a specific thread;
is_interrupted() checks the current thread. Includes a backward-
compatible _ThreadAwareEventProxy for legacy _interrupt_event usage.
- run_agent.py: Store execution thread ID at start of run_conversation().
interrupt() and clear_interrupt() pass it to set_interrupt() so only
this agent's thread is affected.
- tools/code_execution_tool.py: Use is_interrupted() instead of
directly checking _interrupt_event.is_set().
- tools/process_registry.py: Same — use is_interrupted().
- tests: Update interrupt tests for per-thread semantics. Add new
TestPerThreadInterruptIsolation with two tests verifying cross-thread
isolation.
When a Python process exits uncleanly (SIGKILL, crash, gateway restart
via hermes update), in-memory _active_sessions tracking is lost but the
agent-browser node daemons and their Chromium child processes keep
running indefinitely. On a long-running system this causes unbounded
memory growth — 24 orphaned sessions consumed 7.6 GB on a production
machine over 9 days.
Add _reap_orphaned_browser_sessions() which scans the tmp directory for
agent-browser-{h_*,cdp_*} socket dirs on cleanup thread startup. For
each dir not tracked by the current process, reads the daemon PID file
and sends SIGTERM if the daemon is still alive. Handles edge cases:
dead PIDs, corrupt PID files, permission errors, foreign processes.
The reaper runs once on thread startup (not every 30s) to avoid races
with sessions being actively created by concurrent agents.
Background process watchers (notify_on_complete, check_interval) created
synthetic SessionSource objects without user_id/user_name. While the
internal=True bypass (1d8d4f28) prevented false pairing for agent-
generated notifications, the missing identity caused:
- Garbage entries in pairing rate limiters (discord:None, telegram:None)
- 'User None' in approval messages and logs
- No user identity available for future code paths that need it
Additionally, platform messages arriving without from_user (Telegram
service messages, channel forwards, anonymous admin actions) could still
trigger false pairing because they are not internal events.
Fix:
1. Propagate user_id/user_name through the full watcher chain:
session_context.py → gateway/run.py → terminal_tool.py →
process_registry.py (including checkpoint persistence/recovery)
2. Add None user_id guard in _handle_message() — silently drop
non-internal messages with no user identity instead of triggering
the pairing flow.
Salvaged from PRs #7664 (kagura-agent, ContextVar approach),
#6540 (MestreY0d4-Uninter, tests), and #7709 (guang384, None guard).
Closes#6341, #6485, #7643
Relates to #6516, #7392
Independent halving of width and height caused aspect ratio distortion
for extreme dimensions (e.g. 8000x200 panoramas). When one axis hit the
64px floor, the other kept shrinking — collapsing the ratio toward 1:1.
Use proportional scaling instead: when either dimension hits the floor,
derive the effective scale factor and apply it to both axes.
Add tests for extreme panorama (8000x200) and tall narrow (200x6000)
images to verify aspect ratio preservation.
Cherry-picked from PR #7749 by kshitijk4poor with modifications:
- Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider)
- Send images at full resolution first; only auto-resize to 5 MB on API failure
- Add _is_image_size_error() helper to detect size-related API rejections
- Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction
- Fix get_model_capabilities() to check modalities.input for vision support
- Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent)
- Applied retry-with-resize to both vision_analyze_tool and browser_vision
Closes#7740
* feat: add watch_patterns to background processes for output monitoring
Adds a new 'watch_patterns' parameter to terminal(background=true) that
lets the agent specify strings to watch for in process output. When a
matching line appears, a notification is queued and injected as a
synthetic message — triggering a new agent turn, similar to
notify_on_complete but mid-process.
Implementation:
- ProcessSession gets watch_patterns field + rate-limit state
- _check_watch_patterns() in ProcessRegistry scans new output chunks
from all three reader threads (local, PTY, env-poller)
- Rate limited: max 8 notifications per 10s window
- Sustained overload (45s) permanently disables watching for that process
- watch_queue alongside completion_queue, same consumption pattern
- CLI drains watch_queue in both idle loop and post-turn drain
- Gateway drains after agent runs via _inject_watch_notification()
- Checkpoint persistence + crash recovery includes watch_patterns
- Blocked in execute_code sandbox (like other bg params)
- 20 new tests covering matching, rate limiting, overload kill,
checkpoint persistence, schema, and handler passthrough
Usage:
terminal(
command='npm run dev',
background=true,
watch_patterns=['ERROR', 'WARN', 'listening on port']
)
* refactor: merge watch_queue into completion_queue
Unified queue with 'type' field distinguishing 'completion',
'watch_match', and 'watch_disabled' events. Extracted
_format_process_notification() in CLI and gateway to handle
all event types in a single drain loop. Removes duplication
across both CLI drain sites and the gateway.
Cover all public functions with 50 test cases:
- managed_nous_tools_enabled() feature flag toggling
- normalize_browser_cloud_provider() coercion and defaults
- coerce_modal_mode() / normalize_modal_mode() validation
- has_direct_modal_credentials() env vars and config file detection
- resolve_modal_backend_state() full backend selection matrix
- resolve_openai_audio_api_key() priority chain and edge cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow-up fixes for cherry-pick conflicts:
- Removed test_context_keeps_pending_approval test that referenced
pop_pending() which doesn't exist on current main
- Added headers attribute to FakeResponse in vision test (needed
after #6949 added Content-Length check)
Three fixes for vision_analyze returning cryptic 400 "Invalid request data":
1. Pre-flight base64 size check — base64 inflates data ~33%, so a 3.8 MB
file exceeds the 5 MB API limit. Reject early with a clear message
instead of letting the provider return a generic 400.
2. Handle file:// URIs — strip the scheme and resolve as a local path.
Previously file:///path/to/image.png fell through to the "invalid
image source" error since it matched neither is_file() nor http(s).
3. Separate invalid_request errors from "does not support vision" errors
so the user gets actionable guidance (resize/compress/retry) instead
of a misleading "model does not support vision" message.
Closes#6677
_discover_bundled_skills() used the directory name to identify skills,
but skills_tool.py and skills_hub.py use the `name:` field from SKILL.md
frontmatter. This mismatch caused 9 builtin skills whose directory name
differs from their SKILL.md name to be written to .bundled_manifest
under the wrong key, so `hermes skills list` showed them as "local"
instead of "builtin".
Read the frontmatter name field (with directory-name fallback) so the
manifest keys match what the rest of the codebase expects.
Closes#6835
- Remove unreachable `if not content_sample` branch inside the truthy
`if content_sample` block in `_is_likely_binary()` (dead code that
could never execute).
- Replace `linter_cmd.format(file=...)` with `linter_cmd.replace("{file}", ...)`
in `_check_lint()` so file paths containing curly braces (e.g.
`src/{test}.py`) no longer raise KeyError/ValueError.
- Add 16 unit tests covering both fixes and edge cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add delegation.reasoning_effort config key so subagents can run at a
different thinking level than the parent agent. When set, overrides
the parent's reasoning_config; when empty, inherits as before.
Valid values: xhigh, high, medium, low, minimal, none (disables thinking).
Config path: delegation.reasoning_effort in config.yaml
Files changed:
- tools/delegate_tool.py: resolve override in _build_child_agent
- hermes_cli/config.py: add reasoning_effort to DEFAULT_CONFIG
- tests/tools/test_delegate.py: 4 new tests covering all cases
- Add shared is_wsl() to hermes_constants (like is_termux)
- Update supports_systemd_services() to verify systemd is actually
running on WSL before returning True
- Add WSL-specific guidance in gateway install/start/setup/status
for both cases: WSL+systemd and WSL without systemd
- Improve help strings: 'run' now says recommended for WSL/Docker,
'start'/'install' now mention systemd/launchd explicitly
- Add WSL gateway FAQ section with tmux/nohup/Task Scheduler tips
- Update CLI commands docs with WSL tip
- Deduplicate _is_wsl() from clipboard.py to shared hermes_constants
- Fix clipboard tests to reset hermes_constants cache
- 20 new WSL-specific tests covering detection, systemd check,
supports_systemd_services integration, and command output
Motivated by user feedback: took 1 hour to figure out run vs start
on WSL, Telegram bot kept disconnecting due to flaky WSL systemd.
Cover the three key behaviors:
- bulk_upload_fn is called instead of per-file upload_fn
- Fallback to upload_fn when bulk_upload_fn is None
- Rollback on bulk upload failure retries all files
Add 9 tests covering the full zombie process prevention chain:
- TestZombieReproduction: demonstrates that processes survive when
references are dropped without explicit cleanup (the original bug)
- TestAgentCloseMethod: verifies close() calls all cleanup functions,
is idempotent, propagates to children, and continues cleanup even
when individual steps fail
- TestGatewayCleanupWiring: verifies stop() calls close() and that
_evict_cached_agent() does NOT call close() (since it's also used
for non-destructive cache refreshes)
- TestDelegationCleanup: calls the real _run_single_child function and
verifies close() is called on the child agent
Ref: #7131
- Bug 1: replace read_file(limit=10000) with read_file_raw in _apply_update,
preventing silent truncation of files >2000 lines and corruption of lines
>2000 chars; add read_file_raw to FileOperations abstract interface and
ShellFileOperations
- Bug 2: split apply_v4a_operations into validate-then-apply phases; if any
hunk fails validation, zero writes occur (was: continue after failure,
leaving filesystem partially modified)
- Bug 3: parse_v4a_patch now returns an error for begin-marker-with-no-ops,
empty file paths, and moves missing a destination (was: always returned
error=None)
- Bug 4: raise strategy 7 (block anchor) single-candidate similarity threshold
from 0.10 to 0.50, eliminating false-positive matches in repetitive code
- Bug 5: add _strategy_unicode_normalized (new strategy 7) with position
mapping via _build_orig_to_norm_map; smart quotes and em-dashes in
LLM-generated patches now match via strategies 1-6 before falling through
to fuzzy strategies
- Bug 6: extend fuzzy_find_and_replace to return 4-tuple (content, count,
error, strategy); update all 5 call sites across patch_parser.py,
file_operations.py, and skill_manager_tool.py
- Bug 7: guard in _apply_update returns error when addition-only context hint
is ambiguous (>1 occurrences); validation phase errors on both 0 and >1
- Bug 8: _apply_delete returns error (not silent success) on missing file
- Bug 9: _validate_operations checks source existence and destination absence
for MOVE operations before any write occurs
`delegate_task` silently truncated batch tasks to 3 — the model sends
5 tasks, gets results for 3, never told 2 were dropped. Now returns a
clear tool_error explaining the limit and how to fix it.
The limit is configurable via:
- delegation.max_concurrent_children in config.yaml (priority 1)
- DELEGATION_MAX_CONCURRENT_CHILDREN env var (priority 2)
- default: 3
Uses the same _load_config() path as the rest of delegate_task for
consistent config priority. Clamps to min 1, warns on non-integer
config values.
Also removes the hardcoded maxItems: 3 from the JSON schema — the
schema was blocking the model from even attempting >3 tasks before
the runtime check could fire. The runtime check gives a much more
actionable error message.
Backwards compatible: default remains 3, existing configs unchanged.
When delegate_task runs, the parent agent's activity tracker freezes
because child.run_conversation() blocks and the child's own
_touch_activity() never propagates back to the parent. The gateway
inactivity timeout then fires a spurious 'No activity' warning and
eventually kills the agent, even though the subagent is actively working.
Fix: add a heartbeat thread in _run_single_child that calls
parent._touch_activity() every 30 seconds with detail from the child's
activity summary (current tool, iteration count). The thread is a daemon
that starts before child.run_conversation() and is cleaned up in the
finally block.
This also improves the gateway 'Still working...' status messages —
instead of just 'running: delegate_task', users now see what the
subagent is actually doing (e.g., 'delegate_task: subagent running
terminal (iteration 5/50)').
Four gaps in DANGEROUS_PATTERNS found by running 10 targeted tests that
each mapped to a specific pattern in approval.py and checked whether the
documented defense actually held.
1. **Heredoc script injection** — `python3 << 'EOF'` bypasses the
existing `-e`/`-c` flag pattern. Adds pattern for interpreter + `<<`
covering python{2,3}, perl, ruby, node.
2. **PID expansion self-termination** — `kill -9 $(pgrep hermes)` is
opaque to the existing `pkill|killall` + name pattern because command
substitution is not expanded at detection time. Adds structural
patterns matching `kill` + `$(pgrep` and backtick variants.
3. **Git destructive operations** — `git reset --hard`, `push --force`,
`push -f`, `clean -f*`, and `branch -D` were entirely absent.
Note: `branch -d` also triggers because IGNORECASE is global —
acceptable since -d is still a delete, just a safe one, and the
prompt is only a confirmation, not a hard block.
4. **chmod +x then execute** — two-step social engineering where a
script containing dangerous commands is first written to disk (not
checked by write_file), then made executable and run as `./script`.
Pattern catches `chmod +x ... [;&|]+ ./` combos. Does not solve the
deeper architectural issue (write_file not checking content) — that
is called out in the PR description as a known limitation.
Tests: 23 new cases across 4 test classes, all in test_approval.py:
- TestHeredocScriptExecution (7 cases, incl. regressions for -c)
- TestPgrepKillExpansion (5 cases, incl. safe kill PID negative)
- TestGitDestructiveOps (8 cases, incl. safe git status/push negatives)
- TestChmodExecuteCombo (3 cases, incl. safe chmod-only negative)
Full suite: 146 passed, 0 failed.
When kill_process() sends SIGTERM, both it and the reader thread race
to call _move_to_finished() — kill_process sets exit_code=-15 and
enqueues a notification, then the reader thread's process.wait()
returns with exit_code=143 (128+SIGTERM) and enqueues a second one.
Fix: make _move_to_finished() idempotent by tracking whether the
session was actually removed from _running. The second call sees it
was already moved and skips the completion_queue.put().
Adds regression test: test_move_to_finished_idempotent_no_duplicate
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).
Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
When an MCP server returns both content (model-oriented text) and
structuredContent (machine-oriented JSON), the client now combines
them instead of discarding content. The text content becomes the
primary result (what the agent reads), and structuredContent is
included as supplementary metadata.
Previously, structuredContent took full precedence — causing data
loss for servers like Desktop Commander that put the actual file
text in content and metadata in structuredContent.
MCP spec guidance: for conversational/agent UX, prefer content.
Legacy flat stt.model config key (from cli-config.yaml.example and older
versions) was passed as a model override to transcribe_audio() by the
gateway, bypassing provider-specific model resolution. When the provider
was 'local' (faster-whisper), this caused:
ValueError: Invalid model size 'whisper-1'
Changes:
- gateway/run.py, discord.py: stop passing model override — let
transcribe_audio() handle provider-specific model resolution internally
- get_stt_model_from_config(): now provider-aware, reads from the correct
nested section (stt.local.model, stt.openai.model, etc.); ignores
legacy flat key for local provider to prevent model name mismatch
- cli-config.yaml.example: updated STT section to show nested provider
config structure instead of legacy flat key
- config migration v13→v14: moves legacy stt.model to the correct
provider section and removes the flat key
Reported by community user on Discord.
Add Discord thread support to cron delivery and send_message_tool.
- _parse_target_ref: handle discord platform with chat_id:thread_id format
- _send_discord: add thread_id param, route to /channels/{thread_id}/messages
- _send_to_platform: pass thread_id through for Discord
- Discord adapter send(): read thread_id from metadata for gateway path
- Update tool schema description to document Discord thread targets
Cherry-picked from PR #7046 by pandacooming (maxyangcn).
Follow-up fixes:
- Restore proxy support (resolve_proxy_url/proxy_kwargs_for_aiohttp) that was
accidentally deleted — would have caused NameError at runtime
- Remove duplicate _DISCORD_TARGET_RE regex; reuse existing _TELEGRAM_TOPIC_TARGET_RE
via _NUMERIC_TOPIC_RE alias (identical pattern)
- Fix misleading test comments about Discord negative snowflake IDs
(Discord uses positive snowflakes; negative IDs are a Telegram convention)
- Rewrite misleading scheduler test that claimed to exercise home channel
fallback but actually tested the explicit platform:chat_id parsing path
- Modal snapshot tests: accept **kw in iter_skills_files/iter_cache_files
mock lambdas to match new container_base kwarg
- SSH preflight test: mock _detect_remote_home, _ensure_remote_dirs,
init_session, and FileSyncManager added in file sync PR
Replace per-backend ad-hoc file sync with a shared FileSyncManager
that handles mtime-based change detection, remote deletion of
locally-removed files, and transactional state updates.
- New FileSyncManager class (tools/environments/file_sync.py)
with callbacks for upload/delete, rate limiting, and rollback
- Shared iter_sync_files() eliminates 3 duplicate implementations
- SSH: replace unconditional rsync with scp + mtime skip
- Modal/Daytona: replace inline _synced_files dict with manager
- All 3 backends now sync credentials + skills + cache uniformly
- Remote deletion: files removed locally are cleaned from remote
- HERMES_FORCE_FILE_SYNC=1 env var for debugging
- Base class _before_execute() simplified to empty hook
- 12 unit tests covering mtime skip, deletion, rollback, rate limiting
Change behavior from silent clamping to returning an error when the
model requests a foreground timeout exceeding FOREGROUND_MAX_TIMEOUT.
This forces the model to use background=true for long-running commands
rather than silently changing its intent.
- Config default timeouts above the cap are NOT rejected (user's choice)
- Only explicit model-requested timeouts trigger rejection
- Added boundary test for timeout exactly at the limit