Commit graph

21 commits

Author SHA1 Message Date
Teknium
7fd508979e fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018:

tools/environments/daytona.py
  - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls
    against the same sandbox don't collide on the remote temp path
  - Move sync_back() inside the cleanup lock and after the _sandbox-None
    guard, with its own try/except. Previously a no-op cleanup (sandbox
    already cleared) still fired sync_back → 3-attempt retry storm against
    a nil sandbox (~6s of sleep). Now short-circuits cleanly.

tools/environments/file_sync.py
  - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a
    tar larger than the limit. Protects against runaway sandboxes
    producing arbitrary-size archives.
  - Add 'nothing previously pushed' guard at the top of sync_back(). If
    _pushed_hashes and _synced_files are both empty, the FileSyncManager
    was never initialized from the host side — there is nothing coherent
    to sync back. Skips the retry/backoff machinery on uninitialized
    managers and eliminates test-suite slowdown from pre-existing cleanup
    tests that don't mock the sync layer.

tests/tools/test_file_sync_back.py
  - Update _make_manager helper to seed a _pushed_hashes entry by default
    so sync_back() exercises its real path. A seed_pushed_state=False
    opt-out is available for noop-path tests.
  - Add TestSyncBackSizeCap with positive and negative coverage of the
    new cap.

tests/tools/test_sync_back_backends.py
  - Update Daytona bulk download test to assert the PID-suffixed path
    pattern instead of the fixed /tmp/.hermes_sync.tar.
2026-04-16 19:39:21 -07:00
kshitijk4poor
d64446e315 feat(file-sync): sync remote changes back to host on teardown
Salvage of PR #8018 by @alt-glitch onto current main.

On sandbox teardown, FileSyncManager now downloads the remote .hermes/
directory, diffs against SHA-256 hashes of what was originally pushed,
and applies only changed files back to the host.

Core (tools/environments/file_sync.py):
- sync_back(): orchestrates download -> unpack -> diff -> apply with:
  - Retry with exponential backoff (3 attempts, 2s/4s/8s)
  - SIGINT trap + defer (prevents partial writes on Ctrl-C)
  - fcntl.flock serialization (concurrent gateway sandboxes)
  - Last-write-wins conflict resolution with warning
  - New remote files pulled back via _infer_host_path prefix matching

Backends:
- SSH: _ssh_bulk_download — tar cf - piped over SSH
- Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read
- Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file
- All three call sync_back() at the top of cleanup()

Fixes applied during salvage (vs original PR #8018):

| # | Issue | Fix |
|---|-------|-----|
| C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None |
| W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError |
| W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer |
| W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports |
| W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root |
| S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) |
| S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) |

Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT
main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup).

Based on #8018 by @alt-glitch.
2026-04-16 19:39:21 -07:00
Siddharth Balyan
27eeea0555
perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive (#8014)
* perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive

SSH: symlink-staging + tar -ch piped over SSH in a single TCP stream.
Eliminates per-file scp round-trips. Handles timeout (kills both
processes), SSH Popen failure (kills tar), and tar create failure.

Modal: in-memory gzipped tar archive, base64-encoded, decoded+extracted
in one exec call. Checks exit code and raises on failure.

Both backends use shared helpers extracted into file_sync.py:
- quoted_mkdir_command() — mirrors existing quoted_rm_command()
- unique_parent_dirs() — deduplicates parent dirs from file pairs

Migrates _ensure_remote_dirs to use the new helpers.

28 new tests (21 SSH + 7 Modal), all passing.

Closes #7465
Closes #7467

* fix(modal): pipe stdin to avoid ARG_MAX, clean up review findings

- Modal bulk upload: stream base64 payload through proc.stdin in 1MB
  chunks instead of embedding in command string (Modal SDK enforces
  64KB ARG_MAX_BYTES — typical payloads are ~4.3MB)
- Modal single-file upload: same stdin fix, add exit code checking
- Remove what-narrating comments in ssh.py and modal.py (keep WHY
  comments: symlink staging rationale, SIGPIPE, deadlock avoidance)
- Remove unnecessary `sandbox = self._sandbox` alias in modal bulk
- Daytona: use shared helpers (unique_parent_dirs, quoted_mkdir_command)
  instead of inlined duplicates

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-04-12 06:18:05 +05:30
Hermes Agent
830040f937 fix: remove unused BulkUploadFn import from daytona.py 2026-04-10 21:14:32 -07:00
Hermes Agent
223a0623ee fix(daytona): use logger.warning instead of warnings.warn for disk cap
warnings.warn() is suppressed/invisible when running as a gateway
or agent. Switch to logger.warning() so the disk cap message
actually appears in logs.

Fixes #7362 (item 3).
2026-04-10 21:14:32 -07:00
Hermes Agent
bff64858f9 perf(daytona): bulk upload files in single HTTP call
FileSyncManager now accepts an optional bulk_upload_fn callback.
When provided, all changed files are uploaded in one call instead
of iterating one-by-one with individual HTTP POSTs.

DaytonaEnvironment wires this to sandbox.fs.upload_files() which
batches everything into a single multipart POST — ~580 files goes
from ~5 min to <2s on init.

Parent directories are pre-created in one mkdir -p call.

Fixes #7362 (item 1).
2026-04-10 21:14:32 -07:00
alt-glitch
96c060018a fix: remove 115 verified dead code symbols across 46 production files
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).

Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
2026-04-10 03:44:43 -07:00
alt-glitch
1f1f297528 feat(environments): unified file sync with change tracking and deletion
Replace per-backend ad-hoc file sync with a shared FileSyncManager
that handles mtime-based change detection, remote deletion of
locally-removed files, and transactional state updates.

- New FileSyncManager class (tools/environments/file_sync.py)
  with callbacks for upload/delete, rate limiting, and rollback
- Shared iter_sync_files() eliminates 3 duplicate implementations
- SSH: replace unconditional rsync with scp + mtime skip
- Modal/Daytona: replace inline _synced_files dict with manager
- All 3 backends now sync credentials + skills + cache uniformly
- Remote deletion: files removed locally are cleaned from remote
- HERMES_FORCE_FILE_SYNC=1 env var for debugging
- Base class _before_execute() simplified to empty hook
- 12 unit tests covering mtime skip, deletion, rollback, rate limiting
2026-04-10 03:01:46 -07:00
alt-glitch
d684d7ee7e feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.

Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
  functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
  _save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
  migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)

Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 17:23:15 -07:00
Teknium
d0ffb111c2
refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.

Changes by category:

Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
  (vs availability checks which were left alone)

Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
  source, new_server_names, verify, pconfig, default_terminal,
  result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
  (12 instances of add_parser() where result was never used)

Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
  surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
  module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction

Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports

Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
  they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values

Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
  x.startswith('a') or x.startswith('b') consolidated to
  x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
  truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
  send_message_tool.py

Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls

Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
2026-04-07 10:25:31 -07:00
Teknium
5148682b43
feat: mount skills directory into all remote backends with live sync (#3890)
Skills with scripts/, templates/, and references/ subdirectories need
those files available inside sandboxed execution environments. Previously
the skills directory was missing entirely from remote backends.

Live sync — files stay current as credentials refresh and skills update:
- Docker/Singularity: bind mounts are inherently live (host changes
  visible immediately)
- Modal: _sync_files() runs before each command with mtime+size caching,
  pushing only changed credential and skill files (~13μs no-op overhead)
- SSH: rsync --safe-links before each command (naturally incremental)
- Daytona: _upload_if_changed() with mtime+size caching before each command

Security — symlink filtering:
- Docker/Singularity: sanitized temp copy when symlinks detected
- Modal/Daytona: iter_skills_files() skips symlinks
- SSH: rsync --safe-links skips symlinks pointing outside source tree
- Temp dir cleanup via atexit + reuse across calls

Non-root user support:
- SSH: detects remote home via echo $HOME, syncs to $HOME/.hermes/
- Daytona: detects sandbox home before sync, uploads to $HOME/.hermes/
- Docker/Modal/Singularity: run as root, /root/.hermes/ is correct

Also:
- credential_files.py: fix name/path key fallback in required_credential_files
- Singularity, SSH, Daytona: gained credential file support
- 14 tests covering symlink filtering, name/path fallback, iter_skills_files
2026-03-30 02:45:41 -07:00
rovle
18862145e4 fix(daytona): migrate sandbox lookup from find_one to get/list
find_one is being deprecated. Primary lookup now uses get() with a
deterministic sandbox name (hermes-{task_id}). A legacy fallback via
list(labels=...) ensures sandboxes created before this migration are
still resumable.
2026-03-19 17:54:46 +01:00
johnh4098
e9742e202f fix(security): pipe sudo password via stdin instead of shell cmdline 2026-03-10 06:34:59 -07:00
Himess
7a0544ab57
fix: three small inconsistencies across cron, gateway, and daytona
1. cron/jobs.py: respect HERMES_HOME env var for job storage path.
   scheduler.py already uses os.getenv("HERMES_HOME", ...) but jobs.py
   hardcodes Path.home() / ".hermes", causing path mismatch when
   HERMES_HOME is set.

2. gateway/run.py: add Platform.HOMEASSISTANT to default_toolset_map
   and platform_config_key. The adapter and hermes-homeassistant
   toolset both exist but the mapping dicts omit it, so HomeAssistant
   events silently fall back to the Telegram toolset.

3. tools/environments/daytona.py: use time.monotonic() for deadline
   instead of float subtraction. All other backends (docker, ssh,
   singularity, local) use monotonic clock for timeout tracking.
   The accumulator pattern (deadline -= 0.2) drifts because
   t.join(0.2) + interrupt checks take longer than 0.2s per iteration.
2026-03-06 16:52:17 +03:00
rovle
a6499b6107 fix(daytona): use shell timeout wrapper instead of broken SDK exec timeout
The Daytona SDK's process.exec(timeout=N) parameter is not enforced —
the server-side timeout never fires and the SDK has no client-side
fallback, causing commands to hang indefinitely.

Fix: wrap commands with timeout N sh -c '...' (coreutils) which
reliably kills the process and returns exit code 124. Added
shlex.quote for proper shell escaping and a secondary deadline (timeout + 10s) that force-stops the sandbox if the shell timeout somehow fails.

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 13:12:41 -08:00
rovle
efc7a7b957 fix(daytona): don't guess /root on cwd probe failure, keep constructor default; update tests to reflect this
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:49:35 -08:00
rovle
4f1464b3af fix(daytona): default disk to 10GB to match platform limit
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:37:30 -08:00
rovle
577da79a47 fix(daytona): make disk cap visible and use SDK enum for sandbox
state

- Replace logger.warning with warnings.warn for the disk cap so users
  actually see it (logger was suppressed by CLI's log level config)
- Use SandboxState enum instead of string literals in
_ensure_sandbox_ready

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:03:39 -08:00
rovle
1faa9648d3 chore(daytona): cap the disk size to current maximum on daytona sandboxes
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 10:43:41 -08:00
rovle
435530018b fix(daytona): resolve cwd by detecting home directory inside the sandbox 2026-03-05 10:02:22 -08:00
rovle
1e312c6582 feat(environments): add Daytona cloud sandbox backend
New execution backend using the Daytona Python SDK. Supports persistent
sandboxes via stop/start lifecycle, interrupt handling, and automatic
retry on transient errors.

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 10:02:21 -08:00