hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Yukipukii1	40619b393f	tools: normalize file tool pagination bounds	2026-04-22 06:11:41 -07:00
Teknium	9c9d9b7ddf	feat(delegate): cross-agent file state coordination for concurrent subagents (#13718 ) * feat(models): hide OpenRouter models that don't advertise tool support Port from Kilo-Org/kilocode#9068. hermes-agent is tool-calling-first — every provider path assumes the model can invoke tools. Models whose OpenRouter supported_parameters doesn't include 'tools' (e.g. image-only or completion-only models) cannot be driven by the agent loop and fail at the first tool call. Filter them out of fetch_openrouter_models() so they never appear in the model picker (`hermes model`, setup wizard, /model slash command). Permissive when the field is missing — OpenRouter-compatible gateways (Nous Portal, private mirrors, older snapshots) don't always populate supported_parameters. Treat missing as 'unknown → allow' rather than silently emptying the picker on those gateways. Only hide models whose supported_parameters is an explicit list that omits tools. Tests cover: tools present → kept, tools absent → dropped, field missing → kept, malformed non-list → kept, non-dict item → kept, empty list → dropped. * feat(delegate): cross-agent file state coordination for concurrent subagents Prevents mangled edits when concurrent subagents touch the same file (same process, same filesystem — the mangle scenario from #11215). Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1: 1. FileStateRegistry (tools/file_state.py) — process-wide singleton tracking per-agent read stamps and the last writer globally. check_stale() names the sibling subagent in the warning when a non-owning agent wrote after this agent's last read. 2. Per-path threading.Lock wrapped around the read-modify-write region in write_file_tool and patch_tool. Concurrent siblings on the same path serialize; different paths stay fully parallel. V4A multi-file patches lock in sorted path order (deadlock-free). 3. Delegate-completion reminder in tools/delegate_tool.py: after a subagent returns, writes_since(parent, child_start, parent_reads) appends '[NOTE: subagent modified files the parent previously read — re-read before editing: ...]' to entry.summary when the child touched anything the parent had already seen. Complements (does not replace) the existing path-overlap check in run_agent._should_parallelize_tool_batch — batch check prevents same-file parallel dispatch within one agent's turn (cheap prevention, zero API cost), registry catches cross-subagent and cross-turn staleness at write time (detection). Behavior is warning-only, not hard-failing — matches existing project style. Errors surface naturally: sibling writes often invalidate the old_string in patch operations, which already errors cleanly. Tests: tests/tools/test_file_state_registry.py — 16 tests covering registry state transitions, per-path locking, per-path-not-global locking, writes_since filtering, kill switch, and end-to-end integration through the real read_file/write_file/patch handlers.	2026-04-21 16:41:26 -07:00
Teknium	5e6427a42c	fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage Follow-ups on top of @teyrebaz33's cherry-picked commit: 1. New shared helper format_no_match_hint() in fuzzy_match.py with a startswith('Could not find') gate so the snippet only appends to genuine no-match errors — not to 'Found N matches' (ambiguous), 'Escape-drift detected', or 'identical strings' errors, which would all mislead the model. 2. file_tools.patch_tool suppresses the legacy generic '[Hint: old_string not found...]' string when the rich 'Did you mean?' snippet is already attached — no more double-hint. 3. Wire the same helper into patch_parser.py (V4A patch mode, both _validate_operations and _apply_update) and skill_manager_tool.py so all three fuzzy callers surface the hint consistently. Tests: 7 new gating tests in TestFormatNoMatchHint cover every error class (ambiguous, drift, identical, non-zero match count, None error, no similar content, happy path). 34/34 test_fuzzy_match, 96/96 test_file_tools + test_patch_parser + test_skill_manager_tool pass. E2E verified across all four scenarios: no-match-with-similar, no-match-no-similar, ambiguous, success. V4A mode confirmed end-to-end with a non-matching hunk.	2026-04-21 02:03:46 -07:00
ifrederico	9b36636363	fix(security): apply file safety to copilot acp fs	2026-04-21 01:31:58 -07:00
Aniruddha Adak	4c40ec96e6	fix(file_tools): resolve relative paths against TERMINAL_CWD for worktree isolation Adds a _resolve_path() helper that reads TERMINAL_CWD and uses it as the base for relative path resolution. Applied to _check_sensitive_path, read_file_tool, _update_read_timestamp, and _check_file_staleness. Absolute paths and non-worktree sessions (no TERMINAL_CWD) are unaffected — falls back to os.getcwd(). Fixes #12689.	2026-04-20 12:29:31 -07:00
teyrebaz33	2d59afd3da	fix(docker): pass docker_mount_cwd_to_workspace and docker_forward_env to container_config in file_tools file_tools._get_file_ops() built a container_config dict for Docker/ Singularity/Modal/Daytona backends but omitted docker_mount_cwd_to_workspace and docker_forward_env. Both are read by _create_environment() from container_config, so file tools (read_file, write_file, patch, search) silently ignored those config values when running in Docker. Add the two missing keys to match the container_config already built by terminal_tool.terminal_tool(). Fixes #2672.	2026-04-20 00:58:16 -07:00
Teknium	3f43aec15d	fix(tools): bound _read_tracker sub-containers + prune _completion_consumed (#11839 ) Two accretion-over-time leaks that compound over long CLI / gateway lifetimes. Both were flagged in the memory-leak audit. ## file_tools._read_tracker _read_tracker[task_id] holds three sub-containers that grew unbounded: read_history set of (path, offset, limit) tuples — 1 per unique read dedup dict of (path, offset, limit) → mtime — same growth pattern read_timestamps dict of resolved_path → mtime — 1 per unique path A CLI session uses one stable task_id for its lifetime, so these were uncapped. A 10k-read session accumulated ~1.5MB of tracker state that the tool no longer needed (only the most recent reads are relevant for dedup, consecutive-loop detection, and write/patch external-edit warnings). Fix: _cap_read_tracker_data() enforces hard caps on each container after every add. Defaults: read_history=500, dedup=1000, read_timestamps=1000. Eviction is insertion-order (Python 3.7+ dict guarantee) for the dicts; arbitrary for the set (which only feeds diagnostic summaries). ## process_registry._completion_consumed Module-level set that recorded every session_id ever polled / waited / logged. No pruning. Each entry is ~20 bytes, so the absolute leak is small, but on a gateway processing thousands of background commands per day the set grows until process exit. Fix: _prune_if_needed() now discards _completion_consumed entries alongside the session dict evictions it already performs (both the TTL-based prune and the LRU-over-cap prune). Adds a final belt-and-suspenders pass that drops any dangling entries whose session_id no longer appears in _running or _finished. Tests: tests/tools/test_accretion_caps.py — 9 cases * Each container bound respected, oldest evicted * No-op when under cap (no unnecessary work) * Handles missing sub-containers without crashing * Live read_file_tool path enforces caps end-to-end * _completion_consumed pruned on TTL expiry * _completion_consumed pruned on LRU eviction * Dangling entries (no backing session) cleared Broader suite: 3486 tests/tools + tests/cli pass. The single flake (test_alias_command_passes_args) reproduces on unchanged main — known cross-test pollution under suite-order load.	2026-04-17 15:53:57 -07:00
Teknium	8d023e43ed	refactor: remove dead code — 1,784 lines across 77 files (#9180 ) Deep scan with vulture, pyflakes, and manual cross-referencing identified: - 41 dead functions/methods (zero callers in production) - 7 production-dead functions (only test callers, tests deleted) - 5 dead constants/variables - ~35 unused imports across agent/, hermes_cli/, tools/, gateway/ Categories of dead code removed: - Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection, rebuild_lookups, clear_session_context, get_logs_dir, clear_session - Unused API surface: search_models_dev, get_pricing, skills_categories, get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list - Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob - Stale debug helpers: get_debug_session_info copies in 4 tool files (centralized version in debug_helpers.py already exists) - Dead gateway methods: send_emote, send_notice (matrix), send_reaction (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history (matrix), _start_typing_indicator (signal), parse_feishu_post_content - Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION, FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR - Unused UI code: _interactive_provider_selection, _interactive_model_selection (superseded by prompt_toolkit picker) Test suite verified: 609 tests covering affected files all pass. Tests for removed functions deleted. Tests using removed utilities (clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.	2026-04-13 16:32:04 -07:00
konsisumer	311dac1971	fix(file_tools): block /private/etc writes on macOS symlink bypass On macOS, /etc is a symlink to /private/etc, so os.path.realpath() resolves /etc/hosts to /private/etc/hosts. The sensitive path check only matched /etc/ prefixes against the resolved path, allowing writes to system files on macOS. - Add /private/etc/ and /private/var/ to _SENSITIVE_PATH_PREFIXES - Check both realpath-resolved and normpath-normalized paths - Add regression tests for macOS symlink bypass Closes #8734 Co-authored-by: ElhamDevelopmentStudio (PR #8829)	2026-04-13 05:15:05 -07:00
Teknium	3696c74bfb	fix: preserve existing thresholds, remove pre-read byte guard - DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS) - DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS) - Per-tool overrides all set to 100K (terminal, execute_code, search_files) - Remove pre-read byte guard (no behavioral regression vs current main) - Revert limit signature change to int=500 (match current default) - Restore original read_file schema description - Update test assertions to match 100K thresholds	2026-04-08 02:24:32 -07:00
alt-glitch	bbcff8dcd0	fix(tools): address PR review — remove _extract_raw_output, BudgetConfig everywhere, read_file hardening - Remove _extract_raw_output: persist content verbatim (fixes size mismatch bug) - Drop import aliases: import from budget_config directly, one canonical name - BudgetConfig param on maybe_persist_tool_result and enforce_turn_budget - read_file: limit=None signature, pre-read guard fires only when limit omitted (256KB) - Unify binary extensions: file_operations.py imports from binary_extensions.py - Exclude .pdf and .svg from binary set (text-based, agents may inspect) - Remove redundant outer try/except in eval path (internal fallback handles it) - Fix broken tests: update assertion strings for new persistence format - Module-level constants: _PRE_READ_MAX_BYTES, _DEFAULT_READ_LIMIT - Remove redundant pathlib import (Path already at module level) - Update spec.md with IMPLEMENTED annotations and design decisions	2026-04-08 02:24:32 -07:00
alt-glitch	65e24c942e	wip: tool result fixes -- persistence	2026-04-08 02:24:32 -07:00
Teknium	678a87c477	refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites Add three reusable helpers to eliminate pervasive boilerplate: tools/registry.py — tool_error() and tool_result(): Every tool handler returns JSON strings. The pattern json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times, and json.dumps({"success": False, "error": msg}, ...) another 23. Now: tool_error(msg) or tool_error(msg, success=False). tool_result() handles arbitrary result dicts: tool_result(success=True, data=payload) or tool_result(some_dict). hermes_cli/config.py — read_raw_config(): Lightweight YAML reader that returns the raw config dict without load_config()'s deep-merge + migration overhead. Available for callsites that just need a single config value. Migration (129 callsites across 32 files): - tools/: browser_camofox (18), file_tools (10), homeassistant (8), web_tools (7), skill_manager (7), cronjob (11), code_execution (4), delegate (5), send_message (4), tts (4), memory (7), session_search (3), mcp (2), clarify (2), skills_tool (3), todo (1), vision (1), browser (1), process_registry (2), image_gen (1) - plugins/memory/: honcho (9), supermemory (9), hindsight (8), holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2) - agent/: memory_manager (2), builtin_memory_provider (1)	2026-04-07 13:36:38 -07:00
Teknium	ca0459d109	refactor: remove 24 confirmed dead functions — 432 lines of unused code Each function was verified to have exactly 1 reference in the entire codebase (its own definition). Zero calls, zero imports, zero string references anywhere including tests. Removed by category: Superseded wrappers (replaced by newer implementations): - agent/anthropic_adapter.py: run_hermes_oauth_login, refresh_hermes_oauth_token - hermes_cli/callbacks.py: sudo_password_callback (superseded by CLI method) - hermes_cli/setup.py: _set_model_provider, _sync_model_from_disk - tools/file_tools.py: get_file_tools (superseded by registry.register) - tools/cronjob_tools.py: get_cronjob_tool_definitions (same) - tools/terminal_tool.py: _check_dangerous_command (_check_all_guards used) Dead private helpers (lost their callers during refactors): - agent/anthropic_adapter.py: _convert_user_content_part_to_anthropic - agent/display.py: honcho_session_line, write_tty - hermes_cli/providers.py: _build_labels (+ dead _labels_cache var) - hermes_cli/tools_config.py: _prompt_yes_no - hermes_cli/models.py: _extract_model_ids - hermes_cli/uninstall.py: log_error - gateway/platforms/feishu.py: _is_loop_ready - tools/file_operations.py: _read_image (64-line method) - tools/process_registry.py: cleanup_expired - tools/skill_manager_tool.py: check_skill_manage_requirements Dead class methods (zero callers): - run_agent.py: _is_anthropic_url (logic duplicated inline at L618) - run_agent.py: _classify_empty_content_response (68-line method, never wired) - cli.py: reset_conversation (callers all use new_session directly) - cli.py: _clear_current_input (added but never wired in) Other: - gateway/delivery.py: build_delivery_context_for_tool - tools/browser_tool.py: get_active_browser_sessions	2026-04-07 11:41:26 -07:00
acsezen	831067c5d3	perf: fix O(n²) catastrophic backtracking in redact regex + reorder file read guard Two pre-existing issues causing test_file_read_guards timeouts on CI: 1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with IGNORECASE, matching any letter/underscore to end-of-string at each position → O(n²) backtracking on 100K+ char inputs. Bounded to {0,50} since env var names are never that long. 2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the character-count guard, so oversized content (that would be rejected anyway) went through the expensive regex first. Reordered to check size limit before redaction.	2026-04-03 22:40:37 -07:00
Teknium	ef2ae3e48f	fix(file_tools): refresh staleness timestamp after writes (#4390 ) After a successful write_file or patch, update the stored read timestamp to match the file's new modification time. Without this, consecutive edits by the same task (read → write → write) would false-warn on the second write because the stored timestamp still reflected the original read, not the first write. Also renames the internal tracker key from 'file_mtimes' to 'read_timestamps' for clarity.	2026-04-01 00:50:08 -07:00
Teknium	f04986029c	feat(file_tools): detect stale files on write and patch (#4345 ) Track file mtime when read_file is called. When write_file or patch subsequently targets the same file, compare the current mtime against the recorded one. If they differ (external edit, concurrent agent, user change), include a _warning in the result advising the agent to re-read. The write still proceeds — this is a soft signal, not a hard block. Key design points: - Per-task isolation: task A's reads don't affect task B's writes. - Files never read produce no warning (not enforcing read-before-write). - mtime naturally updates after the agent's own writes, so the warning only fires on external changes, not the agent's own edits. - V4A multi-file patches check all target paths. Tests: 10 new tests covering write staleness, patch staleness, never-read files, cross-task isolation, and the helper function.	2026-03-31 14:49:00 -07:00
Teknium	e3f8347be3	feat(file_tools): harden read_file with size guard, dedup, and device blocking (#4315 ) * feat(file_tools): harden read_file with size guard, dedup, and device blocking Three improvements to read_file_tool to reduce wasted context tokens and prevent process hangs: 1. Character-count guard: reads that produce more than 100K characters (≈25-35K tokens across tokenisers) are rejected with an error that tells the model to use offset+limit for a smaller range. The effective cap is min(file_size, 100K) so small files that happen to have long lines aren't over-penalised. Large truncated files also get a hint nudging toward targeted reads. 2. File-read deduplication: when the same (path, offset, limit) is read a second time and the file hasn't been modified (mtime unchanged), return a lightweight stub instead of re-sending the full content. Writes and patches naturally change mtime, so post-edit reads always return fresh content. The dedup cache is cleared on context compression — after compression the original read content is summarised away, so the model needs the full content again. 3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin etc. are rejected before any I/O to prevent process hangs from infinite-output or blocking-input devices. Tests: 17 new tests covering all three features plus the dedup-reset- on-compression integration. All 52 file-read tests pass (35 existing + 17 new). Full tool suite (2124 tests) passes with 0 failures. * feat: make file_read_max_chars configurable, add docs Add file_read_max_chars to DEFAULT_CONFIG (default 100K). read_file_tool reads this on first call and caches for the process lifetime. Users on large-context models can raise it; users on small local models can lower it. Also adds a 'File Read Safety' section to the configuration docs explaining the char limit, dedup behavior, and example values.	2026-03-31 12:53:19 -07:00
Teknium	5602458794	security: harden dangerous command detection and add file tool path guards (#3872 ) Closes gaps that allowed an agent to expose Docker's Remote API to the internet by writing to /etc/docker/daemon.json. Terminal tool (approval.py): - chmod: now catches 666 and symbolic modes (o+w, a+w), not just 777 - cp/mv/install: detected when targeting /etc/ - sed -i/--in-place: detected when targeting /etc/ File tools (file_tools.py): - write_file and patch now refuse to write to sensitive system paths (/etc/, /boot/, /usr/lib/systemd/, docker.sock) - Directs users to the terminal tool (which has approval prompts) for system file modifications	2026-03-29 22:33:47 -07:00
Teknium	e97c0cb578	fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support * feat: GPT tool-use steering + strip budget warnings from history Two changes to improve tool reliability, especially for OpenAI GPT models: 1. GPT tool-use enforcement prompt: Adds GPT_TOOL_USE_GUIDANCE to the system prompt when the model name contains 'gpt' and tools are loaded. This addresses a known behavioral pattern where GPT models describe intended actions ('I will run the tests') instead of actually making tool calls. Inspired by similar steering in OpenCode (beast.txt) and Cline (GPT-5.1 variant). 2. Budget warning history stripping: Budget pressure warnings injected by _get_budget_warning() into tool results are now stripped when conversation history is replayed via run_conversation(). Previously, these turn-scoped signals persisted across turns, causing models to avoid tool calls in all subsequent messages after any turn that hit the 70-90% iteration threshold. * fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support Prep for the upcoming profiles feature — each profile is a separate HERMES_HOME directory, so all paths must respect the env var. Fixes: - gateway/platforms/matrix.py: Matrix E2EE store was hardcoded to ~/.hermes/matrix/store, ignoring HERMES_HOME. Now uses get_hermes_home() so each profile gets its own Matrix state. - gateway/platforms/telegram.py: Two locations reading config.yaml via Path.home()/.hermes instead of get_hermes_home(). DM topic thread_id persistence and hot-reload would read the wrong config in a profile. - tools/file_tools.py: Security path for hub index blocking was hardcoded to ~/.hermes, would miss the actual profile's hub cache. - hermes_cli/gateway.py: Service naming now uses the profile name (hermes-gateway-coder) instead of a cryptic hash suffix. Extracted _profile_suffix() helper shared by systemd and launchd. - hermes_cli/gateway.py: Launchd plist path and Label now scoped per profile (ai.hermes.gateway-coder.plist). Previously all profiles would collide on the same plist file on macOS. - hermes_cli/gateway.py: Launchd plist now includes HERMES_HOME in EnvironmentVariables — was missing entirely, making custom HERMES_HOME broken on macOS launchd (pre-existing bug). - All launchctl commands in gateway.py, main.py, status.py updated to use get_launchd_label() instead of hardcoded string. Test fixes: DM topic tests now set HERMES_HOME env var alongside Path.home() mock. Launchd test uses get_launchd_label() for expected commands.	2026-03-28 13:51:08 -07:00
Teknium	8bb1d15da4	chore: remove ~100 unused imports across 55 files (#3016 ) Automated cleanup via pyflakes + autoflake with manual review. Changes: - Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.) - Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.) - Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.) - Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner then immediately redefined locally — only build_welcome_banner is actually used) - Added noqa comments to imports that appear unused but serve a purpose: - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py is_interrupted/_interrupt_event) - SDK presence checks in try/except (daytona, fal_client, discord) - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home) Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing streaming test failures unrelated to this change).	2026-03-25 15:02:03 -07:00
Teknium	934fbe3c06	fix: strip ANSI at the source — clean terminal output before it reaches the model Root cause: terminal_tool, execute_code, and process_registry returned raw subprocess output with ANSI escape sequences intact. The model saw these in tool results and copied them into file writes. Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py, but this was a band-aid — regex on file content risks corrupting legitimate content, and doesn't prevent ANSI from wasting tokens in the model context. Source-level fix: - New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI (incl. private-mode, colon-separated, intermediate bytes), OSC (both terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1 - terminal_tool.py: strip output before returning to model - code_execution_tool.py: strip stdout/stderr before returning - process_registry.py: strip output in poll/read_log/wait - file_tools.py: remove _strip_ansi band-aid (no longer needed) Verified: `ls --color=always` output returned as clean text to model, file written from that output contains zero ESC bytes.	2026-03-23 07:43:12 -07:00
Teknium	fa6f069577	fix(file_tools): strip ANSI escape codes from write_file and patch content (#2532 ) Models occasionally copy ANSI escape sequences from terminal output or display formatting into file content, breaking shebangs and injecting binary characters into scripts. Strip ANSI codes (CSI, OSC, simple escapes) from: - write_file content - patch old_string, new_string, and V4A patch content The check is fast (skips entirely if no ESC byte present). Reported by Andi Jaeger.	2026-03-22 11:17:06 -07:00
darya	a654bc04f7	fix(file_tools): include pagination args in repeated search key	2026-03-18 01:19:05 +03:00
Teknium	96dac22194	fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files (#1630 , #1558 ) * fix: prevent infinite 400 failure loop on context overflow (#1630) When a gateway session exceeds the model's context window, Anthropic may return a generic 400 invalid_request_error with just 'Error' as the message. This bypassed the phrase-based context-length detection, causing the agent to treat it as a non-retryable client error. Worse, the failed user message was still persisted to the transcript, making the session even larger on each attempt — creating an infinite loop. Three-layer fix: 1. run_agent.py — Fallback heuristic: when a 400 error has a very short generic message AND the session is large (>40% of context or >80 messages), treat it as a probable context overflow and trigger compression instead of aborting. 2. run_agent.py + gateway/run.py — Don't persist failed messages: when the agent returns failed=True before generating any response, skip writing the user's message to the transcript/DB. This prevents the session from growing on each failure. 3. gateway/run.py — Smarter error messages: detect context-overflow failures and suggest /compact or /reset specifically, instead of a generic 'try again' that will fail identically. * fix(skills): detect prompt injection patterns and block cache file reads Adds two security layers to prevent prompt injection via skills hub cache files (#1558): 1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json was the original injection vector — untrusted skill descriptions in the catalog contained adversarial text that the model executed. 2. skill_view: warns when skills are loaded from outside the trusted ~/.hermes/skills/ directory, and detects common injection patterns in skill content ("ignore previous instructions", "<system>", etc.). Cherry-picked from PR #1562 by ygd58. --------- Co-authored-by: buray <ygd58@users.noreply.github.com>	2026-03-17 01:50:59 -07:00
teknium1	780ddd102b	fix(docker): gate cwd workspace mount behind config Keep Docker sandboxes isolated by default. Add an explicit terminal.docker_mount_cwd_to_workspace opt-in, thread it through terminal/file environment creation, and document the security tradeoff and config.yaml workflow clearly.	2026-03-16 05:20:56 -07:00
teknium1	01e62c067b	merge: resolve conflicts with origin/main (SSH preflight check)	2026-03-15 21:13:40 -07:00
teknium1	210d5ade1e	feat(tools): centralize tool emoji metadata in registry + skin integration - Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry - Add emoji= to all 50+ registry.register() calls across tool files - Add get_tool_emoji() helper in agent/display.py with 3-tier resolution: skin override → registry default → hardcoded fallback - Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and gateway/run.py with centralized get_tool_emoji() calls - Add 'tool_emojis' field to SkinConfig so skins can override per-tool emojis (e.g. ares skin could use swords instead of wrenches) - Add 11 tests (5 registry emoji, 6 display/skin integration) - Update AGENTS.md skin docs table Based on the approach from PR #1061 by ForgingAlex (emoji centralization in registry). This salvage fixes several issues from the original: - Does NOT split the cronjob tool (which would crash on missing schemas) - Does NOT change image_generate toolset/requires_env/is_async - Does NOT delete existing tests - Completes the centralization (gateway/run.py was missed) - Hooks into the skin system for full customizability	2026-03-15 20:21:21 -07:00
teknium1	5b80654198	feat(tools): add persistent shell mode to local and SSH backends Cherry-picked from PR #1067 by alt-glitch. Adds PersistentShellMixin with file-based IPC protocol for long-lived bash shells. LocalEnvironment and SSHEnvironment gain persistent=True option. Controlled via TERMINAL_LOCAL_PERSISTENT / TERMINAL_SSH_PERSISTENT env vars. Fixes latent stderr pipe buffer deadlock. Co-authored-by: alt-glitch <balyan.sid@gmail.com>	2026-03-15 20:13:02 -07:00
alt-glitch	7be314c456	pass configs to file_tools for r+w over ssh. pass TERM env. default to ~ to in local and ssh backends. ssh backend.	2026-03-15 02:26:39 +05:30
teknium1	b59da08730	fix: reduce file tool log noise - treat git diff --cached --quiet rc=1 as an expected checkpoint state instead of logging it as an error - downgrade expected write PermissionError/EROFS/EACCES failures out of error logging while keeping unexpected exceptions at error level - add regression tests for both logging behaviors	2026-03-13 22:14:00 -07:00
teknium1	c837ef949d	fix: replace debug print() with logger.error() in file_tools Stray print() in write_file_tool exception handler leaked debug output to stdout. Replaced with logger.error() which is already set up in the file. Authored by memosr. Co-authored-by: memosr <memosr@users.noreply.github.com>	2026-03-11 04:38:07 -07:00
teknium1	a458b535c9	fix: improve read-loop detection — consecutive-only, correct thresholds, fix bugs Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues: 1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only warn/block on truly consecutive identical calls. Any other tool call in between (write, patch, terminal, etc.) resets the counter via notify_other_tool_call(), called from handle_function_call() in model_tools.py. This prevents false blocks in read→edit→verify flows. 2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on 4th+ consecutive (was 3rd+). Gives the model more room before intervening. 3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a separate read_history set that only tracks file reads. 4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from web_extract return docs in code_execution_tool.py — the field IS returned by web_tools.py. 5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover consecutive-only behavior, notify_other_tool_call, interleaved read/search, and summary-unaffected-by-searches.	2026-03-10 16:25:41 -07:00
teknium1	b53d5dad67	Merge PR #705 : fix: detect, warn, and block file re-read/search loops after context compression Authored by 0xbyt4. Adds read/search loop detection, file history injection after compression, and todo filtering for active items only.	2026-03-10 16:17:03 -07:00
Dev User	c3dec1dcda	fix(file_tools): pass docker_volumes to sandbox container config file_tools.py creates its own Docker sandbox when read_file/search_files runs before any terminal command. The container_config was missing docker_volumes, so the sandbox had no user volume mounts — breaking access to heartbeat state, cron output, and all other mounted data. Matches the existing pattern in terminal_tool.py:872. Missed in original PR #158 (feat: add docker_volumes config). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 10:18:33 +01:00
memosr.eth	b78b605ba9	fix: replace print() with logger.error() in file_tools	2026-03-09 22:29:16 +03:00
0xbyt4	4684aaffdc	merge: resolve file_tools.py conflict with origin/main Combine read/search loop detection with main's redact_sensitive_text and truncation hint features. Add tracker reset to TestSearchHints to prevent cross-test state leakage.	2026-03-09 13:21:46 +03:00
teknium1	7af33accf1	fix: apply secret redaction to file tool outputs Terminal output was already redacted via redact_sensitive_text() but read_file and search_files returned raw content. Now both tools redact secrets before returning results to the LLM. Based on PR #372 by @teyrebaz33 (closes #363) — applied manually due to branch conflicts with the current codebase.	2026-03-09 00:49:46 -07:00
teknium1	491605cfea	feat: add high-value tool result hints for patch and search_files (#722 ) Add contextual [Hint: ...] suffixes to tool results where they save real iterations: - patch (no match): suggests read_file/search_files to verify content before retrying — addresses the common pattern where the agent retries with stale old_string instead of re-reading the file. - search_files (truncated): provides explicit next offset and suggests narrowing the search — clearer than relying on total_count inference. Other hints proposed in #722 (terminal, web_search, web_extract, browser_snapshot, search zero-results, search content-matches) were evaluated and found to be low-value: either already covered by existing mechanisms (read_file pagination, similar-files, schema descriptions) or guidance the agent already follows from its own reasoning. 5 new tests covering hint presence/absence for both tools.	2026-03-08 17:46:28 -07:00
0xbyt4	e2fe1373f3	fix: escalate read/search blocking, track search loops, filter completed todos - Block file reads after 3+ re-reads of same region (no content returned) - Track search_files calls and block repeated identical searches - Filter completed/cancelled todos from post-compression injection to prevent agent from re-doing finished work - Add 10 new tests covering all three fixes	2026-03-08 23:01:21 +03:00
0xbyt4	9eee529a7f	fix: detect and warn on file re-read loops after context compression When context compression summarizes conversation history, the agent loses track of which files it already read and re-reads them in a loop. Users report the agent reading the same files endlessly without writing. Root cause: context compression is lossy — file contents and read history are lost in the summary. After compression, the model thinks it hasn't examined the files yet and reads them again. Fix (two-part): 1. Track file reads per task in file_tools.py. When the same file region is read again, include a _warning in the response telling the model to stop re-reading and use existing information. 2. After context compression, inject a structured message listing all files already read in the session with explicit "do NOT re-read" instruction, preserving read history across compression boundaries. Adds 16 tests covering warning detection, task isolation, summary accuracy, tracker cleanup, and compression history injection.	2026-03-08 20:44:42 +03:00
rovle	c43451a50b	feat(terminal): integrate Daytona backend into tool pipeline Add Daytona to image selection, container_config guards, environment factory, requirements check, and diagnostics in terminal_tool.py and file_tools.py. Also add to sandboxed-backend approval bypass. Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 10:02:21 -08:00
rovle	ca33372595	fix: pass task_id to _create_environment as well, to prevent cross-session state mixing Signed-off-by: rovle <lovre.pesut@gmail.com>	2026-03-05 01:40:04 -08:00
Gesina Sands	f7677ed275	feat: add docker_volumes config for custom volume mounts	2026-02-28 07:12:48 +10:00
teknium1	2bf96ad244	feat: add ephemeral prefill messages and system prompt loading - Implemented functionality to load ephemeral prefill messages from a JSON file, enhancing few-shot priming capabilities for the agent. - Introduced a mechanism to load an ephemeral system prompt from environment variables or configuration files, ensuring dynamic prompt adjustments at API-call time. - Updated the CLI and agent initialization to utilize the new prefill messages and system prompt, improving the overall interaction experience. - Enhanced configuration options with new environment variables for prefill messages and system prompts, allowing for greater customization without persistence.	2026-02-23 23:55:42 -08:00
teknium1	e1604b2b4a	feat: enhance user authorization checks in GatewayRunner - Updated the authorization logic to include a per-platform allow-all flag for improved flexibility. - Revised the order of checks to prioritize platform-specific allow-all settings, followed by environment variable allowlists and DM pairing approvals. - Added global allow-all configuration for broader access control. - Improved handling of allowlists by stripping whitespace and ensuring valid entries are processed.	2026-02-22 16:32:08 -08:00
teknium1	08ff1c1aa8	More major refactor/tech debt removal!	2026-02-21 20:22:33 -08:00
teknium1	a885d2f240	refactor: implement structured logging across multiple modules - Introduced logging functionality in cli.py, run_agent.py, scheduler.py, and various tool modules to replace print statements with structured logging. - Enhanced error handling and informational messages to improve debugging and monitoring capabilities. - Ensured consistent logging practices across the codebase, facilitating better traceability and maintenance.	2026-02-21 03:11:11 -08:00
teknium1	f9eb5edb96	refactor: rename search tool for clarity and consistency - Updated the tool name from "search" to "search_files" across multiple files to better reflect its functionality. - Adjusted related documentation and descriptions to ensure clarity in usage and expected behavior. - Enhanced the toolset definitions and mappings to incorporate the new naming convention, improving overall consistency in the codebase.	2026-02-20 02:43:57 -08:00
teknium1	8117d0adab	Refactor file operations and environment management in file_tools and terminal_tool - Improved the caching mechanism for ShellFileOperations to ensure stale entries are invalidated when environments are cleaned up. - Enhanced thread safety by refining the use of locks during environment creation and cleanup processes. - Streamlined the cleanup of inactive environments to prevent blocking other tool calls, ensuring efficient resource management. - Added error handling and messaging improvements for better user feedback during environment cleanup.	2026-02-16 19:37:40 -08:00

1 2

54 commits