* feat(models): hide OpenRouter models that don't advertise tool support
Port from Kilo-Org/kilocode#9068.
hermes-agent is tool-calling-first — every provider path assumes the
model can invoke tools. Models whose OpenRouter supported_parameters
doesn't include 'tools' (e.g. image-only or completion-only models)
cannot be driven by the agent loop and fail at the first tool call.
Filter them out of fetch_openrouter_models() so they never appear in
the model picker (`hermes model`, setup wizard, /model slash command).
Permissive when the field is missing — OpenRouter-compatible gateways
(Nous Portal, private mirrors, older snapshots) don't always populate
supported_parameters. Treat missing as 'unknown → allow' rather than
silently emptying the picker on those gateways. Only hide models
whose supported_parameters is an explicit list that omits tools.
Tests cover: tools present → kept, tools absent → dropped, field
missing → kept, malformed non-list → kept, non-dict item → kept,
empty list → dropped.
* feat(delegate): cross-agent file state coordination for concurrent subagents
Prevents mangled edits when concurrent subagents touch the same file
(same process, same filesystem — the mangle scenario from #11215).
Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1:
1. FileStateRegistry (tools/file_state.py) — process-wide singleton
tracking per-agent read stamps and the last writer globally.
check_stale() names the sibling subagent in the warning when a
non-owning agent wrote after this agent's last read.
2. Per-path threading.Lock wrapped around the read-modify-write
region in write_file_tool and patch_tool. Concurrent siblings on
the same path serialize; different paths stay fully parallel.
V4A multi-file patches lock in sorted path order (deadlock-free).
3. Delegate-completion reminder in tools/delegate_tool.py: after a
subagent returns, writes_since(parent, child_start, parent_reads)
appends '[NOTE: subagent modified files the parent previously
read — re-read before editing: ...]' to entry.summary when the
child touched anything the parent had already seen.
Complements (does not replace) the existing path-overlap check in
run_agent._should_parallelize_tool_batch — batch check prevents
same-file parallel dispatch within one agent's turn (cheap prevention,
zero API cost), registry catches cross-subagent and cross-turn
staleness at write time (detection).
Behavior is warning-only, not hard-failing — matches existing project
style. Errors surface naturally: sibling writes often invalidate the
old_string in patch operations, which already errors cleanly.
Tests: tests/tools/test_file_state_registry.py — 16 tests covering
registry state transitions, per-path locking, per-path-not-global
locking, writes_since filtering, kill switch, and end-to-end
integration through the real read_file/write_file/patch handlers.
Follow-ups on top of @teyrebaz33's cherry-picked commit:
1. New shared helper format_no_match_hint() in fuzzy_match.py with a
startswith('Could not find') gate so the snippet only appends to
genuine no-match errors — not to 'Found N matches' (ambiguous),
'Escape-drift detected', or 'identical strings' errors, which would
all mislead the model.
2. file_tools.patch_tool suppresses the legacy generic '[Hint: old_string
not found...]' string when the rich 'Did you mean?' snippet is
already attached — no more double-hint.
3. Wire the same helper into patch_parser.py (V4A patch mode, both
_validate_operations and _apply_update) and skill_manager_tool.py so
all three fuzzy callers surface the hint consistently.
Tests: 7 new gating tests in TestFormatNoMatchHint cover every error
class (ambiguous, drift, identical, non-zero match count, None error,
no similar content, happy path). 34/34 test_fuzzy_match, 96/96
test_file_tools + test_patch_parser + test_skill_manager_tool pass.
E2E verified across all four scenarios: no-match-with-similar,
no-match-no-similar, ambiguous, success. V4A mode confirmed
end-to-end with a non-matching hunk.
Adds a _resolve_path() helper that reads TERMINAL_CWD and uses it as
the base for relative path resolution. Applied to _check_sensitive_path,
read_file_tool, _update_read_timestamp, and _check_file_staleness.
Absolute paths and non-worktree sessions (no TERMINAL_CWD) are
unaffected — falls back to os.getcwd().
Fixes#12689.
file_tools._get_file_ops() built a container_config dict for Docker/
Singularity/Modal/Daytona backends but omitted docker_mount_cwd_to_workspace
and docker_forward_env. Both are read by _create_environment() from
container_config, so file tools (read_file, write_file, patch, search)
silently ignored those config values when running in Docker.
Add the two missing keys to match the container_config already built by
terminal_tool.terminal_tool().
Fixes#2672.
Two accretion-over-time leaks that compound over long CLI / gateway
lifetimes. Both were flagged in the memory-leak audit.
## file_tools._read_tracker
_read_tracker[task_id] holds three sub-containers that grew unbounded:
read_history set of (path, offset, limit) tuples — 1 per unique read
dedup dict of (path, offset, limit) → mtime — same growth pattern
read_timestamps dict of resolved_path → mtime — 1 per unique path
A CLI session uses one stable task_id for its lifetime, so these were
uncapped. A 10k-read session accumulated ~1.5MB of tracker state that
the tool no longer needed (only the most recent reads are relevant for
dedup, consecutive-loop detection, and write/patch external-edit
warnings).
Fix: _cap_read_tracker_data() enforces hard caps on each container
after every add. Defaults: read_history=500, dedup=1000,
read_timestamps=1000. Eviction is insertion-order (Python 3.7+ dict
guarantee) for the dicts; arbitrary for the set (which only feeds
diagnostic summaries).
## process_registry._completion_consumed
Module-level set that recorded every session_id ever polled / waited /
logged. No pruning. Each entry is ~20 bytes, so the absolute leak is
small, but on a gateway processing thousands of background commands
per day the set grows until process exit.
Fix: _prune_if_needed() now discards _completion_consumed entries
alongside the session dict evictions it already performs (both the
TTL-based prune and the LRU-over-cap prune). Adds a final
belt-and-suspenders pass that drops any dangling entries whose
session_id no longer appears in _running or _finished.
Tests: tests/tools/test_accretion_caps.py — 9 cases
* Each container bound respected, oldest evicted
* No-op when under cap (no unnecessary work)
* Handles missing sub-containers without crashing
* Live read_file_tool path enforces caps end-to-end
* _completion_consumed pruned on TTL expiry
* _completion_consumed pruned on LRU eviction
* Dangling entries (no backing session) cleared
Broader suite: 3486 tests/tools + tests/cli pass. The single flake
(test_alias_command_passes_args) reproduces on unchanged main — known
cross-test pollution under suite-order load.
On macOS, /etc is a symlink to /private/etc, so os.path.realpath()
resolves /etc/hosts to /private/etc/hosts. The sensitive path check
only matched /etc/ prefixes against the resolved path, allowing
writes to system files on macOS.
- Add /private/etc/ and /private/var/ to _SENSITIVE_PATH_PREFIXES
- Check both realpath-resolved and normpath-normalized paths
- Add regression tests for macOS symlink bypass
Closes#8734
Co-authored-by: ElhamDevelopmentStudio (PR #8829)
- DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS)
- DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS)
- Per-tool overrides all set to 100K (terminal, execute_code, search_files)
- Remove pre-read byte guard (no behavioral regression vs current main)
- Revert limit signature change to int=500 (match current default)
- Restore original read_file schema description
- Update test assertions to match 100K thresholds
Two pre-existing issues causing test_file_read_guards timeouts on CI:
1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with
IGNORECASE, matching any letter/underscore to end-of-string at
each position → O(n²) backtracking on 100K+ char inputs.
Bounded to {0,50} since env var names are never that long.
2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the
character-count guard, so oversized content (that would be rejected
anyway) went through the expensive regex first. Reordered to check
size limit before redaction.
After a successful write_file or patch, update the stored read
timestamp to match the file's new modification time. Without this,
consecutive edits by the same task (read → write → write) would
false-warn on the second write because the stored timestamp still
reflected the original read, not the first write.
Also renames the internal tracker key from 'file_mtimes' to
'read_timestamps' for clarity.
Track file mtime when read_file is called. When write_file or patch
subsequently targets the same file, compare the current mtime against
the recorded one. If they differ (external edit, concurrent agent,
user change), include a _warning in the result advising the agent to
re-read. The write still proceeds — this is a soft signal, not a
hard block.
Key design points:
- Per-task isolation: task A's reads don't affect task B's writes.
- Files never read produce no warning (not enforcing read-before-write).
- mtime naturally updates after the agent's own writes, so the warning
only fires on external changes, not the agent's own edits.
- V4A multi-file patches check all target paths.
Tests: 10 new tests covering write staleness, patch staleness,
never-read files, cross-task isolation, and the helper function.
* feat(file_tools): harden read_file with size guard, dedup, and device blocking
Three improvements to read_file_tool to reduce wasted context tokens and
prevent process hangs:
1. Character-count guard: reads that produce more than 100K characters
(≈25-35K tokens across tokenisers) are rejected with an error that
tells the model to use offset+limit for a smaller range. The
effective cap is min(file_size, 100K) so small files that happen to
have long lines aren't over-penalised. Large truncated files also
get a hint nudging toward targeted reads.
2. File-read deduplication: when the same (path, offset, limit) is read
a second time and the file hasn't been modified (mtime unchanged),
return a lightweight stub instead of re-sending the full content.
Writes and patches naturally change mtime, so post-edit reads always
return fresh content. The dedup cache is cleared on context
compression — after compression the original read content is
summarised away, so the model needs the full content again.
3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin
etc. are rejected before any I/O to prevent process hangs from
infinite-output or blocking-input devices.
Tests: 17 new tests covering all three features plus the dedup-reset-
on-compression integration. All 52 file-read tests pass (35 existing +
17 new). Full tool suite (2124 tests) passes with 0 failures.
* feat: make file_read_max_chars configurable, add docs
Add file_read_max_chars to DEFAULT_CONFIG (default 100K). read_file_tool
reads this on first call and caches for the process lifetime. Users on
large-context models can raise it; users on small local models can lower it.
Also adds a 'File Read Safety' section to the configuration docs
explaining the char limit, dedup behavior, and example values.
Closes gaps that allowed an agent to expose Docker's Remote API to the
internet by writing to /etc/docker/daemon.json.
Terminal tool (approval.py):
- chmod: now catches 666 and symbolic modes (o+w, a+w), not just 777
- cp/mv/install: detected when targeting /etc/
- sed -i/--in-place: detected when targeting /etc/
File tools (file_tools.py):
- write_file and patch now refuse to write to sensitive system paths
(/etc/, /boot/, /usr/lib/systemd/, docker.sock)
- Directs users to the terminal tool (which has approval prompts) for
system file modifications
* feat: GPT tool-use steering + strip budget warnings from history
Two changes to improve tool reliability, especially for OpenAI GPT models:
1. GPT tool-use enforcement prompt: Adds GPT_TOOL_USE_GUIDANCE to the
system prompt when the model name contains 'gpt' and tools are loaded.
This addresses a known behavioral pattern where GPT models describe
intended actions ('I will run the tests') instead of actually making
tool calls. Inspired by similar steering in OpenCode (beast.txt) and
Cline (GPT-5.1 variant).
2. Budget warning history stripping: Budget pressure warnings injected by
_get_budget_warning() into tool results are now stripped when
conversation history is replayed via run_conversation(). Previously,
these turn-scoped signals persisted across turns, causing models to
avoid tool calls in all subsequent messages after any turn that hit
the 70-90% iteration threshold.
* fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support
Prep for the upcoming profiles feature — each profile is a separate
HERMES_HOME directory, so all paths must respect the env var.
Fixes:
- gateway/platforms/matrix.py: Matrix E2EE store was hardcoded to
~/.hermes/matrix/store, ignoring HERMES_HOME. Now uses
get_hermes_home() so each profile gets its own Matrix state.
- gateway/platforms/telegram.py: Two locations reading config.yaml via
Path.home()/.hermes instead of get_hermes_home(). DM topic thread_id
persistence and hot-reload would read the wrong config in a profile.
- tools/file_tools.py: Security path for hub index blocking was
hardcoded to ~/.hermes, would miss the actual profile's hub cache.
- hermes_cli/gateway.py: Service naming now uses the profile name
(hermes-gateway-coder) instead of a cryptic hash suffix. Extracted
_profile_suffix() helper shared by systemd and launchd.
- hermes_cli/gateway.py: Launchd plist path and Label now scoped per
profile (ai.hermes.gateway-coder.plist). Previously all profiles
would collide on the same plist file on macOS.
- hermes_cli/gateway.py: Launchd plist now includes HERMES_HOME in
EnvironmentVariables — was missing entirely, making custom
HERMES_HOME broken on macOS launchd (pre-existing bug).
- All launchctl commands in gateway.py, main.py, status.py updated
to use get_launchd_label() instead of hardcoded string.
Test fixes: DM topic tests now set HERMES_HOME env var alongside
Path.home() mock. Launchd test uses get_launchd_label() for expected
commands.
Root cause: terminal_tool, execute_code, and process_registry returned raw
subprocess output with ANSI escape sequences intact. The model saw these
in tool results and copied them into file writes.
Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py,
but this was a band-aid — regex on file content risks corrupting legitimate
content, and doesn't prevent ANSI from wasting tokens in the model context.
Source-level fix:
- New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI
(incl. private-mode, colon-separated, intermediate bytes), OSC (both
terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1
- terminal_tool.py: strip output before returning to model
- code_execution_tool.py: strip stdout/stderr before returning
- process_registry.py: strip output in poll/read_log/wait
- file_tools.py: remove _strip_ansi band-aid (no longer needed)
Verified: `ls --color=always` output returned as clean text to model,
file written from that output contains zero ESC bytes.
Models occasionally copy ANSI escape sequences from terminal output
or display formatting into file content, breaking shebangs and
injecting binary characters into scripts.
Strip ANSI codes (CSI, OSC, simple escapes) from:
- write_file content
- patch old_string, new_string, and V4A patch content
The check is fast (skips entirely if no ESC byte present).
Reported by Andi Jaeger.
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
Keep Docker sandboxes isolated by default. Add an explicit terminal.docker_mount_cwd_to_workspace opt-in, thread it through terminal/file environment creation, and document the security tradeoff and config.yaml workflow clearly.
- Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry
- Add emoji= to all 50+ registry.register() calls across tool files
- Add get_tool_emoji() helper in agent/display.py with 3-tier resolution:
skin override → registry default → hardcoded fallback
- Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and
gateway/run.py with centralized get_tool_emoji() calls
- Add 'tool_emojis' field to SkinConfig so skins can override per-tool
emojis (e.g. ares skin could use swords instead of wrenches)
- Add 11 tests (5 registry emoji, 6 display/skin integration)
- Update AGENTS.md skin docs table
Based on the approach from PR #1061 by ForgingAlex (emoji centralization
in registry). This salvage fixes several issues from the original:
- Does NOT split the cronjob tool (which would crash on missing schemas)
- Does NOT change image_generate toolset/requires_env/is_async
- Does NOT delete existing tests
- Completes the centralization (gateway/run.py was missed)
- Hooks into the skin system for full customizability
- treat git diff --cached --quiet rc=1 as an expected checkpoint state
instead of logging it as an error
- downgrade expected write PermissionError/EROFS/EACCES failures out of
error logging while keeping unexpected exceptions at error level
- add regression tests for both logging behaviors
Stray print() in write_file_tool exception handler leaked debug output
to stdout. Replaced with logger.error() which is already set up in
the file.
Authored by memosr.
Co-authored-by: memosr <memosr@users.noreply.github.com>
Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues:
1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only
warn/block on truly consecutive identical calls. Any other tool call
in between (write, patch, terminal, etc.) resets the counter via
notify_other_tool_call(), called from handle_function_call() in
model_tools.py. This prevents false blocks in read→edit→verify flows.
2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on
4th+ consecutive (was 3rd+). Gives the model more room before
intervening.
3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on
search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a
separate read_history set that only tracks file reads.
4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from
web_extract return docs in code_execution_tool.py — the field IS
returned by web_tools.py.
5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover
consecutive-only behavior, notify_other_tool_call, interleaved
read/search, and summary-unaffected-by-searches.
file_tools.py creates its own Docker sandbox when read_file/search_files
runs before any terminal command. The container_config was missing
docker_volumes, so the sandbox had no user volume mounts — breaking
access to heartbeat state, cron output, and all other mounted data.
Matches the existing pattern in terminal_tool.py:872.
Missed in original PR #158 (feat: add docker_volumes config).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Combine read/search loop detection with main's redact_sensitive_text
and truncation hint features. Add tracker reset to TestSearchHints
to prevent cross-test state leakage.
Terminal output was already redacted via redact_sensitive_text() but
read_file and search_files returned raw content. Now both tools
redact secrets before returning results to the LLM.
Based on PR #372 by @teyrebaz33 (closes#363) — applied manually
due to branch conflicts with the current codebase.
Add contextual [Hint: ...] suffixes to tool results where they save
real iterations:
- patch (no match): suggests read_file/search_files to verify content
before retrying — addresses the common pattern where the agent retries
with stale old_string instead of re-reading the file.
- search_files (truncated): provides explicit next offset and suggests
narrowing the search — clearer than relying on total_count inference.
Other hints proposed in #722 (terminal, web_search, web_extract,
browser_snapshot, search zero-results, search content-matches) were
evaluated and found to be low-value: either already covered by existing
mechanisms (read_file pagination, similar-files, schema descriptions)
or guidance the agent already follows from its own reasoning.
5 new tests covering hint presence/absence for both tools.
- Block file reads after 3+ re-reads of same region (no content returned)
- Track search_files calls and block repeated identical searches
- Filter completed/cancelled todos from post-compression injection
to prevent agent from re-doing finished work
- Add 10 new tests covering all three fixes
When context compression summarizes conversation history, the agent
loses track of which files it already read and re-reads them in a loop.
Users report the agent reading the same files endlessly without writing.
Root cause: context compression is lossy — file contents and read history
are lost in the summary. After compression, the model thinks it hasn't
examined the files yet and reads them again.
Fix (two-part):
1. Track file reads per task in file_tools.py. When the same file region
is read again, include a _warning in the response telling the model
to stop re-reading and use existing information.
2. After context compression, inject a structured message listing all
files already read in the session with explicit "do NOT re-read"
instruction, preserving read history across compression boundaries.
Adds 16 tests covering warning detection, task isolation, summary
accuracy, tracker cleanup, and compression history injection.
Add Daytona to image selection, container_config guards, environment
factory, requirements check, and diagnostics in terminal_tool.py and
file_tools.py. Also add to sandboxed-backend approval bypass.
Signed-off-by: rovle <lovre.pesut@gmail.com>
- Implemented functionality to load ephemeral prefill messages from a JSON file, enhancing few-shot priming capabilities for the agent.
- Introduced a mechanism to load an ephemeral system prompt from environment variables or configuration files, ensuring dynamic prompt adjustments at API-call time.
- Updated the CLI and agent initialization to utilize the new prefill messages and system prompt, improving the overall interaction experience.
- Enhanced configuration options with new environment variables for prefill messages and system prompts, allowing for greater customization without persistence.
- Updated the authorization logic to include a per-platform allow-all flag for improved flexibility.
- Revised the order of checks to prioritize platform-specific allow-all settings, followed by environment variable allowlists and DM pairing approvals.
- Added global allow-all configuration for broader access control.
- Improved handling of allowlists by stripping whitespace and ensuring valid entries are processed.
- Introduced logging functionality in cli.py, run_agent.py, scheduler.py, and various tool modules to replace print statements with structured logging.
- Enhanced error handling and informational messages to improve debugging and monitoring capabilities.
- Ensured consistent logging practices across the codebase, facilitating better traceability and maintenance.
- Updated the tool name from "search" to "search_files" across multiple files to better reflect its functionality.
- Adjusted related documentation and descriptions to ensure clarity in usage and expected behavior.
- Enhanced the toolset definitions and mappings to incorporate the new naming convention, improving overall consistency in the codebase.
- Improved the caching mechanism for ShellFileOperations to ensure stale entries are invalidated when environments are cleaned up.
- Enhanced thread safety by refining the use of locks during environment creation and cleanup processes.
- Streamlined the cleanup of inactive environments to prevent blocking other tool calls, ensuring efficient resource management.
- Added error handling and messaging improvements for better user feedback during environment cleanup.