Fixes blockquote > escaping, edit_message raw markdown, ***bold italic***
handling, HTML entity double-escaping (&), Wikipedia URL parens
truncation, and step numbering format. Also adds format_message to the
tool-layer _send_to_platform for consistent formatting across all
delivery paths.
Changes:
- Protect Slack entities (<@user>, <https://...|label>, <!here>) from
escaping passes
- Protect blockquote > markers before HTML entity escaping
- Unescape-before-escape for idempotent HTML entity handling
- ***bold italic*** → *_text_* conversion (before **bold** pass)
- URL regex upgraded to handle balanced parentheses
- mrkdwn:True flag on chat_postMessage payloads
- format_message applied in edit_message and send_message_tool
- 52 new tests (format, edit, streaming, splitting, tool chunking)
- Use reversed(dict) idiom for placeholder restoration
Based on PR #3715 by dashed, cherry-picked onto current main.
* fix(tests): mock is_safe_url in tests that use example.com
Tests using example.com URLs were failing because is_safe_url does a real DNS lookup which fails in environments where example.com doesn't resolve, causing the request to be blocked before reaching the already-mocked HTTP client. This should fix around 17 failing tests.
These tests test logic, caching, etc. so mocking this method should not modify them in any way. TestMattermostSendUrlAsFile was already doing this so we follow the same pattern.
* fix(test): use case-insensitive lookup for model context length check
DEFAULT_CONTEXT_LENGTHS uses inconsistent casing (MiniMax keys are lowercase, Qwen keys are mixed-case) so the test was broken in some cases since it couldn't find the model.
* fix(test): patch is_linux in systemd gateway restart test
The test only patched is_macos to False but didn't patch is_linux to True. On macOS hosts, is_linux() returns False and the systemd restart code path is skipped entirely, making the assertion fail.
* fix(test): use non-blocklisted env var in docker forward_env tests
GITHUB_TOKEN is in api_key_env_vars and thus in _HERMES_PROVIDER_ENV_BLOCKLIST so the env var is silently dropped, we replace it with a non-blocked one like DATABASE_URL so the tests actually work.
* fix(test): fully isolate _has_any_provider_configured from host env
_has_any_provider_configured() checks all env vars from PROVIDER_REGISTRY (not just the 5 the tests were clearing) and also calls get_auth_status() which detects gh auth token for Copilot. On machines with any of these set, the function returns True before reaching the code path under test.
Clear all registry vars and mock get_auth_status so host credentials don't interfere.
* fix(test): correct path to hermes_base_env.py in tool parser tests
Path(__file__).parent.parent resolved to tests/, not the project root.
The file lives at environments/hermes_base_env.py so we need one more parent level.
* fix(test): accept optional HTML fields in Matrix send payload
_send_matrix sometimes adds format and formatted_body when the markdown library is installed. The test was doing an exact dict equality check which broke. Check required fields instead.
* fix(test): add config.yaml to codex vision requirements test
The test only wrote auth.json but not config.yaml, so _read_main_provider() returned empty and vision auto-detect never tried the codex provider. Add a config.yaml pointing at openai-codex so the fallback path actually resolves the client.
* fix(test): clear OPENROUTER_API_KEY in _isolate_hermes_home
run_agent.py calls load_hermes_dotenv() at import time, which injects API keys from ~/.hermes/.env into os.environ before any test fixture runs. This caused test_agent_loop_tool_calling to make real API calls instead of skipping, which ends up making some tests fail.
* fix(test): add get_rate_limit_state to agent mock in usage report tests
_show_usage now calls agent.get_rate_limit_state() for rate limit
display. The SimpleNamespace mock was missing this method.
* fix(test): update expected Camofox config version from 12 to 13
* fix(test): mock _get_enabled_platforms in nous managed defaults test
Importing gateway.run leaks DISCORD_BOT_TOKEN into os.environ, which makes _get_enabled_platforms() return ["cli", "discord"] instead of just ["cli"]. tools_command loops per platform, so apply_nous_managed_defaults
runs twice: the first call sets config values, the second sees them as
already configured and returns an empty set, causing the assertion to
fail.
- Docker env tests: verify _build_init_env_args() instead of per-execute
Popen flags (env forwarding is now init-time only)
- Docker: preserve explicit forward_env bypass of blocklist from main
- Daytona tests: adapt to SDK-native timeout, _ThreadedProcessHandle,
base.py interrupt handling, HERMES_STDIN_ heredoc prefix
- Modal tests: fix _load_module to include _ThreadedProcessHandle stub,
check ensurepip in _resolve_modal_image instead of __init__
- SSH tests: mock time.sleep on base module instead of removed ssh import
- Add missing BaseEnvironment attributes to __new__()-based test fixtures
mistralai v2.x is a namespace package — `Mistral` class lives at
`mistralai.client`, not at the top-level `mistralai` module. The
previous `from mistralai import Mistral` raises ImportError at runtime.
Update both production code and test fixture to use the correct path.
* fix(tools): skip camofox auto-cleanup when managed persistence is enabled
When managed_persistence is enabled, cleanup_browser() was calling
camofox_close() which destroys the server-side browser context via
DELETE /sessions/{userId}, killing login sessions across cron runs.
Add camofox_soft_cleanup() — a public wrapper that drops only the
in-memory session entry when managed persistence is on, returning True.
When persistence is off it returns False so the caller falls back to
the full camofox_close(). The inactivity reaper still handles idle
resource cleanup.
Also surface a logger.warning() when _managed_persistence_enabled()
fails to load config, replacing a silent except-and-return-False.
Salvaged from #6182 by el-analista (Eduardo Perea Fernandez).
Added public API wrapper to avoid cross-module private imports,
and test coverage for both persistence paths.
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
* fix(doctor): only check the active memory provider, not all providers unconditionally
hermes doctor had hardcoded Honcho Memory and Mem0 Memory sections that
always ran regardless of the user's memory.provider config setting. After
the swappable memory provider update (#4623), users with leftover Honcho
config but no active provider saw false 'broken' errors.
Replaced both sections with a single Memory Provider section that reads
memory.provider from config.yaml and only checks the configured provider.
Users with no external provider see a green 'Built-in memory active' check.
Reported by community user michaelruiz001, confirmed by Eri (Honcho).
---------
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
Previously crash recovery recreated detached sessions as if they were
fully managed, so polls and kills could lie about liveness and the
checkpoint could forget recovered jobs after the next restart.
This commit refreshes recovered host-backed sessions from real PID
state, keeps checkpoint data durable, and preserves notify watcher
metadata while treating sandbox-only PIDs as non-recoverable.
- Persist `pid_scope` in `tools/process_registry.py` and skip
recovering sandbox-backed entries without a host-visible PID handle
- Refresh detached sessions on access so `get`/`poll`/`wait` and active
session queries observe exited processes instead of hanging forever
- Allow recovered host PIDs to be terminated honestly and requeue
`notify_on_complete` watchers during checkpoint recovery
- Add regression tests for durable checkpoints, detached exit/kill
behavior, sandbox skip logic, and recovered notify watchers
- DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS)
- DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS)
- Per-tool overrides all set to 100K (terminal, execute_code, search_files)
- Remove pre-read byte guard (no behavioral regression vs current main)
- Revert limit signature change to int=500 (match current default)
- Restore original read_file schema description
- Update test assertions to match 100K thresholds
- The MCP SDK Pydantic model uses camelCase (structuredContent), not
snake_case (structured_content). The original getattr was a silent no-op.
- When structuredContent is present, return it AS the result instead of
alongside text — the structured payload is the machine-readable data.
- Move test file to tests/tools/ and fix fake class to use camelCase.
- Patch _run_on_mcp_loop in tests so the handler actually executes.
* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock
Add win32 platform branch to clipboard.py so Ctrl+V image paste
works on native Windows (PowerShell / Windows Terminal), not just
WSL2.
Uses the same .NET System.Windows.Forms.Clipboard approach as the
WSL path but calls PowerShell directly instead of powershell.exe
(the WSL cross-call path). Tries 'powershell' first (Windows
PowerShell 5.1, always available), then 'pwsh' (PowerShell 7+).
PowerShell executable is discovered once and cached for the process
lifetime.
Includes 14 new tests covering:
- Platform dispatch (save_clipboard_image + has_clipboard_image)
- Image detection via PowerShell .NET check
- Base64 PNG extraction and decode
- Edge cases: no PowerShell, empty output, invalid base64, timeout
* fix: repair 57 failing CI tests across 14 files
Categories of fixes:
**Test isolation under xdist (-n auto):**
- test_hermes_logging: Strip ALL RotatingFileHandlers before each test
to prevent handlers leaked from other xdist workers from polluting counts
- test_code_execution: Force TERMINAL_ENV=local in setUp — prevents Modal
AuthError when another test leaks TERMINAL_ENV=modal
- test_timezone: Same TERMINAL_ENV fix for execute_code timezone tests
- test_codex_execution_paths: Mock _resolve_turn_agent_config to ensure
model resolution works regardless of xdist worker state
**Matrix adapter tests (nio not installed in CI):**
- Add _make_fake_nio() helper with real response classes for isinstance()
checks in production code
- Replace MagicMock(spec=nio.XxxResponse) with fake_nio instances
- Wrap production method calls with patch.dict('sys.modules', {'nio': ...})
so import nio succeeds in method bodies
- Use try/except instead of pytest.importorskip for nio.crypto imports
(importorskip can be fooled by MagicMock in sys.modules)
- test_matrix_voice: Skip entire file if nio is a mock, not just missing
**Stale test expectations:**
- test_cli_provider_resolution: _prompt_provider_choice now takes **kwargs
(default param added); mock getpass.getpass alongside input
- test_anthropic_oauth_flow: Mock getpass.getpass (code switched from input)
- test_gemini_provider: Mock models.dev + OpenRouter API lookups to test
hardcoded defaults without external API variance
- test_code_execution: Add notify_on_complete to blocked terminal params
- test_setup_openclaw_migration: Mock prompt_choice to select 'Full setup'
(new quick-setup path leads to _require_tty → sys.exit in CI)
- test_skill_manager_tool: Patch get_all_skills_dirs alongside SKILLS_DIR
so _find_skill searches tmp_path, not real ~/.hermes/skills/
**Missing attributes in object.__new__ test runners:**
- test_platform_reconnect: Add session_store to _make_runner()
- test_session_race_guard: Add hooks, _running_agents_ts, session_store,
delivery_router to _make_runner()
**Production bug fix (gateway/run.py):**
- Fix sentinel eviction race: _AGENT_PENDING_SENTINEL was immediately
evicted by the stale-detection logic because sentinels have no
get_activity_summary() method, causing _stale_idle=inf >= timeout.
Guard _should_evict with 'is not _AGENT_PENDING_SENTINEL'.
* fix: address remaining CI failures
- test_setup_openclaw_migration: Also mock _offer_launch_chat (called at
end of both quick and full setup paths)
- test_code_execution: Move TERMINAL_ENV=local to module level to protect
ALL test classes (TestEnvVarFiltering, TestExecuteCodeEdgeCases,
TestInterruptHandling, TestHeadTailTruncation) from xdist env leaks
- test_matrix: Use try/except for nio.crypto imports (importorskip can be
fooled by MagicMock in sys.modules under xdist)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
* feat: notify_on_complete for background processes
When terminal(background=true, notify_on_complete=true), the system
auto-triggers a new agent turn when the process exits — no polling needed.
Changes:
- ProcessSession: add notify_on_complete field
- ProcessRegistry: add completion_queue, populate on _move_to_finished()
- Terminal tool: add notify_on_complete parameter to schema + handler
- CLI: drain completion_queue after agent turn AND during idle loop
- Gateway: enhanced _run_process_watcher injects synthetic MessageEvent
on completion, triggering a full agent turn
- Checkpoint persistence includes notify_on_complete for crash recovery
- code_execution_tool: block notify_on_complete in sandbox scripts
- 15 new tests covering queue mechanics, checkpoint round-trip, schema
* docs: update terminal tool descriptions for notify_on_complete
- background: remove 'ONLY for servers' language, describe both patterns
(long-lived processes AND long-running tasks with notify_on_complete)
- notify_on_complete: more prescriptive about when to use it
- TERMINAL_TOOL_DESCRIPTION: remove 'Do NOT use background for builds'
guidance that contradicted the new feature
Cherry-picked from PR #5580 by MestreY0d4-Uninter.
- Share parent's credential pool with child agents for key rotation
- Leasing layer spreads parallel children across keys (least-loaded)
- Thread-safe acquire_lease/release_lease in CredentialPool
- Reverted sneaked-in tool-name restoration change (kept original
getattr + isinstance guard pattern)
* feat(tools): add Firecrawl cloud browser provider
Adds Firecrawl (https://firecrawl.dev) as a cloud browser provider
alongside Browserbase and Browser Use. All browser tools route through
Firecrawl's cloud browser via CDP when selected.
- tools/browser_providers/firecrawl.py — FirecrawlProvider
- tools/browser_tool.py — register in _PROVIDER_REGISTRY
- hermes_cli/tools_config.py — add to onboarding provider picker
- hermes_cli/setup.py — add to setup summary
- hermes_cli/config.py — add FIRECRAWL_BROWSER_TTL config
- website/docs/ — browser docs and env var reference
Based on #4490 by @developersdigest.
Co-Authored-By: Developers Digest <124798203+developersdigest@users.noreply.github.com>
* refactor: simplify FirecrawlProvider.emergency_cleanup
Use self._headers() and self._api_url() instead of duplicating
env-var reads and header construction.
* fix: recognize Firecrawl in subscription browser detection
_resolve_browser_feature_state() now handles "firecrawl" as a direct
browser provider (same pattern as "browser-use"), so hermes setup
summary correctly shows "Browser Automation (Firecrawl)" instead of
misreporting as "Local browser".
Also fixes test_config_version_unchanged assertion (11 → 12).
---------
Co-authored-by: Developers Digest <124798203+developersdigest@users.noreply.github.com>
Before launching an MCP server via npx/uvx, queries the OSV (Open Source
Vulnerabilities) API to check if the package has known malware advisories
(MAL-* IDs). Regular CVEs are ignored — only confirmed malware is blocked.
- Free, public API (Google-maintained), ~300ms per query
- Runs once per MCP server launch, inside _run_stdio() before subprocess spawn
- Parallel with other MCP servers (asyncio.gather already in place)
- Fail-open: network errors, timeouts, unrecognized commands → allow
- Parses npm (scoped @scope/pkg@version) and PyPI (name[extras]==version)
Inspired by Block/goose extension malware check.
Route AIAgent print output to stderr via _print_fn for ACP stdio sessions.
Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC
on stdout isn't corrupted. Child agents inherit the redirect.
Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
When commands like grep, diff, test, or find return non-zero exit codes
that aren't actual errors (grep 1 = no matches, diff 1 = files differ),
the model wastes turns investigating non-problems. This adds an
exit_code_meaning field to the terminal JSON result that explains
informational exit codes, so the agent can move on instead of debugging.
Covers grep/rg/ag/ack (no matches), diff (files differ), find (partial
access), test/[ (condition false), curl (timeouts, DNS, HTTP errors),
and git (context-dependent). Correctly extracts the last command from
pipelines and chains, strips full paths and env var assignments.
The exit_code field itself is unchanged — this is purely additive context.
Bug fixes:
- agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed
re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual
env var name chars. Without this, the pattern backtracks exponentially on large
strings (e.g. 100K tool output), causing test_file_read_guards to time out.
- tools/file_operations.py: over-escaped newline in find -printf format string
produced literal backslash-n instead of a real newline, breaking file search
result parsing (total_count always 1, paths concatenated).
Test fixes:
- Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as
'Hangs in non-interactive environments' but actually run fine:
- test_413_compression.py (12 tests, 25s)
- test_file_tools_live.py (71 tests, 24s)
- test_code_execution.py (61 tests, 99s)
- test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already)
- test_413_compression.py: fix threshold values in 2 preflight compression tests
where context_length was too small for the compressed output to fit in one pass.
- test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK.
- test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when
SDK is not installed so patch() targets exist.
- test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling
of _gateway_queues — fixes race condition where resolve fires before threads
register their approval entries, causing the test to hang indefinitely.
Net effect: +256 tests recovered from skip, 8 real failures fixed.
Add docker_env option to terminal config — a dict of key-value pairs that
get set inside Docker containers via -e flags at both container creation
(docker run) and per-command execution (docker exec) time.
This complements docker_forward_env (which reads values dynamically from
the host process environment). docker_env is useful when Hermes runs as a
systemd service without access to the user's shell environment — e.g.
setting SSH_AUTH_SOCK or GNUPGHOME to known stable paths for SSH/GPG
agent socket forwarding.
Precedence: docker_env provides baseline values; docker_forward_env
overrides for the same key.
Config example:
terminal:
docker_env:
SSH_AUTH_SOCK: /run/user/1000/ssh-agent.sock
GNUPGHOME: /root/.gnupg
docker_volumes:
- /run/user/1000/ssh-agent.sock:/run/user/1000/ssh-agent.sock
- /run/user/1000/gnupg/S.gpg-agent:/root/.gnupg/S.gpg-agent
The config key skills.external_dirs and core resolution (get_all_skills_dirs,
get_external_skills_dirs in agent/skill_utils.py) already existed but several
code paths still only scanned SKILLS_DIR. Now external dirs are respected
everywhere:
- skills_categories(): scan all dirs for category discovery
- _get_category_from_path(): resolve categories against any skills root
- skill_manager_tool._find_skill(): search all dirs for edit/patch/delete
- credential_files.get_skills_directory_mount(): mount all dirs into
Docker/Singularity containers (external dirs at external_skills/<idx>)
- credential_files.iter_skills_files(): list files from all dirs for
Modal/Daytona upload
- tools/environments/ssh.py: rsync all skill dirs to remote hosts
- gateway _check_unavailable_skill(): check disabled skills across all dirs
Usage in config.yaml:
skills:
external_dirs:
- ~/repos/agent-skills/hermes
- /shared/team-skills
- Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram,
Slack, Discord) cache uploaded zip files instead of rejecting them.
- Add get_cache_directory_mounts() and iter_cache_files() to
credential_files.py for host-side cache directory passthrough
(documents, images, audio, screenshots).
- Docker: bind-mount cache dirs read-only alongside credentials/skills.
Changes are live (bind mount semantics).
- Modal: mount cache files at sandbox creation + resync before each
command via _sync_files() with mtime+size change detection.
- Handles backward-compat with legacy dir names (document_cache,
image_cache, audio_cache, browser_screenshots) via get_hermes_dir().
- Container paths always use the new cache/<subdir> layout regardless
of host layout.
This replaces the need for a dedicated extract_archive tool (PR #4819)
— the agent can now use standard terminal commands (unzip, tar) on
uploaded files inside remote containers.
Closes: related to PR #4819 by kshitijk4poor
Three fixes for memory+profile isolation bugs:
1. memory_tool.py: Replace module-level MEMORY_DIR constant with
get_memory_dir() function that calls get_hermes_home() dynamically.
The old constant was cached at import time and could go stale if
HERMES_HOME changed after import. Internal MemoryStore methods now
call get_memory_dir() directly. MEMORY_DIR kept as backward-compat
alias.
2. profiles.py: profile create --clone now copies MEMORY.md and USER.md
from the source profile. These curated memory files are part of the
agent's identity (same as SOUL.md) and should carry over on clone.
3. holographic plugin: initialize() now expands $HERMES_HOME and
${HERMES_HOME} in the db_path config value, so users can write
'db_path: $HERMES_HOME/memory_store.db' and it resolves to the
active profile directory, not the default home.
Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.
Four fixes for MCP server stability issues reported by community member
(terminal lockup, zombie processes, escape sequence pollution, startup hang):
1. MCP reload timeout guard (cli.py): _check_config_mcp_changes now runs
_reload_mcp in a separate daemon thread with a 30s hard timeout. Previously,
a hung MCP server could block the process_loop thread indefinitely, freezing
the entire TUI (user can type but nothing happens, only Ctrl+D/Ctrl+\ work).
2. MCP stdio subprocess PID tracking (mcp_tool.py): Tracks child PIDs spawned
by stdio_client via before/after snapshots of /proc children. On shutdown,
_stop_mcp_loop force-kills any tracked PIDs that survived the SDK's graceful
SIGTERM→SIGKILL cleanup. Prevents zombie MCP server processes from
accumulating across sessions.
3. MCP event loop exception handler (mcp_tool.py): Installs
_mcp_loop_exception_handler on the MCP background event loop — same pattern
as the existing _suppress_closed_loop_errors on prompt_toolkit's loop.
Suppresses benign 'Event loop is closed' RuntimeError from httpx transport
__del__ during MCP shutdown. Salvaged from PR #2538 (acsezen).
4. MCP OAuth non-blocking (mcp_oauth.py): Replaces blocking input() call in
_wait_for_callback with OAuthNonInteractiveError raise. Adds _is_interactive()
TTY detection. In non-interactive environments, build_oauth_auth() still
returns a provider (cached tokens + refresh work), but the callback handler
raises immediately instead of blocking the MCP event loop for 120s. Re-raises
OAuth setup failures in _run_http so failed servers are reported cleanly
without blocking others. Salvaged from PRs #4521 (voidborne-d) and #4465
(heathley).
Closes#2537, closes#4462
Related: #4128, #3436
The PR changed prev_tools from list[str] to list[dict] with name/result
keys. The gateway's _step_callback_sync passed this directly to hooks
as 'tool_names', breaking user-authored hooks that call
', '.join(tool_names).
Now:
- 'tool_names' always contains strings (backward-compatible)
- 'tools' carries the enriched dicts for hooks that want results
Also adds summary logging to register_mcp_servers() and comprehensive
tests for all three PR changes:
- sanitize_mcp_name_component edge cases
- register_mcp_servers public API
- _register_session_mcp_servers ACP integration
- step_callback result forwarding
- gateway normalization backward compat
* feat(memory): add pluggable memory provider interface with profile isolation
Introduces a pluggable MemoryProvider ABC so external memory backends can
integrate with Hermes without modifying core files. Each backend becomes a
plugin implementing a standard interface, orchestrated by MemoryManager.
Key architecture:
- agent/memory_provider.py — ABC with core + optional lifecycle hooks
- agent/memory_manager.py — single integration point in the agent loop
- agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md
Profile isolation fixes applied to all 6 shipped plugins:
- Cognitive Memory: use get_hermes_home() instead of raw env var
- Hindsight Memory: check $HERMES_HOME/hindsight/config.json first,
fall back to legacy ~/.hindsight/ for backward compat
- Hermes Memory Store: replace hardcoded ~/.hermes paths with
get_hermes_home() for config loading and DB path defaults
- Mem0 Memory: use get_hermes_home() instead of raw env var
- RetainDB Memory: auto-derive profile-scoped project name from
hermes_home path (hermes-<profile>), explicit env var overrides
- OpenViking Memory: read-only, no local state, isolation via .env
MemoryManager.initialize_all() now injects hermes_home into kwargs so
every provider can resolve profile-scoped storage without importing
get_hermes_home() themselves.
Plugin system: adds register_memory_provider() to PluginContext and
get_plugin_memory_providers() accessor.
Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration).
* refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider
Remove cognitive-memory plugin (#727) — core mechanics are broken:
decay runs 24x too fast (hourly not daily), prefetch uses row ID as
timestamp, search limited by importance not similarity.
Rewrite openviking-memory plugin from a read-only search wrapper into
a full bidirectional memory provider using the complete OpenViking
session lifecycle API:
- sync_turn: records user/assistant messages to OpenViking session
(threaded, non-blocking)
- on_session_end: commits session to trigger automatic memory extraction
into 6 categories (profile, preferences, entities, events, cases,
patterns)
- prefetch: background semantic search via find() endpoint
- on_memory_write: mirrors built-in memory writes to the session
- is_available: checks env var only, no network calls (ABC compliance)
Tools expanded from 3 to 5:
- viking_search: semantic search with mode/scope/limit
- viking_read: tiered content (abstract ~100tok / overview ~2k / full)
- viking_browse: filesystem-style navigation (list/tree/stat)
- viking_remember: explicit memory storage via session
- viking_add_resource: ingest URLs/docs into knowledge base
Uses direct HTTP via httpx (no openviking SDK dependency needed).
Response truncation on viking_read to prevent context flooding.
* fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker
- Remove redundant mem0_context tool (identical to mem0_search with
rerank=true, top_k=5 — wastes a tool slot and confuses the model)
- Thread sync_turn so it's non-blocking — Mem0's server-side LLM
extraction can take 5-10s, was stalling the agent after every turn
- Add threading.Lock around _get_client() for thread-safe lazy init
(prefetch and sync threads could race on first client creation)
- Add circuit breaker: after 5 consecutive API failures, pause calls
for 120s instead of hammering a down server every turn. Auto-resets
after cooldown. Logs a warning when tripped.
- Track success/failure in prefetch, sync_turn, and all tool calls
- Wait for previous sync to finish before starting a new one (prevents
unbounded thread accumulation on rapid turns)
- Clean up shutdown to join both prefetch and sync threads
* fix(memory): enforce single external memory provider limit
MemoryManager now rejects a second non-builtin provider with a warning.
Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE
external plugin provider is allowed at a time. This prevents tool
schema bloat (some providers add 3-5 tools each) and conflicting
memory backends.
The warning message directs users to configure memory.provider in
config.yaml to select which provider to activate.
Updated all 47 tests to use builtin + one external pattern instead
of multiple externals. Added test_second_external_rejected to verify
the enforcement.
* feat(memory): add ByteRover memory provider plugin
Implements the ByteRover integration (from PR #3499 by hieuntg81) as a
MemoryProvider plugin instead of direct run_agent.py modifications.
ByteRover provides persistent memory via the brv CLI — a hierarchical
knowledge tree with tiered retrieval (fuzzy text then LLM-driven search).
Local-first with optional cloud sync.
Plugin capabilities:
- prefetch: background brv query for relevant context
- sync_turn: curate conversation turns (threaded, non-blocking)
- on_memory_write: mirror built-in memory writes to brv
- on_pre_compress: extract insights before context compression
Tools (3):
- brv_query: search the knowledge tree
- brv_curate: store facts/decisions/patterns
- brv_status: check CLI version and context tree state
Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped
per profile). Binary resolution cached with thread-safe double-checked
locking. All write operations threaded to avoid blocking the agent
(curate can take 120s with LLM processing).
* fix(memory): thread remaining sync_turns, fix holographic, add config key
Plugin fixes:
- Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread)
- RetainDB: thread sync_turn (was blocking on HTTP POST)
- Both: shutdown now joins sync threads alongside prefetch threads
Holographic retrieval fixes:
- reason(): removed dead intersection_key computation (bundled but never
used in scoring). Now reuses pre-computed entity_residuals directly,
moved role_content encoding outside the inner loop.
- contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above
500 facts, only checks the most recently updated ones to avoid O(n^2)
explosion (~125K comparisons at 500 is acceptable).
Config:
- Added memory.provider key to DEFAULT_CONFIG ("" = builtin only).
No version bump needed (deep_merge handles new keys automatically).
* feat(memory): extract Honcho as a MemoryProvider plugin
Creates plugins/honcho-memory/ as a thin adapter over the existing
honcho_integration/ package. All 4 Honcho tools (profile, search,
context, conclude) move from the normal tool registry to the
MemoryProvider interface.
The plugin delegates all work to HonchoSessionManager — no Honcho
logic is reimplemented. It uses the existing config chain:
$HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars.
Lifecycle hooks:
- initialize: creates HonchoSessionManager via existing client factory
- prefetch: background dialectic query
- sync_turn: records messages + flushes to API (threaded)
- on_memory_write: mirrors user profile writes as conclusions
- on_session_end: flushes all pending messages
This is a prerequisite for the MemoryManager wiring in run_agent.py.
Once wired, Honcho goes through the same provider interface as all
other memory plugins, and the scattered Honcho code in run_agent.py
can be consolidated into the single MemoryManager integration point.
* feat(memory): wire MemoryManager into run_agent.py
Adds 8 integration points for the external memory provider plugin,
all purely additive (zero existing code modified):
1. Init (~L1130): Create MemoryManager, find matching plugin provider
from memory.provider config, initialize with session context
2. Tool injection (~L1160): Append provider tool schemas to self.tools
and self.valid_tool_names after memory_manager init
3. System prompt (~L2705): Add external provider's system_prompt_block
alongside existing MEMORY.md/USER.md blocks
4. Tool routing (~L5362): Route provider tool calls through
memory_manager.handle_tool_call() before the catchall handler
5. Memory write bridge (~L5353): Notify external provider via
on_memory_write() when the built-in memory tool writes
6. Pre-compress (~L5233): Call on_pre_compress() before context
compression discards messages
7. Prefetch (~L6421): Inject provider prefetch results into the
current-turn user message (same pattern as Honcho turn context)
8. Turn sync + session end (~L8161, ~L8172): sync_all() after each
completed turn, queue_prefetch_all() for next turn, on_session_end()
+ shutdown_all() at conversation end
All hooks are wrapped in try/except — a failing provider never breaks
the agent. The existing memory system, Honcho integration, and all
other code paths are completely untouched.
Full suite: 7222 passed, 4 pre-existing failures.
* refactor(memory): remove legacy Honcho integration from core
Extracts all Honcho-specific code from run_agent.py, model_tools.py,
toolsets.py, and gateway/run.py. Honcho is now exclusively available
as a memory provider plugin (plugins/honcho-memory/).
Removed from run_agent.py (-457 lines):
- Honcho init block (session manager creation, activation, config)
- 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools,
_activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch,
_honcho_prefetch, _honcho_save_user_observation, _honcho_sync
- _inject_honcho_turn_context module-level function
- Honcho system prompt block (tool descriptions, CLI commands)
- Honcho context injection in api_messages building
- Honcho params from __init__ (honcho_session_key, honcho_manager,
honcho_config)
- HONCHO_TOOL_NAMES constant
- All honcho-specific tool dispatch forwarding
Removed from other files:
- model_tools.py: honcho_tools import, honcho params from handle_function_call
- toolsets.py: honcho toolset definition, honcho tools from core tools list
- gateway/run.py: honcho params from AIAgent constructor calls
Removed tests (-339 lines):
- 9 Honcho-specific test methods from test_run_agent.py
- TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py
Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that
were accidentally removed during the honcho function extraction.
The honcho_integration/ package is kept intact — the plugin delegates
to it. tools/honcho_tools.py registry entries are now dead code (import
commented out in model_tools.py) but the file is preserved for reference.
Full suite: 7207 passed, 4 pre-existing failures. Zero regressions.
* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice
Plugin restructure:
- Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/
(byterover, hindsight, holographic, honcho, mem0, openviking, retaindb)
- New plugins/memory/__init__.py discovery module that scans the directory
directly, loading providers by name without the general plugin system
- run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers()
CLI wiring:
- hermes memory setup — interactive curses picker + config wizard
- hermes memory status — show active provider, config, availability
- hermes memory off — disable external provider (built-in only)
- hermes honcho — now shows migration notice pointing to hermes memory setup
Gateway cleanup:
- Remove _get_or_create_gateway_honcho (already removed in prev commit)
- Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods
- Remove all calls to shutdown methods (4 call sites)
- Remove _honcho_managers/_honcho_configs dict references
Dead code removal:
- Delete tools/honcho_tools.py (279 lines, import was already commented out)
- Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods)
- Remove if False placeholder from run_agent.py
Migration:
- Honcho migration notice on startup: detects existing honcho.json or
~/.honcho/config.json, prints guidance to run hermes memory setup.
Only fires when memory.provider is not set and not in quiet mode.
Full suite: 7203 passed, 4 pre-existing failures. Zero regressions.
* feat(memory): standardize plugin config + add per-plugin documentation
Config architecture:
- Add save_config(values, hermes_home) to MemoryProvider ABC
- Honcho: writes to $HERMES_HOME/honcho.json (SDK native)
- Mem0: writes to $HERMES_HOME/mem0.json
- Hindsight: writes to $HERMES_HOME/hindsight/config.json
- Holographic: writes to config.yaml under plugins.hermes-memory-store
- OpenViking/RetainDB/ByteRover: env-var only (default no-op)
Setup wizard (hermes memory setup):
- Now calls provider.save_config() for non-secret config
- Secrets still go to .env via env vars
- Only memory.provider activation key goes to config.yaml
Documentation:
- README.md for each of the 7 providers in plugins/memory/<name>/
- Requirements, setup (wizard + manual), config reference, tools table
- Consistent format across all providers
The contract for new memory plugins:
- get_config_schema() declares all fields (REQUIRED)
- save_config() writes native config (REQUIRED if not env-var-only)
- Secrets use env_var field in schema, written to .env by wizard
- README.md in the plugin directory
* docs: add memory providers user guide + developer guide
New pages:
- user-guide/features/memory-providers.md — comprehensive guide covering
all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight,
Holographic, RetainDB, ByteRover). Each with setup, config, tools,
cost, and unique features. Includes comparison table and profile
isolation notes.
- developer-guide/memory-provider-plugin.md — how to build a new memory
provider plugin. Covers ABC, required methods, config schema,
save_config, threading contract, profile isolation, testing.
Updated pages:
- user-guide/features/memory.md — replaced Honcho section with link to
new Memory Providers page
- user-guide/features/honcho.md — replaced with migration redirect to
the new Memory Providers page
- sidebars.ts — added both new pages to navigation
* fix(memory): auto-migrate Honcho users to memory provider plugin
When honcho.json or ~/.honcho/config.json exists but memory.provider
is not set, automatically set memory.provider: honcho in config.yaml
and activate the plugin. The plugin reads the same config files, so
all data and credentials are preserved. Zero user action needed.
Persists the migration to config.yaml so it only fires once. Prints
a one-line confirmation in non-quiet mode.
* fix(memory): only auto-migrate Honcho when enabled + credentialed
Check HonchoClientConfig.enabled AND (api_key OR base_url) before
auto-migrating — not just file existence. Prevents false activation
for users who disabled Honcho, stopped using it (config lingers),
or have ~/.honcho/ from a different tool.
* feat(memory): auto-install pip dependencies during hermes memory setup
Reads pip_dependencies from plugin.yaml, checks which are missing,
installs them via pip before config walkthrough. Also shows install
guidance for external_dependencies (e.g. brv CLI for ByteRover).
Updated all 7 plugin.yaml files with pip_dependencies:
- honcho: honcho-ai
- mem0: mem0ai
- openviking: httpx
- hindsight: hindsight-client
- holographic: (none)
- retaindb: requests
- byterover: (external_dependencies for brv CLI)
* fix: remove remaining Honcho crash risks from cli.py and gateway
cli.py: removed Honcho session re-mapping block (would crash importing
deleted tools/honcho_tools.py), Honcho flush on compress, Honcho
session display on startup, Honcho shutdown on exit, honcho_session_key
AIAgent param.
gateway/run.py: removed honcho_session_key params from helper methods,
sync_honcho param, _honcho.shutdown() block.
tests: fixed test_cron_session_with_honcho_key_skipped (was passing
removed honcho_key param to _flush_memories_for_session).
* fix: include plugins/ in pyproject.toml package list
Without this, plugins/memory/ wouldn't be included in non-editable
installs. Hermes always runs from the repo checkout so this is belt-
and-suspenders, but prevents breakage if the install method changes.
* fix(memory): correct pip-to-import name mapping for dep checks
The heuristic dep.replace('-', '_') fails for packages where the pip
name differs from the import name: honcho-ai→honcho, mem0ai→mem0,
hindsight-client→hindsight_client. Added explicit mapping table so
hermes memory setup doesn't try to reinstall already-installed packages.
* chore: remove dead code from old plugin memory registration path
- hermes_cli/plugins.py: removed register_memory_provider(),
_memory_providers list, get_plugin_memory_providers() — memory
providers now use plugins/memory/ discovery, not the general plugin system
- hermes_cli/main.py: stripped 74 lines of dead honcho argparse
subparsers (setup, status, sessions, map, peer, mode, tokens,
identity, migrate) — kept only the migration redirect
- agent/memory_provider.py: updated docstring to reflect new
registration path
- tests: replaced TestPluginMemoryProviderRegistration with
TestPluginMemoryDiscovery that tests the actual plugins/memory/
discovery system. Added 3 new tests (discover, load, nonexistent).
* chore: delete dead honcho_integration/cli.py and its tests
cli.py (794 lines) was the old 'hermes honcho' command handler — nobody
calls it since cmd_honcho was replaced with a migration redirect.
Deleted tests that imported from removed code:
- tests/honcho_integration/test_cli.py (tested _resolve_api_key)
- tests/honcho_integration/test_config_isolation.py (tested CLI config paths)
- tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py)
Remaining honcho_integration/ files (actively used by the plugin):
- client.py (445 lines) — config loading, SDK client creation
- session.py (991 lines) — session management, queries, flush
* refactor: move honcho_integration/ into the honcho plugin
Moves client.py (445 lines) and session.py (991 lines) from the
top-level honcho_integration/ package into plugins/memory/honcho/.
No Honcho code remains in the main codebase.
- plugins/memory/honcho/client.py — config loading, SDK client creation
- plugins/memory/honcho/session.py — session management, queries, flush
- Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py,
plugin __init__.py, session.py cross-import, all tests
- Removed honcho_integration/ package and pyproject.toml entry
- Renamed tests/honcho_integration/ → tests/honcho_plugin/
* docs: update architecture + gateway-internals for memory provider system
- architecture.md: replaced honcho_integration/ with plugins/memory/
- gateway-internals.md: replaced Honcho-specific session routing and
flush lifecycle docs with generic memory provider interface docs
* fix: update stale mock path for resolve_active_host after honcho plugin migration
* fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore
Review feedback from Honcho devs (erosika):
P0 — Provider lifecycle:
- Remove on_session_end() + shutdown_all() from run_conversation() tail
(was killing providers after every turn in multi-turn sessions)
- Add shutdown_memory_provider() method on AIAgent for callers
- Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry
Bug fixes:
- Remove sync_honcho=False kwarg from /btw callsites (TypeError crash)
- Fix doctor.py references to dead 'hermes honcho setup' command
- Cache prefetch_all() before tool loop (was re-calling every iteration)
ABC contract hardening (all backwards-compatible):
- Add session_id kwarg to prefetch/sync_turn/queue_prefetch
- Make on_pre_compress() return str (provider insights in compression)
- Add **kwargs to on_turn_start() for runtime context
- Add on_delegation() hook for parent-side subagent observation
- Document agent_context/agent_identity/agent_workspace kwargs on
initialize() (prevents cron corruption, enables profile scoping)
- Fix docstring: single external provider, not multiple
Honcho CLI restoration:
- Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py
with imports adapted to plugin path)
- Restore full hermes honcho command with all subcommands (status, peer,
mode, tokens, identity, enable/disable, sync, peers, --target-profile)
- Restore auto-clone on profile creation + sync on hermes update
- hermes honcho setup now redirects to hermes memory setup
* fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type
- Wire on_delegation() in delegate_tool.py — parent's memory provider
is notified with task+result after each subagent completes
- Add skip_memory=True to cron scheduler (prevents cron system prompts
from corrupting user representations — closes#4052)
- Add skip_memory=True to gateway flush agent (throwaway agent shouldn't
activate memory provider)
- Fix ByteRover on_pre_compress() return type: None -> str
* fix(honcho): port profile isolation fixes from PR #4632
Ports 5 bug fixes found during profile testing (erosika's PR #4632):
1. 3-tier config resolution — resolve_config_path() now checks
$HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json
(non-default profiles couldn't find shared host blocks)
2. Thread host=_host_key() through from_global_config() in cmd_setup,
cmd_status, cmd_identity (--target-profile was being ignored)
3. Use bare profile name as aiPeer (not host key with dots) — Honcho's
peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid
4. Wrap add_peers() in try/except — was fatal on new AI peers, killed
all message uploads for the session
5. Gate Honcho clone behind --clone/--clone-all on profile create
(bare create should be blank-slate)
Also: sanitize assistant_peer_id via _sanitize_id()
* fix(tests): add module cleanup fixture to test_cli_provider_resolution
test_cli_provider_resolution._import_cli() wipes tools.*, cli, and
run_agent from sys.modules to force fresh imports, but had no cleanup.
This poisoned all subsequent tests on the same xdist worker — mocks
targeting tools.file_tools, tools.send_message_tool, etc. patched the
NEW module object while already-imported functions still referenced
the OLD one. Caused ~25 cascade failures: send_message KeyError,
process_registry FileNotFoundError, file_read_guards timeouts,
read_loop_detection file-not-found, mcp_oauth None port, and
provider_parity/codex_execution stale tool lists.
Fix: autouse fixture saves all affected modules before each test and
restores them after, matching the pattern in
test_managed_browserbase_and_modal.py.
Three root causes addressed:
1. AIAgent no longer defaults base_url to OpenRouter (9 tests)
Tests that assert OpenRouter-specific behavior (prompt caching,
reasoning extra_body, provider preferences) need explicit base_url
and model set on the agent. Updated test_run_agent.py and
test_provider_parity.py.
2. Credential pool auto-seeding from host env (2 tests)
test_auxiliary_client.py tests for Anthropic OAuth and custom
endpoint fallback were not mocking _select_pool_entry, so the
host's credential pool interfered. Added pool + codex mocks.
3. sys.modules corruption cascade (major - ~250 tests)
test_managed_modal_environment.py replaced sys.modules entries
(tools, hermes_cli, agent packages) with SimpleNamespace stubs
but had NO cleanup fixture. Every subsequent test in the process
saw corrupted imports: 'cannot import get_config_path from
<unknown module name>' and 'module tools has no attribute
environments'. Added _restore_tool_and_agent_modules autouse
fixture matching the pattern in test_managed_browserbase_and_modal.py.
This was also the root cause of CI failures (104 failed on main).
The original test file had mock secrets corrupted by secret-redaction
tooling before commit — the test values (sk-ant...l012) didn't actually
trigger the PREFIX_RE regex, so 4 of 10 tests were asserting against
values that never appeared in the input.
- Replace truncated mock values with proper fake keys built via string
concatenation (avoids tool redaction during file writes)
- Add _ensure_redaction_enabled autouse fixture to patch the module-level
_REDACT_ENABLED constant, matching the pattern from test_redact.py
Three exfiltration vectors closed:
1. Browser URL exfil — agent could embed secrets in URL params and
navigate to attacker-controlled server. Now scans URLs for known
API key patterns before navigating (browser_navigate, web_extract).
2. Browser snapshot leak — page displaying env vars or API keys would
send secrets to auxiliary LLM via _extract_relevant_content before
run_agent.py's redaction layer sees the result. Now redacts snapshot
text before the auxiliary call.
3. Camofox annotation leak — accessibility tree text sent to vision
LLM could contain secrets visible on screen. Now redacts annotation
context before the vision call.
10 new tests covering URL blocking, snapshot redaction, and annotation
redaction for both browser and camofox backends.
PR #4419 was based on pre-credential-pools main where _config_version was 10.
The squash merge downgraded it from 11 (set by #2647) back to 10.
Also fixes the test assertion.
* feat(skills): add content size limits for agent-created skills
Agent writes via skill_manage (create/edit/patch/write_file) are now
constrained to prevent unbounded growth:
- SKILL.md and supporting files: 100,000 character limit
- Supporting files: additional 1 MiB byte limit
- Patches on oversized hand-placed skills that reduce the size are
allowed (shrink path), but patches that grow beyond the limit are
rejected
Hand-placed skills and hub-installed skills have NO hard limit —
they load and function normally regardless of size. Hub installs
get a warning in the log if SKILL.md exceeds 100k chars.
This mirrors the memory system's char_limit pattern. Without this,
the agent auto-grows skills indefinitely through iterative patches
(hermes-agent-dev reached 197k chars / 72k tokens — 40x larger than
the largest skill in the entire skills.sh ecosystem).
Constants: MAX_SKILL_CONTENT_CHARS (100k), MAX_SKILL_FILE_BYTES (1MiB)
Tests: 14 new tests covering all write paths and edge cases
* feat(skills): add fuzzy matching to skill patch
_patch_skill now uses the same 8-strategy fuzzy matching engine
(tools/fuzzy_match.py) as the file patch tool. Handles whitespace
normalization, indentation differences, escape sequences, and
block-anchor matching. Eliminates exact-match failures when agents
patch skills with minor formatting mismatches.
Adds two Camofox features:
1. Persistent browser sessions: new `browser.camofox.managed_persistence`
config option. When enabled, Hermes sends a deterministic profile-scoped
userId to Camofox so the server maps it to a persistent browser profile
directory. Cookies, logins, and browser state survive across restarts.
Default remains ephemeral (random userId per session).
2. VNC URL discovery: Camofox /health endpoint returns vncPort when running
in headed mode. Hermes constructs the VNC URL and includes it in navigate
responses so the agent can share it with users.
Also fixes camofox_vision bug where call_llm response object was passed
directly to json.dumps instead of extracting .choices[0].message.content.
Changes from original PR:
- Removed browser_evaluate tool (separate feature, needs own PR)
- Removed snapshot truncation limit change (unrelated)
- Config.yaml only for managed_persistence (no env var, no version bump)
- Rewrote tests to use config mock instead of env var
- Reverted package-lock.json churn
Co-authored-by: analista <psikonetik@gmail.com.com>
After a successful write_file or patch, update the stored read
timestamp to match the file's new modification time. Without this,
consecutive edits by the same task (read → write → write) would
false-warn on the second write because the stored timestamp still
reflected the original read, not the first write.
Also renames the internal tracker key from 'file_mtimes' to
'read_timestamps' for clarity.
Track file mtime when read_file is called. When write_file or patch
subsequently targets the same file, compare the current mtime against
the recorded one. If they differ (external edit, concurrent agent,
user change), include a _warning in the result advising the agent to
re-read. The write still proceeds — this is a soft signal, not a
hard block.
Key design points:
- Per-task isolation: task A's reads don't affect task B's writes.
- Files never read produce no warning (not enforcing read-before-write).
- mtime naturally updates after the agent's own writes, so the warning
only fires on external changes, not the agent's own edits.
- V4A multi-file patches check all target paths.
Tests: 10 new tests covering write staleness, patch staleness,
never-read files, cross-task isolation, and the helper function.
* feat(file_tools): harden read_file with size guard, dedup, and device blocking
Three improvements to read_file_tool to reduce wasted context tokens and
prevent process hangs:
1. Character-count guard: reads that produce more than 100K characters
(≈25-35K tokens across tokenisers) are rejected with an error that
tells the model to use offset+limit for a smaller range. The
effective cap is min(file_size, 100K) so small files that happen to
have long lines aren't over-penalised. Large truncated files also
get a hint nudging toward targeted reads.
2. File-read deduplication: when the same (path, offset, limit) is read
a second time and the file hasn't been modified (mtime unchanged),
return a lightweight stub instead of re-sending the full content.
Writes and patches naturally change mtime, so post-edit reads always
return fresh content. The dedup cache is cleared on context
compression — after compression the original read content is
summarised away, so the model needs the full content again.
3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin
etc. are rejected before any I/O to prevent process hangs from
infinite-output or blocking-input devices.
Tests: 17 new tests covering all three features plus the dedup-reset-
on-compression integration. All 52 file-read tests pass (35 existing +
17 new). Full tool suite (2124 tests) passes with 0 failures.
* feat: make file_read_max_chars configurable, add docs
Add file_read_max_chars to DEFAULT_CONFIG (default 100K). read_file_tool
reads this on first call and caches for the process lifetime. Users on
large-context models can raise it; users on small local models can lower it.
Also adds a 'File Read Safety' section to the configuration docs
explaining the char limit, dedup behavior, and example values.
_load_config_files() had the same hermes_home / item pattern without
containment checks. While config.yaml is user-controlled (lower threat
than skill frontmatter), defense in depth prevents exploitation via
config injection or copy-paste mistakes.