hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-09 08:21:50 +00:00

Author	SHA1	Message	Date
adybag14-cyber	e79cc88985	feat: add tested Termux install path and EOF-aware gh auth	2026-04-09 16:24:53 -07:00
dashed	7f7b02b764	fix(slack): comprehensive mrkdwn formatting — 6 bug fixes + 52 tests Fixes blockquote > escaping, edit_message raw markdown, *bold italic* handling, HTML entity double-escaping (&amp;), Wikipedia URL parens truncation, and step numbering format. Also adds format_message to the tool-layer _send_to_platform for consistent formatting across all delivery paths. Changes: - Protect Slack entities (<@user>, <https://...|label>, <!here>) from escaping passes - Protect blockquote > markers before HTML entity escaping - Unescape-before-escape for idempotent HTML entity handling - *bold italic* → _text_ conversion (before bold pass) - URL regex upgraded to handle balanced parentheses - mrkdwn:True flag on chat_postMessage payloads - format_message applied in edit_message and send_message_tool - 52 new tests (format, edit, streaming, splitting, tool chunking) - Use reversed(dict) idiom for placeholder restoration Based on PR #3715 by dashed, cherry-picked onto current main.	2026-04-09 14:07:32 -07:00
Dylan Socolobsky	c6dba918b3	fix(tests): fix several failing/flaky tests on main (#6777 ) * fix(tests): mock is_safe_url in tests that use example.com Tests using example.com URLs were failing because is_safe_url does a real DNS lookup which fails in environments where example.com doesn't resolve, causing the request to be blocked before reaching the already-mocked HTTP client. This should fix around 17 failing tests. These tests test logic, caching, etc. so mocking this method should not modify them in any way. TestMattermostSendUrlAsFile was already doing this so we follow the same pattern. * fix(test): use case-insensitive lookup for model context length check DEFAULT_CONTEXT_LENGTHS uses inconsistent casing (MiniMax keys are lowercase, Qwen keys are mixed-case) so the test was broken in some cases since it couldn't find the model. * fix(test): patch is_linux in systemd gateway restart test The test only patched is_macos to False but didn't patch is_linux to True. On macOS hosts, is_linux() returns False and the systemd restart code path is skipped entirely, making the assertion fail. * fix(test): use non-blocklisted env var in docker forward_env tests GITHUB_TOKEN is in api_key_env_vars and thus in _HERMES_PROVIDER_ENV_BLOCKLIST so the env var is silently dropped, we replace it with a non-blocked one like DATABASE_URL so the tests actually work. * fix(test): fully isolate _has_any_provider_configured from host env _has_any_provider_configured() checks all env vars from PROVIDER_REGISTRY (not just the 5 the tests were clearing) and also calls get_auth_status() which detects gh auth token for Copilot. On machines with any of these set, the function returns True before reaching the code path under test. Clear all registry vars and mock get_auth_status so host credentials don't interfere. * fix(test): correct path to hermes_base_env.py in tool parser tests Path(__file__).parent.parent resolved to tests/, not the project root. 
The file lives at environments/hermes_base_env.py so we need one more parent level. * fix(test): accept optional HTML fields in Matrix send payload _send_matrix sometimes adds format and formatted_body when the markdown library is installed. The test was doing an exact dict equality check which broke. Check required fields instead. * fix(test): add config.yaml to codex vision requirements test The test only wrote auth.json but not config.yaml, so _read_main_provider() returned empty and vision auto-detect never tried the codex provider. Add a config.yaml pointing at openai-codex so the fallback path actually resolves the client. * fix(test): clear OPENROUTER_API_KEY in _isolate_hermes_home run_agent.py calls load_hermes_dotenv() at import time, which injects API keys from ~/.hermes/.env into os.environ before any test fixture runs. This caused test_agent_loop_tool_calling to make real API calls instead of skipping, which ends up making some tests fail. * fix(test): add get_rate_limit_state to agent mock in usage report tests _show_usage now calls agent.get_rate_limit_state() for rate limit display. The SimpleNamespace mock was missing this method. * fix(test): update expected Camofox config version from 12 to 13 * fix(test): mock _get_enabled_platforms in nous managed defaults test Importing gateway.run leaks DISCORD_BOT_TOKEN into os.environ, which makes _get_enabled_platforms() return ["cli", "discord"] instead of just ["cli"]. tools_command loops per platform, so apply_nous_managed_defaults runs twice: the first call sets config values, the second sees them as already configured and returns an empty set, causing the assertion to fail.	2026-04-09 13:17:06 -07:00
Lumen Radley	e22416dd9b	fix: handle empty sudo password and false prompts	2026-04-09 02:50:07 -07:00
helix4u	e94008c404	fix(terminal): guard invalid command values	2026-04-08 21:37:51 -07:00
Teknium	e19252afc4	fix: update tests for unified spawn-per-call execution model - Docker env tests: verify _build_init_env_args() instead of per-execute Popen flags (env forwarding is now init-time only) - Docker: preserve explicit forward_env bypass of blocklist from main - Daytona tests: adapt to SDK-native timeout, _ThreadedProcessHandle, base.py interrupt handling, HERMES_STDIN_ heredoc prefix - Modal tests: fix _load_module to include _ThreadedProcessHandle stub, check ensurepip in _resolve_modal_image instead of __init__ - SSH tests: mock time.sleep on base module instead of removed ssh import - Add missing BaseEnvironment attributes to __new__()-based test fixtures	2026-04-08 17:23:15 -07:00
alt-glitch	d684d7ee7e	feat(environments): unified spawn-per-call execution layer Replace dual execution model (PersistentShellMixin + per-backend oneshot) with spawn-per-call + session snapshot for all backends except ManagedModal. Core changes: - Every command spawns a fresh bash process; session snapshot (env vars, functions, aliases) captured at init and re-sourced before each command - CWD persists via file-based read (local) or in-band stdout markers (remote) - ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends - cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop) - Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store, _save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS - Rate-limited file sync unified in base _before_execute() with _sync_files() hook - execute_oneshot() removed; all 11 call sites in code_execution_tool.py migrated to execute() - Daytona timeout wrapper replaced with SDK-native timeout parameter - persistent_shell.py deleted (291 lines) Backend-specific: - Local: process-group kill via os.killpg, file-based CWD read - Docker: -e env flags only on init_session, not per-command - SSH: shlex.quote transport, ControlMaster connection reuse - Singularity: apptainer exec with instance://, no forced --pwd - Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate - Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop - ManagedModal: unchanged (gateway owns execution); docstring added explaining why	2026-04-08 17:23:15 -07:00
jjovalle99	d46db0a1b4	fix(tools): use correct import path for mistralai SDK mistralai v2.x is a namespace package — `Mistral` class lives at `mistralai.client`, not at the top-level `mistralai` module. The previous `from mistralai import Mistral` raises ImportError at runtime. Update both production code and test fixture to use the correct path.	2026-04-08 13:47:08 -07:00
jjovalle99	5f4b93c20f	feat(tools): add Voxtral Transcribe STT provider (Mistral AI)	2026-04-08 13:47:08 -07:00
Teknium	4f467700d4	fix(doctor): only check the active memory provider, not all providers unconditionally (#6285 ) * fix(tools): skip camofox auto-cleanup when managed persistence is enabled When managed_persistence is enabled, cleanup_browser() was calling camofox_close() which destroys the server-side browser context via DELETE /sessions/{userId}, killing login sessions across cron runs. Add camofox_soft_cleanup() — a public wrapper that drops only the in-memory session entry when managed persistence is on, returning True. When persistence is off it returns False so the caller falls back to the full camofox_close(). The inactivity reaper still handles idle resource cleanup. Also surface a logger.warning() when _managed_persistence_enabled() fails to load config, replacing a silent except-and-return-False. Salvaged from #6182 by el-analista (Eduardo Perea Fernandez). Added public API wrapper to avoid cross-module private imports, and test coverage for both persistence paths. Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com> * fix(doctor): only check the active memory provider, not all providers unconditionally hermes doctor had hardcoded Honcho Memory and Mem0 Memory sections that always ran regardless of the user's memory.provider config setting. After the swappable memory provider update (#4623), users with leftover Honcho config but no active provider saw false 'broken' errors. Replaced both sections with a single Memory Provider section that reads memory.provider from config.yaml and only checks the configured provider. Users with no external provider see a green 'Built-in memory active' check. Reported by community user michaelruiz001, confirmed by Eri (Honcho). --------- Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>	2026-04-08 13:44:58 -07:00
mrshu	19b0ddce40	fix(process): correct detached crash recovery state Previously crash recovery recreated detached sessions as if they were fully managed, so polls and kills could lie about liveness and the checkpoint could forget recovered jobs after the next restart. This commit refreshes recovered host-backed sessions from real PID state, keeps checkpoint data durable, and preserves notify watcher metadata while treating sandbox-only PIDs as non-recoverable. - Persist `pid_scope` in `tools/process_registry.py` and skip recovering sandbox-backed entries without a host-visible PID handle - Refresh detached sessions on access so `get`/`poll`/`wait` and active session queries observe exited processes instead of hanging forever - Allow recovered host PIDs to be terminated honestly and requeue `notify_on_complete` watchers during checkpoint recovery - Add regression tests for durable checkpoints, detached exit/kill behavior, sandbox skip logic, and recovered notify watchers	2026-04-08 03:35:43 -07:00
Vasanthdev2004	085c1c6875	fix(browser): preserve agent-browser paths with spaces	2026-04-08 02:35:48 -07:00
Teknium	3696c74bfb	fix: preserve existing thresholds, remove pre-read byte guard - DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS) - DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS) - Per-tool overrides all set to 100K (terminal, execute_code, search_files) - Remove pre-read byte guard (no behavioral regression vs current main) - Revert limit signature change to int=500 (match current default) - Restore original read_file schema description - Update test assertions to match 100K thresholds	2026-04-08 02:24:32 -07:00
alt-glitch	bbcff8dcd0	fix(tools): address PR review — remove _extract_raw_output, BudgetConfig everywhere, read_file hardening - Remove _extract_raw_output: persist content verbatim (fixes size mismatch bug) - Drop import aliases: import from budget_config directly, one canonical name - BudgetConfig param on maybe_persist_tool_result and enforce_turn_budget - read_file: limit=None signature, pre-read guard fires only when limit omitted (256KB) - Unify binary extensions: file_operations.py imports from binary_extensions.py - Exclude .pdf and .svg from binary set (text-based, agents may inspect) - Remove redundant outer try/except in eval path (internal fallback handles it) - Fix broken tests: update assertion strings for new persistence format - Module-level constants: _PRE_READ_MAX_BYTES, _DEFAULT_READ_LIMIT - Remove redundant pathlib import (Path already at module level) - Update spec.md with IMPLEMENTED annotations and design decisions	2026-04-08 02:24:32 -07:00
alt-glitch	65e24c942e	wip: tool result fixes -- persistence	2026-04-08 02:24:32 -07:00
Teknium	b9a5e6e247	fix: use camelCase structuredContent attr, prefer structured over text - The MCP SDK Pydantic model uses camelCase (structuredContent), not snake_case (structured_content). The original getattr was a silent no-op. - When structuredContent is present, return it AS the result instead of alongside text — the structured payload is the machine-readable data. - Move test file to tests/tools/ and fix fake class to use camelCase. - Patch _run_on_mcp_loop in tests so the handler actually executes.	2026-04-07 18:00:01 -07:00
Siddharth Balyan	f3006ebef9	refactor(tests): re-architect tests + fix CI failures (#5946 ) * refactor: re-architect tests to mirror the codebase * Update tests.yml * fix: add missing tool_error imports after registry refactor * fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers, causing test_code_execution tests to hit the Modal remote path. * fix(tests): fix update_check and telegram xdist failures - test_update_check: replace patch("hermes_cli.banner.os.getenv") with monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os directly, it uses get_hermes_home() from hermes_constants. - test_telegram_conflict/approval_buttons: provide real exception classes for telegram.error mock (NetworkError, TimedOut, BadRequest) so the except clause in connect() doesn't fail with "catching classes that do not inherit from BaseException" when xdist pollutes sys.modules. * fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock	2026-04-07 17:19:07 -07:00
kshitijk4poor	f4528c885b	feat(clipboard): add native Windows image paste support Add win32 platform branch to clipboard.py so Ctrl+V image paste works on native Windows (PowerShell / Windows Terminal), not just WSL2. Uses the same .NET System.Windows.Forms.Clipboard approach as the WSL path but calls PowerShell directly instead of powershell.exe (the WSL cross-call path). Tries 'powershell' first (Windows PowerShell 5.1, always available), then 'pwsh' (PowerShell 7+). PowerShell executable is discovered once and cached for the process lifetime. Includes 14 new tests covering: - Platform dispatch (save_clipboard_image + has_clipboard_image) - Image detection via PowerShell .NET check - Base64 PNG extraction and decode - Edge cases: no PowerShell, empty output, invalid base64, timeout	2026-04-07 12:49:39 -07:00
Teknium	caded0a5e7	fix: repair 57 failing CI tests across 14 files (#5823 ) * fix: repair 57 failing CI tests across 14 files Categories of fixes: Test isolation under xdist (-n auto): - test_hermes_logging: Strip ALL RotatingFileHandlers before each test to prevent handlers leaked from other xdist workers from polluting counts - test_code_execution: Force TERMINAL_ENV=local in setUp — prevents Modal AuthError when another test leaks TERMINAL_ENV=modal - test_timezone: Same TERMINAL_ENV fix for execute_code timezone tests - test_codex_execution_paths: Mock _resolve_turn_agent_config to ensure model resolution works regardless of xdist worker state Matrix adapter tests (nio not installed in CI): - Add _make_fake_nio() helper with real response classes for isinstance() checks in production code - Replace MagicMock(spec=nio.XxxResponse) with fake_nio instances - Wrap production method calls with patch.dict('sys.modules', {'nio': ...}) so import nio succeeds in method bodies - Use try/except instead of pytest.importorskip for nio.crypto imports (importorskip can be fooled by MagicMock in sys.modules) - test_matrix_voice: Skip entire file if nio is a mock, not just missing Stale test expectations: - test_cli_provider_resolution: _prompt_provider_choice now takes kwargs (default param added); mock getpass.getpass alongside input - test_anthropic_oauth_flow: Mock getpass.getpass (code switched from input) - test_gemini_provider: Mock models.dev + OpenRouter API lookups to test hardcoded defaults without external API variance - test_code_execution: Add notify_on_complete to blocked terminal params - test_setup_openclaw_migration: Mock prompt_choice to select 'Full setup' (new quick-setup path leads to _require_tty → sys.exit in CI) - test_skill_manager_tool: Patch get_all_skills_dirs alongside SKILLS_DIR so _find_skill searches tmp_path, not real ~/.hermes/skills/ Missing attributes in object.__new__ test runners: - test_platform_reconnect: Add session_store to 
_make_runner() - test_session_race_guard: Add hooks, _running_agents_ts, session_store, delivery_router to _make_runner() Production bug fix (gateway/run.py): - Fix sentinel eviction race: _AGENT_PENDING_SENTINEL was immediately evicted by the stale-detection logic because sentinels have no get_activity_summary() method, causing _stale_idle=inf >= timeout. Guard _should_evict with 'is not _AGENT_PENDING_SENTINEL'. * fix: address remaining CI failures - test_setup_openclaw_migration: Also mock _offer_launch_chat (called at end of both quick and full setup paths) - test_code_execution: Move TERMINAL_ENV=local to module level to protect ALL test classes (TestEnvVarFiltering, TestExecuteCodeEdgeCases, TestInterruptHandling, TestHeadTailTruncation) from xdist env leaks - test_matrix: Use try/except for nio.crypto imports (importorskip can be fooled by MagicMock in sys.modules under xdist)	2026-04-07 09:58:45 -07:00
Ben Barclay	b2f477a30b	feat: switch managed browser provider from Browserbase to Browser Use (#5750 ) * feat: switch managed browser provider from Browserbase to Browser Use The Nous subscription tool gateway now routes browser automation through Browser Use instead of Browserbase. This commit: - Adds managed Nous gateway support to BrowserUseProvider (idempotency keys, X-BB-API-Key auth header, external_call_id persistence) - Removes managed gateway support from BrowserbaseProvider (now direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID) - Updates browser_tool.py fallback: prefers Browser Use over Browserbase - Updates nous_subscription.py: gateway vendor 'browser-use', auto-config sets cloud_provider='browser-use' for new subscribers - Updates tools_config.py: Nous Subscription entry now uses Browser Use - Updates setup.py, cli.py, status.py, prompt_builder.py display strings - Updates all affected tests to match new behavior Browserbase remains fully functional for users with direct API credentials. The change only affects the managed/subscription path. * chore: remove redundant Browser Use hint from system prompt * fix: upgrade Browser Use provider to v3 API - Base URL: api/v2 -> api/v3 (v2 is legacy) - Unified all endpoints to use native Browser Use paths: - POST /browsers (create session, returns cdpUrl) - PATCH /browsers/{id} with {action: stop} (close session) - Removed managed-mode branching that used Browserbase-style /v1/sessions paths — v3 gateway now supports /browsers directly - Removed unused managed_mode variable in close_session * fix(browser-use): use X-Browser-Use-API-Key header for managed mode The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key (which is a Browserbase-specific header). Using the wrong header caused a 401 AUTH_ERROR on every managed-mode browser session create. Simplified _headers() to always use X-Browser-Use-API-Key regardless of direct vs managed mode. 
* fix(nous_subscription): browserbase explicit provider is direct-only Since managed Nous gateway now routes through Browser Use, the browserbase explicit provider path should not check managed_browser_available (which resolves against the browser-use gateway). Simplified to direct-only with managed=False. * fix(browser-use): port missing improvements from PR #5605 - CDP URL normalization: resolve HTTP discovery URLs to websocket after cloud provider create_session() (prevents agent-browser failures) - Managed session payload: send timeout=5 and proxyCountryCode=us for gateway-backed sessions (prevents billing overruns) - Update prompt builder, browser_close schema, and module docstring to replace remaining Browserbase references with Browser Use - Dynamic /browser status detection via _get_cloud_provider() instead of hardcoded env var checks (future-proof for new providers) - Rename post_setup key from 'browserbase' to 'agent_browser' - Update setup hint to mention Browser Use alongside Browserbase - Add tests: CDP normalization, browserbase direct-only guard, managed browser-use gateway, direct browserbase fallback --------- Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>	2026-04-07 08:40:22 -04:00
Teknium	8b861b77c1	refactor: remove browser_close tool — auto-cleanup handles it (#5792 ) * refactor: remove browser_close tool — auto-cleanup handles it The browser_close tool was called in only 9% of browser sessions (13/144 navigations across 66 sessions), always redundantly — cleanup_browser() already runs via _cleanup_task_resources() at conversation end, and the background inactivity reaper catches anything else. Removing it saves one tool schema slot in every browser-enabled API call. Also fixes a latent bug: cleanup_browser() now handles Camofox sessions too (previously only Browserbase). Camofox sessions were never auto-cleaned per-task because they live in a separate dict from _active_sessions. Files changed (13): - tools/browser_tool.py: remove function, schema, registry entry; add camofox cleanup to cleanup_browser() - toolsets.py, model_tools.py, prompt_builder.py, display.py, acp_adapter/tools.py: remove browser_close from all tool lists - tests/: remove browser_close test, update toolset assertion - docs/skills: remove all browser_close references * fix: repeat browser_scroll 5x per call for meaningful page movement Most backends scroll ~100px per call — barely visible on a typical viewport. Repeating 5x gives ~500px (~half a viewport), making each scroll tool call actually useful. Backend-agnostic approach: works across all 7+ browser backends without needing to configure each one's scroll amount individually. Breaks early on error for the agent-browser path. * feat: auto-return compact snapshot from browser_navigate Every browser session starts with navigate → snapshot. Now navigate returns the compact accessibility tree snapshot inline, saving one tool call per browser task. The snapshot captures the full page DOM (not viewport-limited), so scroll position doesn't affect it. browser_snapshot remains available for refreshing after interactions or getting full=true content. Both Browserbase and Camofox paths auto-snapshot. 
If the snapshot fails for any reason, navigation still succeeds — the snapshot is a bonus, not a requirement. Schema descriptions updated to guide models: navigate mentions it returns a snapshot, snapshot mentions it's for refresh/full content. * refactor: slim cronjob tool schema — consolidate model/provider, drop unused params Session data (151 calls across 67 sessions) showed several schema properties were never used by models. Consolidated and cleaned up: Removed from schema (still work via backend/CLI): - skill (singular): use skills array instead - reason: pause-only, unnecessary - include_disabled: now defaults to true - base_url: extreme edge case, zero usage - provider (standalone): merged into model object Consolidated: - model + provider → single 'model' object with {model, provider} fields. If provider is omitted, the current main provider is pinned at creation time so the job stays stable even if the user changes their default. Kept: - script: useful data collection feature - skills array: standard interface for skill loading Schema shrinks from 14 to 10 properties. All backend functionality preserved — the Python function signature and handler lambda still accept every parameter. * fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli, hermes-messaging, safe), which meant it appeared in every session for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS gate only works after running 'hermes tools' explicitly. Now MoA only appears when a user explicitly enables it via 'hermes tools'. The moa toolset definition and check_fn remain unchanged — it just needs to be opted into.	2026-04-07 03:28:44 -07:00
Teknium	e120d2afac	feat: notify_on_complete for background processes (#5779) * feat: notify_on_complete for background processes When terminal(background=true, notify_on_complete=true), the system auto-triggers a new agent turn when the process exits — no polling needed. Changes: - ProcessSession: add notify_on_complete field - ProcessRegistry: add completion_queue, populate on _move_to_finished() - Terminal tool: add notify_on_complete parameter to schema + handler - CLI: drain completion_queue after agent turn AND during idle loop - Gateway: enhanced _run_process_watcher injects synthetic MessageEvent on completion, triggering a full agent turn - Checkpoint persistence includes notify_on_complete for crash recovery - code_execution_tool: block notify_on_complete in sandbox scripts - 15 new tests covering queue mechanics, checkpoint round-trip, schema * docs: update terminal tool descriptions for notify_on_complete - background: remove 'ONLY for servers' language, describe both patterns (long-lived processes AND long-running tasks with notify_on_complete) - notify_on_complete: more prescriptive about when to use it - TERMINAL_TOOL_DESCRIPTION: remove 'Do NOT use background for builds' guidance that contradicted the new feature	2026-04-07 02:40:16 -07:00
Mateus Scheuer Macedo	f2c11ff30c	fix(delegate): share credential pools with subagents + per-task leasing Cherry-picked from PR #5580 by MestreY0d4-Uninter. - Share parent's credential pool with child agents for key rotation - Leasing layer spreads parallel children across keys (least-loaded) - Thread-safe acquire_lease/release_lease in CredentialPool - Reverted sneaked-in tool-name restoration change (kept original getattr + isinstance guard pattern)	2026-04-06 23:01:11 -07:00
WAXLYY	c1818b7e9e	fix(tools): redact query secrets in send_message errors	2026-04-06 16:49:52 -07:00
Siddharth Balyan	7b129636f0	feat(tools): add Firecrawl cloud browser provider (#5628 ) * feat(tools): add Firecrawl cloud browser provider Adds Firecrawl (https://firecrawl.dev) as a cloud browser provider alongside Browserbase and Browser Use. All browser tools route through Firecrawl's cloud browser via CDP when selected. - tools/browser_providers/firecrawl.py — FirecrawlProvider - tools/browser_tool.py — register in _PROVIDER_REGISTRY - hermes_cli/tools_config.py — add to onboarding provider picker - hermes_cli/setup.py — add to setup summary - hermes_cli/config.py — add FIRECRAWL_BROWSER_TTL config - website/docs/ — browser docs and env var reference Based on #4490 by @developersdigest. Co-Authored-By: Developers Digest <124798203+developersdigest@users.noreply.github.com> * refactor: simplify FirecrawlProvider.emergency_cleanup Use self._headers() and self._api_url() instead of duplicating env-var reads and header construction. * fix: recognize Firecrawl in subscription browser detection _resolve_browser_feature_state() now handles "firecrawl" as a direct browser provider (same pattern as "browser-use"), so hermes setup summary correctly shows "Browser Automation (Firecrawl)" instead of misreporting as "Local browser". Also fixes test_config_version_unchanged assertion (11 → 12). --------- Co-authored-by: Developers Digest <124798203+developersdigest@users.noreply.github.com>	2026-04-07 02:35:26 +05:30
Teknium	38d8446011	feat: implement MCP OAuth 2.1 PKCE client support (#5420) Implement tools/mcp_oauth.py — the OAuth adapter that mcp_tool.py's existing auth: oauth hook has been waiting for. Components: - HermesTokenStorage: persists tokens + client registration to HERMES_HOME/mcp-tokens/<server>.json with 0o600 permissions - Callback handler factory: per-flow isolated HTTP handlers (safe for concurrent OAuth flows across multiple MCP servers) - OAuthClientProvider integration: wraps the MCP SDK's httpx.Auth subclass which handles discovery, DCR, PKCE, token exchange, refresh, and step-up auth (403 insufficient_scope) automatically - Non-interactive detection: warns when gateway/cron environments try to OAuth without cached tokens - Pre-registered client support: injects client_id/secret from config for servers that don't support Dynamic Client Registration (e.g. Slack) - Path traversal protection on server names - remove_oauth_tokens() for cleanup Config format: mcp_servers: sentry: url: 'https://mcp.sentry.dev/mcp' auth: oauth oauth: # all optional client_id: '...' # skip DCR client_secret: '...' # confidential client scope: 'read write' # server-provided by default Also passes oauth config dict through from mcp_tool.py (was passing only server_name and url before). E2E verified: full OAuth flow (401 → discovery → DCR → authorize → token exchange → authenticated request → tokens persisted) against local test servers. 23 unit tests + 186 MCP suite tests pass.	2026-04-05 22:08:00 -07:00
Teknium	4494fba140	feat: OSV malware check for MCP extension packages (#5305) Before launching an MCP server via npx/uvx, queries the OSV (Open Source Vulnerabilities) API to check if the package has known malware advisories (MAL-* IDs). Regular CVEs are ignored — only confirmed malware is blocked. - Free, public API (Google-maintained), ~300ms per query - Runs once per MCP server launch, inside _run_stdio() before subprocess spawn - Parallel with other MCP servers (asyncio.gather already in place) - Fail-open: network errors, timeouts, unrecognized commands → allow - Parses npm (scoped @scope/pkg@version) and PyPI (name[extras]==version) Inspired by Block/goose extension malware check.	2026-04-05 12:46:07 -07:00
Git-on-my-level	fcdd5447e2	fix: keep ACP stdout protocol-clean Route AIAgent print output to stderr via _print_fn for ACP stdio sessions. Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC on stdout isn't corrupted. Child agents inherit the redirect. Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>	2026-04-05 12:05:13 -07:00
Damian P	afccbf253c	fix: resolve listed messaging targets consistently	2026-04-05 11:59:28 -07:00
Teknium	aa475aef31	feat: add exit code context for common CLI tools in terminal results (#5144) When commands like grep, diff, test, or find return non-zero exit codes that aren't actual errors (grep 1 = no matches, diff 1 = files differ), the model wastes turns investigating non-problems. This adds an exit_code_meaning field to the terminal JSON result that explains informational exit codes, so the agent can move on instead of debugging. Covers grep/rg/ag/ack (no matches), diff (files differ), find (partial access), test/[ (condition false), curl (timeouts, DNS, HTTP errors), and git (context-dependent). Correctly extracts the last command from pipelines and chains, strips full paths and env var assignments. The exit_code field itself is unchanged — this is purely additive context.	2026-04-04 16:57:24 -07:00
LucidPaths	6367e1c4c0	fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness Bug fixes: - agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual env var name chars. Without this, the pattern backtracks exponentially on large strings (e.g. 100K tool output), causing test_file_read_guards to time out. - tools/file_operations.py: over-escaped newline in find -printf format string produced literal backslash-n instead of a real newline, breaking file search result parsing (total_count always 1, paths concatenated). Test fixes: - Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as 'Hangs in non-interactive environments' but actually run fine: - test_413_compression.py (12 tests, 25s) - test_file_tools_live.py (71 tests, 24s) - test_code_execution.py (61 tests, 99s) - test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already) - test_413_compression.py: fix threshold values in 2 preflight compression tests where context_length was too small for the compressed output to fit in one pass. - test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK. - test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when SDK is not installed so patch() targets exist. - test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling of _gateway_queues — fixes race condition where resolve fires before threads register their approval entries, causing the test to hang indefinitely. Net effect: +256 tests recovered from skip, 8 real failures fixed.	2026-04-04 10:18:57 -07:00
Teknium	43d3efd5c8	feat: add docker_env config for explicit container environment variables (#4738 ) Add docker_env option to terminal config — a dict of key-value pairs that get set inside Docker containers via -e flags at both container creation (docker run) and per-command execution (docker exec) time. This complements docker_forward_env (which reads values dynamically from the host process environment). docker_env is useful when Hermes runs as a systemd service without access to the user's shell environment — e.g. setting SSH_AUTH_SOCK or GNUPGHOME to known stable paths for SSH/GPG agent socket forwarding. Precedence: docker_env provides baseline values; docker_forward_env overrides for the same key. Config example: terminal: docker_env: SSH_AUTH_SOCK: /run/user/1000/ssh-agent.sock GNUPGHOME: /root/.gnupg docker_volumes: - /run/user/1000/ssh-agent.sock:/run/user/1000/ssh-agent.sock - /run/user/1000/gnupg/S.gpg-agent:/root/.gnupg/S.gpg-agent	2026-04-03 23:30:12 -07:00
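The stated precedence (docker_env as baseline, docker_forward_env overriding the same key) can be illustrated with a short sketch. `build_env_flags` and its parameters are hypothetical names, and the host environment is passed in explicitly so the example stays deterministic.

```python
def build_env_flags(docker_env, forward_env_names, host_env):
    """Merge explicit and forwarded variables into a list of docker -e flags."""
    merged = dict(docker_env)  # baseline values from config
    for name in forward_env_names:
        if name in host_env:
            merged[name] = host_env[name]  # forwarded host value wins
    flags = []
    for key, value in merged.items():
        flags.extend(["-e", f"{key}={value}"])
    return flags
```

With `docker_env = {"SSH_AUTH_SOCK": "/run/user/1000/ssh-agent.sock"}` and no matching host variable, the configured path is used; if the host process does export `SSH_AUTH_SOCK` and the key is listed for forwarding, the host value replaces it.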
Teknium	ad4feeaf0d	feat: wire skills.external_dirs into all remaining discovery paths The config key skills.external_dirs and core resolution (get_all_skills_dirs, get_external_skills_dirs in agent/skill_utils.py) already existed but several code paths still only scanned SKILLS_DIR. Now external dirs are respected everywhere: - skills_categories(): scan all dirs for category discovery - _get_category_from_path(): resolve categories against any skills root - skill_manager_tool._find_skill(): search all dirs for edit/patch/delete - credential_files.get_skills_directory_mount(): mount all dirs into Docker/Singularity containers (external dirs at external_skills/<idx>) - credential_files.iter_skills_files(): list files from all dirs for Modal/Daytona upload - tools/environments/ssh.py: rsync all skill dirs to remote hosts - gateway _check_unavailable_skill(): check disabled skills across all dirs Usage in config.yaml: skills: external_dirs: - ~/repos/agent-skills/hermes - /shared/team-skills	2026-04-03 21:14:42 -07:00
Tranquil-Flow	3bfb39a25f	fix(gateway): isolate approval session key per turn	2026-04-03 17:50:01 -07:00
Teknium	b1756084a3	feat: add .zip document support and auto-mount cache dirs into remote backends (#4846 ) - Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram, Slack, Discord) cache uploaded zip files instead of rejecting them. - Add get_cache_directory_mounts() and iter_cache_files() to credential_files.py for host-side cache directory passthrough (documents, images, audio, screenshots). - Docker: bind-mount cache dirs read-only alongside credentials/skills. Changes are live (bind mount semantics). - Modal: mount cache files at sandbox creation + resync before each command via _sync_files() with mtime+size change detection. - Handles backward-compat with legacy dir names (document_cache, image_cache, audio_cache, browser_screenshots) via get_hermes_dir(). - Container paths always use the new cache/<subdir> layout regardless of host layout. This replaces the need for a dedicated extract_archive tool (PR #4819) — the agent can now use standard terminal commands (unzip, tar) on uploaded files inside remote containers. Closes: related to PR #4819 by kshitijk4poor	2026-04-03 13:16:26 -07:00
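The mtime+size change detection mentioned for the Modal resync could look roughly like this. It is a sketch under assumptions: `changed_files` and the `seen` cache are hypothetical, and the real `_sync_files()` may track state differently.

```python
import os


def changed_files(paths, seen):
    """Return paths whose (mtime, size) signature differs from the cached one.

    `seen` maps path -> (st_mtime_ns, st_size) and is updated in place.
    """
    changed = []
    for path in paths:
        st = os.stat(path)
        signature = (st.st_mtime_ns, st.st_size)
        if seen.get(path) != signature:
            seen[path] = signature
            changed.append(path)
    return changed
```

Comparing a cheap signature avoids re-uploading every cached file before each command; only files that are new or modified since the last sync are returned.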
Teknium	8a384628a5	fix(memory): profile-scoped memory isolation and clone support (#4845 ) Three fixes for memory+profile isolation bugs: 1. memory_tool.py: Replace module-level MEMORY_DIR constant with get_memory_dir() function that calls get_hermes_home() dynamically. The old constant was cached at import time and could go stale if HERMES_HOME changed after import. Internal MemoryStore methods now call get_memory_dir() directly. MEMORY_DIR kept as backward-compat alias. 2. profiles.py: profile create --clone now copies MEMORY.md and USER.md from the source profile. These curated memory files are part of the agent's identity (same as SOUL.md) and should carry over on clone. 3. holographic plugin: initialize() now expands $HERMES_HOME and ${HERMES_HOME} in the db_path config value, so users can write 'db_path: $HERMES_HOME/memory_store.db' and it resolves to the active profile directory, not the default home. Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.	2026-04-03 13:10:11 -07:00
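For the third fix, expanding both `$HERMES_HOME` and `${HERMES_HOME}` in a config value can be sketched with the standard library. `expand_hermes_home` is a hypothetical helper, and using `string.Template` is an assumption about one way to do it, not the plugin's actual code.

```python
from string import Template


def expand_hermes_home(value, hermes_home):
    """Expand $HERMES_HOME and ${HERMES_HOME} in value; leave other $names alone."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(value).safe_substitute(HERMES_HOME=str(hermes_home))
```

Passing the active profile directory as `hermes_home` makes `db_path: $HERMES_HOME/memory_store.db` resolve per profile rather than to the default home.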
Teknium	cc54818d26	fix(mcp): stability fix pack — reload timeout, shutdown cleanup, event loop handler, OAuth non-blocking (#4757 ) Four fixes for MCP server stability issues reported by community member (terminal lockup, zombie processes, escape sequence pollution, startup hang): 1. MCP reload timeout guard (cli.py): _check_config_mcp_changes now runs _reload_mcp in a separate daemon thread with a 30s hard timeout. Previously, a hung MCP server could block the process_loop thread indefinitely, freezing the entire TUI (user can type but nothing happens, only Ctrl+D/Ctrl+\ work). 2. MCP stdio subprocess PID tracking (mcp_tool.py): Tracks child PIDs spawned by stdio_client via before/after snapshots of /proc children. On shutdown, _stop_mcp_loop force-kills any tracked PIDs that survived the SDK's graceful SIGTERM→SIGKILL cleanup. Prevents zombie MCP server processes from accumulating across sessions. 3. MCP event loop exception handler (mcp_tool.py): Installs _mcp_loop_exception_handler on the MCP background event loop — same pattern as the existing _suppress_closed_loop_errors on prompt_toolkit's loop. Suppresses benign 'Event loop is closed' RuntimeError from httpx transport __del__ during MCP shutdown. Salvaged from PR #2538 (acsezen). 4. MCP OAuth non-blocking (mcp_oauth.py): Replaces blocking input() call in _wait_for_callback with OAuthNonInteractiveError raise. Adds _is_interactive() TTY detection. In non-interactive environments, build_oauth_auth() still returns a provider (cached tokens + refresh work), but the callback handler raises immediately instead of blocking the MCP event loop for 120s. Re-raises OAuth setup failures in _run_http so failed servers are reported cleanly without blocking others. Salvaged from PRs #4521 (voidborne-d) and #4465 (heathley). Closes #2537, closes #4462 Related: #4128, #3436	2026-04-03 02:29:20 -07:00
Teknium	21c2d32471	fix(gateway): normalize step_callback prev_tools for backward compat The PR changed prev_tools from list[str] to list[dict] with name/result keys. The gateway's _step_callback_sync passed this directly to hooks as 'tool_names', breaking user-authored hooks that call ', '.join(tool_names). Now: - 'tool_names' always contains strings (backward-compatible) - 'tools' carries the enriched dicts for hooks that want results Also adds summary logging to register_mcp_servers() and comprehensive tests for all three PR changes: - sanitize_mcp_name_component edge cases - register_mcp_servers public API - _register_session_mcp_servers ACP integration - step_callback result forwarding - gateway normalization backward compat	2026-04-02 20:54:27 -07:00
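The normalization described above can be sketched as follows. `normalize_prev_tools` is a hypothetical name; the only details taken from the commit message are the `name`/`result` keys and the two hook fields `tool_names` and `tools`.

```python
def normalize_prev_tools(prev_tools):
    """Split prev_tools into string names (legacy) and enriched dicts (new)."""
    tool_names = []
    tools = []
    for entry in prev_tools or []:
        if isinstance(entry, dict):
            name = str(entry.get("name", ""))
            tools.append(entry)
        else:
            # Older callers may still pass plain strings.
            name = str(entry)
            tools.append({"name": name, "result": None})
        tool_names.append(name)
    return {"tool_names": tool_names, "tools": tools}
```

User-authored hooks that call `", ".join(tool_names)` keep working because `tool_names` only ever holds strings, while hooks that want results read `tools`.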
Teknium	924bc67eee	feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623 ) * feat(memory): add pluggable memory provider interface with profile isolation Introduces a pluggable MemoryProvider ABC so external memory backends can integrate with Hermes without modifying core files. Each backend becomes a plugin implementing a standard interface, orchestrated by MemoryManager. Key architecture: - agent/memory_provider.py — ABC with core + optional lifecycle hooks - agent/memory_manager.py — single integration point in the agent loop - agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md Profile isolation fixes applied to all 6 shipped plugins: - Cognitive Memory: use get_hermes_home() instead of raw env var - Hindsight Memory: check $HERMES_HOME/hindsight/config.json first, fall back to legacy ~/.hindsight/ for backward compat - Hermes Memory Store: replace hardcoded ~/.hermes paths with get_hermes_home() for config loading and DB path defaults - Mem0 Memory: use get_hermes_home() instead of raw env var - RetainDB Memory: auto-derive profile-scoped project name from hermes_home path (hermes-<profile>), explicit env var overrides - OpenViking Memory: read-only, no local state, isolation via .env MemoryManager.initialize_all() now injects hermes_home into kwargs so every provider can resolve profile-scoped storage without importing get_hermes_home() themselves. Plugin system: adds register_memory_provider() to PluginContext and get_plugin_memory_providers() accessor. Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration). * refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider Remove cognitive-memory plugin (#727) — core mechanics are broken: decay runs 24x too fast (hourly not daily), prefetch uses row ID as timestamp, search limited by importance not similarity. 
Rewrite openviking-memory plugin from a read-only search wrapper into a full bidirectional memory provider using the complete OpenViking session lifecycle API: - sync_turn: records user/assistant messages to OpenViking session (threaded, non-blocking) - on_session_end: commits session to trigger automatic memory extraction into 6 categories (profile, preferences, entities, events, cases, patterns) - prefetch: background semantic search via find() endpoint - on_memory_write: mirrors built-in memory writes to the session - is_available: checks env var only, no network calls (ABC compliance) Tools expanded from 3 to 5: - viking_search: semantic search with mode/scope/limit - viking_read: tiered content (abstract ~100tok / overview ~2k / full) - viking_browse: filesystem-style navigation (list/tree/stat) - viking_remember: explicit memory storage via session - viking_add_resource: ingest URLs/docs into knowledge base Uses direct HTTP via httpx (no openviking SDK dependency needed). Response truncation on viking_read to prevent context flooding. * fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker - Remove redundant mem0_context tool (identical to mem0_search with rerank=true, top_k=5 — wastes a tool slot and confuses the model) - Thread sync_turn so it's non-blocking — Mem0's server-side LLM extraction can take 5-10s, was stalling the agent after every turn - Add threading.Lock around _get_client() for thread-safe lazy init (prefetch and sync threads could race on first client creation) - Add circuit breaker: after 5 consecutive API failures, pause calls for 120s instead of hammering a down server every turn. Auto-resets after cooldown. Logs a warning when tripped. 
- Track success/failure in prefetch, sync_turn, and all tool calls - Wait for previous sync to finish before starting a new one (prevents unbounded thread accumulation on rapid turns) - Clean up shutdown to join both prefetch and sync threads * fix(memory): enforce single external memory provider limit MemoryManager now rejects a second non-builtin provider with a warning. Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE external plugin provider is allowed at a time. This prevents tool schema bloat (some providers add 3-5 tools each) and conflicting memory backends. The warning message directs users to configure memory.provider in config.yaml to select which provider to activate. Updated all 47 tests to use builtin + one external pattern instead of multiple externals. Added test_second_external_rejected to verify the enforcement. * feat(memory): add ByteRover memory provider plugin Implements the ByteRover integration (from PR #3499 by hieuntg81) as a MemoryProvider plugin instead of direct run_agent.py modifications. ByteRover provides persistent memory via the brv CLI — a hierarchical knowledge tree with tiered retrieval (fuzzy text then LLM-driven search). Local-first with optional cloud sync. Plugin capabilities: - prefetch: background brv query for relevant context - sync_turn: curate conversation turns (threaded, non-blocking) - on_memory_write: mirror built-in memory writes to brv - on_pre_compress: extract insights before context compression Tools (3): - brv_query: search the knowledge tree - brv_curate: store facts/decisions/patterns - brv_status: check CLI version and context tree state Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped per profile). Binary resolution cached with thread-safe double-checked locking. All write operations threaded to avoid blocking the agent (curate can take 120s with LLM processing). 
* fix(memory): thread remaining sync_turns, fix holographic, add config key Plugin fixes: - Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread) - RetainDB: thread sync_turn (was blocking on HTTP POST) - Both: shutdown now joins sync threads alongside prefetch threads Holographic retrieval fixes: - reason(): removed dead intersection_key computation (bundled but never used in scoring). Now reuses pre-computed entity_residuals directly, moved role_content encoding outside the inner loop. - contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above 500 facts, only checks the most recently updated ones to avoid O(n^2) explosion (~125K comparisons at 500 is acceptable). Config: - Added memory.provider key to DEFAULT_CONFIG ("" = builtin only). No version bump needed (deep_merge handles new keys automatically). * feat(memory): extract Honcho as a MemoryProvider plugin Creates plugins/honcho-memory/ as a thin adapter over the existing honcho_integration/ package. All 4 Honcho tools (profile, search, context, conclude) move from the normal tool registry to the MemoryProvider interface. The plugin delegates all work to HonchoSessionManager — no Honcho logic is reimplemented. It uses the existing config chain: $HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars. Lifecycle hooks: - initialize: creates HonchoSessionManager via existing client factory - prefetch: background dialectic query - sync_turn: records messages + flushes to API (threaded) - on_memory_write: mirrors user profile writes as conclusions - on_session_end: flushes all pending messages This is a prerequisite for the MemoryManager wiring in run_agent.py. Once wired, Honcho goes through the same provider interface as all other memory plugins, and the scattered Honcho code in run_agent.py can be consolidated into the single MemoryManager integration point. 
* feat(memory): wire MemoryManager into run_agent.py Adds 8 integration points for the external memory provider plugin, all purely additive (zero existing code modified): 1. Init (~L1130): Create MemoryManager, find matching plugin provider from memory.provider config, initialize with session context 2. Tool injection (~L1160): Append provider tool schemas to self.tools and self.valid_tool_names after memory_manager init 3. System prompt (~L2705): Add external provider's system_prompt_block alongside existing MEMORY.md/USER.md blocks 4. Tool routing (~L5362): Route provider tool calls through memory_manager.handle_tool_call() before the catchall handler 5. Memory write bridge (~L5353): Notify external provider via on_memory_write() when the built-in memory tool writes 6. Pre-compress (~L5233): Call on_pre_compress() before context compression discards messages 7. Prefetch (~L6421): Inject provider prefetch results into the current-turn user message (same pattern as Honcho turn context) 8. Turn sync + session end (~L8161, ~L8172): sync_all() after each completed turn, queue_prefetch_all() for next turn, on_session_end() + shutdown_all() at conversation end All hooks are wrapped in try/except — a failing provider never breaks the agent. The existing memory system, Honcho integration, and all other code paths are completely untouched. Full suite: 7222 passed, 4 pre-existing failures. * refactor(memory): remove legacy Honcho integration from core Extracts all Honcho-specific code from run_agent.py, model_tools.py, toolsets.py, and gateway/run.py. Honcho is now exclusively available as a memory provider plugin (plugins/honcho-memory/). 
Removed from run_agent.py (-457 lines): - Honcho init block (session manager creation, activation, config) - 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools, _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch, _honcho_prefetch, _honcho_save_user_observation, _honcho_sync - _inject_honcho_turn_context module-level function - Honcho system prompt block (tool descriptions, CLI commands) - Honcho context injection in api_messages building - Honcho params from __init__ (honcho_session_key, honcho_manager, honcho_config) - HONCHO_TOOL_NAMES constant - All honcho-specific tool dispatch forwarding Removed from other files: - model_tools.py: honcho_tools import, honcho params from handle_function_call - toolsets.py: honcho toolset definition, honcho tools from core tools list - gateway/run.py: honcho params from AIAgent constructor calls Removed tests (-339 lines): - 9 Honcho-specific test methods from test_run_agent.py - TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that were accidentally removed during the honcho function extraction. The honcho_integration/ package is kept intact — the plugin delegates to it. tools/honcho_tools.py registry entries are now dead code (import commented out in model_tools.py) but the file is preserved for reference. Full suite: 7207 passed, 4 pre-existing failures. Zero regressions. 
* refactor(memory): restructure plugins, add CLI, clean gateway, migration notice Plugin restructure: - Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/ (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb) - New plugins/memory/__init__.py discovery module that scans the directory directly, loading providers by name without the general plugin system - run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers() CLI wiring: - hermes memory setup — interactive curses picker + config wizard - hermes memory status — show active provider, config, availability - hermes memory off — disable external provider (built-in only) - hermes honcho — now shows migration notice pointing to hermes memory setup Gateway cleanup: - Remove _get_or_create_gateway_honcho (already removed in prev commit) - Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods - Remove all calls to shutdown methods (4 call sites) - Remove _honcho_managers/_honcho_configs dict references Dead code removal: - Delete tools/honcho_tools.py (279 lines, import was already commented out) - Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods) - Remove if False placeholder from run_agent.py Migration: - Honcho migration notice on startup: detects existing honcho.json or ~/.honcho/config.json, prints guidance to run hermes memory setup. Only fires when memory.provider is not set and not in quiet mode. Full suite: 7203 passed, 4 pre-existing failures. Zero regressions. 
* feat(memory): standardize plugin config + add per-plugin documentation Config architecture: - Add save_config(values, hermes_home) to MemoryProvider ABC - Honcho: writes to $HERMES_HOME/honcho.json (SDK native) - Mem0: writes to $HERMES_HOME/mem0.json - Hindsight: writes to $HERMES_HOME/hindsight/config.json - Holographic: writes to config.yaml under plugins.hermes-memory-store - OpenViking/RetainDB/ByteRover: env-var only (default no-op) Setup wizard (hermes memory setup): - Now calls provider.save_config() for non-secret config - Secrets still go to .env via env vars - Only memory.provider activation key goes to config.yaml Documentation: - README.md for each of the 7 providers in plugins/memory/<name>/ - Requirements, setup (wizard + manual), config reference, tools table - Consistent format across all providers The contract for new memory plugins: - get_config_schema() declares all fields (REQUIRED) - save_config() writes native config (REQUIRED if not env-var-only) - Secrets use env_var field in schema, written to .env by wizard - README.md in the plugin directory * docs: add memory providers user guide + developer guide New pages: - user-guide/features/memory-providers.md — comprehensive guide covering all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover). Each with setup, config, tools, cost, and unique features. Includes comparison table and profile isolation notes. - developer-guide/memory-provider-plugin.md — how to build a new memory provider plugin. Covers ABC, required methods, config schema, save_config, threading contract, profile isolation, testing. 
Updated pages: - user-guide/features/memory.md — replaced Honcho section with link to new Memory Providers page - user-guide/features/honcho.md — replaced with migration redirect to the new Memory Providers page - sidebars.ts — added both new pages to navigation * fix(memory): auto-migrate Honcho users to memory provider plugin When honcho.json or ~/.honcho/config.json exists but memory.provider is not set, automatically set memory.provider: honcho in config.yaml and activate the plugin. The plugin reads the same config files, so all data and credentials are preserved. Zero user action needed. Persists the migration to config.yaml so it only fires once. Prints a one-line confirmation in non-quiet mode. * fix(memory): only auto-migrate Honcho when enabled + credentialed Check HonchoClientConfig.enabled AND (api_key OR base_url) before auto-migrating — not just file existence. Prevents false activation for users who disabled Honcho, stopped using it (config lingers), or have ~/.honcho/ from a different tool. * feat(memory): auto-install pip dependencies during hermes memory setup Reads pip_dependencies from plugin.yaml, checks which are missing, installs them via pip before config walkthrough. Also shows install guidance for external_dependencies (e.g. brv CLI for ByteRover). Updated all 7 plugin.yaml files with pip_dependencies: - honcho: honcho-ai - mem0: mem0ai - openviking: httpx - hindsight: hindsight-client - holographic: (none) - retaindb: requests - byterover: (external_dependencies for brv CLI) * fix: remove remaining Honcho crash risks from cli.py and gateway cli.py: removed Honcho session re-mapping block (would crash importing deleted tools/honcho_tools.py), Honcho flush on compress, Honcho session display on startup, Honcho shutdown on exit, honcho_session_key AIAgent param. gateway/run.py: removed honcho_session_key params from helper methods, sync_honcho param, _honcho.shutdown() block. 
tests: fixed test_cron_session_with_honcho_key_skipped (was passing removed honcho_key param to _flush_memories_for_session). * fix: include plugins/ in pyproject.toml package list Without this, plugins/memory/ wouldn't be included in non-editable installs. Hermes always runs from the repo checkout so this is belt- and-suspenders, but prevents breakage if the install method changes. * fix(memory): correct pip-to-import name mapping for dep checks The heuristic dep.replace('-', '_') fails for packages where the pip name differs from the import name: honcho-ai→honcho, mem0ai→mem0, hindsight-client→hindsight_client. Added explicit mapping table so hermes memory setup doesn't try to reinstall already-installed packages. * chore: remove dead code from old plugin memory registration path - hermes_cli/plugins.py: removed register_memory_provider(), _memory_providers list, get_plugin_memory_providers() — memory providers now use plugins/memory/ discovery, not the general plugin system - hermes_cli/main.py: stripped 74 lines of dead honcho argparse subparsers (setup, status, sessions, map, peer, mode, tokens, identity, migrate) — kept only the migration redirect - agent/memory_provider.py: updated docstring to reflect new registration path - tests: replaced TestPluginMemoryProviderRegistration with TestPluginMemoryDiscovery that tests the actual plugins/memory/ discovery system. Added 3 new tests (discover, load, nonexistent). * chore: delete dead honcho_integration/cli.py and its tests cli.py (794 lines) was the old 'hermes honcho' command handler — nobody calls it since cmd_honcho was replaced with a migration redirect. 
Deleted tests that imported from removed code: - tests/honcho_integration/test_cli.py (tested _resolve_api_key) - tests/honcho_integration/test_config_isolation.py (tested CLI config paths) - tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py) Remaining honcho_integration/ files (actively used by the plugin): - client.py (445 lines) — config loading, SDK client creation - session.py (991 lines) — session management, queries, flush * refactor: move honcho_integration/ into the honcho plugin Moves client.py (445 lines) and session.py (991 lines) from the top-level honcho_integration/ package into plugins/memory/honcho/. No Honcho code remains in the main codebase. - plugins/memory/honcho/client.py — config loading, SDK client creation - plugins/memory/honcho/session.py — session management, queries, flush - Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py, plugin __init__.py, session.py cross-import, all tests - Removed honcho_integration/ package and pyproject.toml entry - Renamed tests/honcho_integration/ → tests/honcho_plugin/ * docs: update architecture + gateway-internals for memory provider system - architecture.md: replaced honcho_integration/ with plugins/memory/ - gateway-internals.md: replaced Honcho-specific session routing and flush lifecycle docs with generic memory provider interface docs * fix: update stale mock path for resolve_active_host after honcho plugin migration * fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore Review feedback from Honcho devs (erosika): P0 — Provider lifecycle: - Remove on_session_end() + shutdown_all() from run_conversation() tail (was killing providers after every turn in multi-turn sessions) - Add shutdown_memory_provider() method on AIAgent for callers - Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry Bug fixes: - Remove sync_honcho=False kwarg from /btw callsites (TypeError crash) - Fix doctor.py references to dead 
'hermes honcho setup' command - Cache prefetch_all() before tool loop (was re-calling every iteration) ABC contract hardening (all backwards-compatible): - Add session_id kwarg to prefetch/sync_turn/queue_prefetch - Make on_pre_compress() return str (provider insights in compression) - Add *kwargs to on_turn_start() for runtime context - Add on_delegation() hook for parent-side subagent observation - Document agent_context/agent_identity/agent_workspace kwargs on initialize() (prevents cron corruption, enables profile scoping) - Fix docstring: single external provider, not multiple Honcho CLI restoration: - Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py with imports adapted to plugin path) - Restore full hermes honcho command with all subcommands (status, peer, mode, tokens, identity, enable/disable, sync, peers, --target-profile) - Restore auto-clone on profile creation + sync on hermes update - hermes honcho setup now redirects to hermes memory setup fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type - Wire on_delegation() in delegate_tool.py — parent's memory provider is notified with task+result after each subagent completes - Add skip_memory=True to cron scheduler (prevents cron system prompts from corrupting user representations — closes #4052) - Add skip_memory=True to gateway flush agent (throwaway agent shouldn't activate memory provider) - Fix ByteRover on_pre_compress() return type: None -> str * fix(honcho): port profile isolation fixes from PR #4632 Ports 5 bug fixes found during profile testing (erosika's PR #4632): 1. 3-tier config resolution — resolve_config_path() now checks $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json (non-default profiles couldn't find shared host blocks) 2. Thread host=_host_key() through from_global_config() in cmd_setup, cmd_status, cmd_identity (--target-profile was being ignored) 3. 
Use bare profile name as aiPeer (not host key with dots) — Honcho's peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid 4. Wrap add_peers() in try/except — was fatal on new AI peers, killed all message uploads for the session 5. Gate Honcho clone behind --clone/--clone-all on profile create (bare create should be blank-slate) Also: sanitize assistant_peer_id via _sanitize_id() * fix(tests): add module cleanup fixture to test_cli_provider_resolution test_cli_provider_resolution._import_cli() wipes tools.*, cli, and run_agent from sys.modules to force fresh imports, but had no cleanup. This poisoned all subsequent tests on the same xdist worker — mocks targeting tools.file_tools, tools.send_message_tool, etc. patched the NEW module object while already-imported functions still referenced the OLD one. Caused ~25 cascade failures: send_message KeyError, process_registry FileNotFoundError, file_read_guards timeouts, read_loop_detection file-not-found, mcp_oauth None port, and provider_parity/codex_execution stale tool lists. Fix: autouse fixture saves all affected modules before each test and restores them after, matching the pattern in test_managed_browserbase_and_modal.py.	2026-04-02 15:33:51 -07:00
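The Mem0 circuit breaker in the commit above (pause for 120s after 5 consecutive API failures, auto-reset after the cooldown) can be sketched like this. The class and method names are hypothetical, and the injectable `clock` is an addition for testability rather than something the commit describes.

```python
import time


class CircuitBreaker:
    """Pause calls after repeated consecutive failures, then auto-reset."""

    def __init__(self, max_failures=5, cooldown=120.0, clock=time.monotonic):
        self._max_failures = max_failures
        self._cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # time the breaker tripped, or None if closed

    def allow(self):
        """Return True if a call may proceed right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._cooldown:
            # Cooldown elapsed: reset and let calls through again.
            self._failures = 0
            self._opened_at = None
            return True
        return False

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self._max_failures and self._opened_at is None:
            self._opened_at = self._clock()
```

A provider would check `allow()` before each API call and report the outcome with `record_success()` or `record_failure()`, so a down server is skipped for the cooldown period instead of being retried on every turn.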
Teknium	acea9ee20b	fix(tests): fix 11 real test failures + major cascade poisoner (#4570 ) Three root causes addressed: 1. AIAgent no longer defaults base_url to OpenRouter (9 tests) Tests that assert OpenRouter-specific behavior (prompt caching, reasoning extra_body, provider preferences) need explicit base_url and model set on the agent. Updated test_run_agent.py and test_provider_parity.py. 2. Credential pool auto-seeding from host env (2 tests) test_auxiliary_client.py tests for Anthropic OAuth and custom endpoint fallback were not mocking _select_pool_entry, so the host's credential pool interfered. Added pool + codex mocks. 3. sys.modules corruption cascade (major - ~250 tests) test_managed_modal_environment.py replaced sys.modules entries (tools, hermes_cli, agent packages) with SimpleNamespace stubs but had NO cleanup fixture. Every subsequent test in the process saw corrupted imports: 'cannot import get_config_path from <unknown module name>' and 'module tools has no attribute environments'. Added _restore_tool_and_agent_modules autouse fixture matching the pattern in test_managed_browserbase_and_modal.py. This was also the root cause of CI failures (104 failed on main).	2026-04-02 08:43:06 -07:00
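The cleanup pattern behind the third root cause, saving `sys.modules` entries before a test and restoring them afterwards, can be sketched as a context manager. `preserve_modules` is a hypothetical stand-in; the repository uses an autouse pytest fixture, which is not reproduced here.

```python
import sys
from contextlib import contextmanager


@contextmanager
def preserve_modules(prefixes):
    """Snapshot sys.modules entries under the given packages and restore on exit."""

    def matches(name):
        return any(name == p or name.startswith(p + ".") for p in prefixes)

    saved = {name: mod for name, mod in sys.modules.items() if matches(name)}
    try:
        yield
    finally:
        # Drop whatever the test installed, then put the originals back.
        for name in [n for n in sys.modules if matches(n)]:
            del sys.modules[name]
        sys.modules.update(saved)
```

Wrapping a test body in `with preserve_modules(["tools", "hermes_cli", "agent"]):` would keep stubbed entries from leaking into later tests in the same process.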
Ben Barclay	a2e56d044b	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-04-02 11:00:35 +11:00
Teknium	1515e8c8f2	fix: rewrite test mock secrets and add redaction fixture The original test file had mock secrets corrupted by secret-redaction tooling before commit — the test values (sk-ant...l012) didn't actually trigger the PREFIX_RE regex, so 4 of 10 tests were asserting against values that never appeared in the input. - Replace truncated mock values with proper fake keys built via string concatenation (avoids tool redaction during file writes) - Add _ensure_redaction_enabled autouse fixture to patch the module-level _REDACT_ENABLED constant, matching the pattern from test_redact.py	2026-04-01 12:03:56 -07:00
0xbyt4	712aa44325	security: block secret exfiltration via browser URLs and auxiliary LLM calls Three exfiltration vectors closed: 1. Browser URL exfil — agent could embed secrets in URL params and navigate to attacker-controlled server. Now scans URLs for known API key patterns before navigating (browser_navigate, web_extract). 2. Browser snapshot leak — page displaying env vars or API keys would send secrets to auxiliary LLM via _extract_relevant_content before run_agent.py's redaction layer sees the result. Now redacts snapshot text before the auxiliary call. 3. Camofox annotation leak — accessibility tree text sent to vision LLM could contain secrets visible on screen. Now redacts annotation context before the vision call. 10 new tests covering URL blocking, snapshot redaction, and annotation redaction for both browser and camofox backends.	2026-04-01 12:03:56 -07:00
Teknium	e0abf2416d	fix: restore _config_version to 11 (reverted by stale-branch merge in #4419) (#4440) PR #4419 was based on pre-credential-pools main where _config_version was 10. The squash merge downgraded it from 11 (set by #2647) back to 10. Also fixes the test assertion.	2026-04-01 04:34:04 -07:00
Teknium	f6ada27d1c	feat(skills): size limits for agent writes + fuzzy matching for patch (#4414 ) * feat(skills): add content size limits for agent-created skills Agent writes via skill_manage (create/edit/patch/write_file) are now constrained to prevent unbounded growth: - SKILL.md and supporting files: 100,000 character limit - Supporting files: additional 1 MiB byte limit - Patches on oversized hand-placed skills that reduce the size are allowed (shrink path), but patches that grow beyond the limit are rejected Hand-placed skills and hub-installed skills have NO hard limit — they load and function normally regardless of size. Hub installs get a warning in the log if SKILL.md exceeds 100k chars. This mirrors the memory system's char_limit pattern. Without this, the agent auto-grows skills indefinitely through iterative patches (hermes-agent-dev reached 197k chars / 72k tokens — 40x larger than the largest skill in the entire skills.sh ecosystem). Constants: MAX_SKILL_CONTENT_CHARS (100k), MAX_SKILL_FILE_BYTES (1MiB) Tests: 14 new tests covering all write paths and edge cases * feat(skills): add fuzzy matching to skill patch _patch_skill now uses the same 8-strategy fuzzy matching engine (tools/fuzzy_match.py) as the file patch tool. Handles whitespace normalization, indentation differences, escape sequences, and block-anchor matching. Eliminates exact-match failures when agents patch skills with minor formatting mismatches.	2026-04-01 04:19:19 -07:00
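The size rule for agent writes, including the shrink path for oversized hand-placed skills, can be sketched as below. `MAX_SKILL_CONTENT_CHARS` and its 100k value come from the commit message; the function `patch_allowed` is a hypothetical name, and the real check may be structured differently.

```python
MAX_SKILL_CONTENT_CHARS = 100_000  # constant named in the commit message


def patch_allowed(old_content: str, new_content: str,
                  limit: int = MAX_SKILL_CONTENT_CHARS) -> bool:
    """Decide whether an agent patch may be written (illustrative only).

    - Results at or under the limit are always accepted.
    - Results over the limit are accepted only if they shrink an already
      oversized skill (the "shrink path").
    """
    if len(new_content) <= limit:
        return True
    return len(new_content) < len(old_content)


if __name__ == "__main__":
    oversized = "x" * 150_000
    print(patch_allowed(oversized, "x" * 120_000))  # shrinks: allowed
    print(patch_allowed(oversized, "x" * 150_001))  # grows: rejected
```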
Teknium	70744add15	feat(browser): add persistent Camofox sessions and VNC URL discovery (salvage #4400 ) (#4419 ) Adds two Camofox features: 1. Persistent browser sessions: new `browser.camofox.managed_persistence` config option. When enabled, Hermes sends a deterministic profile-scoped userId to Camofox so the server maps it to a persistent browser profile directory. Cookies, logins, and browser state survive across restarts. Default remains ephemeral (random userId per session). 2. VNC URL discovery: Camofox /health endpoint returns vncPort when running in headed mode. Hermes constructs the VNC URL and includes it in navigate responses so the agent can share it with users. Also fixes camofox_vision bug where call_llm response object was passed directly to json.dumps instead of extracting .choices[0].message.content. Changes from original PR: - Removed browser_evaluate tool (separate feature, needs own PR) - Removed snapshot truncation limit change (unrelated) - Config.yaml only for managed_persistence (no env var, no version bump) - Rewrote tests to use config mock instead of env var - Reverted package-lock.json churn Co-authored-by: analista <psikonetik@gmail.com.com>	2026-04-01 04:18:50 -07:00
Teknium	ef2ae3e48f	fix(file_tools): refresh staleness timestamp after writes (#4390) After a successful write_file or patch, update the stored read timestamp to match the file's new modification time. Without this, consecutive edits by the same task (read → write → write) would false-warn on the second write because the stored timestamp still reflected the original read, not the first write. Also renames the internal tracker key from 'file_mtimes' to 'read_timestamps' for clarity.	2026-04-01 00:50:08 -07:00
Teknium	f04986029c	feat(file_tools): detect stale files on write and patch (#4345 ) Track file mtime when read_file is called. When write_file or patch subsequently targets the same file, compare the current mtime against the recorded one. If they differ (external edit, concurrent agent, user change), include a _warning in the result advising the agent to re-read. The write still proceeds — this is a soft signal, not a hard block. Key design points: - Per-task isolation: task A's reads don't affect task B's writes. - Files never read produce no warning (not enforcing read-before-write). - mtime naturally updates after the agent's own writes, so the warning only fires on external changes, not the agent's own edits. - V4A multi-file patches check all target paths. Tests: 10 new tests covering write staleness, patch staleness, never-read files, cross-task isolation, and the helper function.	2026-03-31 14:49:00 -07:00
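The staleness check described in this commit can be sketched as a per-task map from path to the mtime seen at read time. The class and method names here are assumptions, not the project's API. The `note_write` refresh reflects the follow-up fix in ef2ae3e48f (#4390), which updates the stored timestamp after the agent's own writes.

```python
import os
from collections import defaultdict


class ReadTracker:
    """Hypothetical per-task tracker of file mtimes recorded at read time."""

    def __init__(self):
        # task_id -> {resolved path -> mtime recorded at last read/write}
        self._read_timestamps = defaultdict(dict)

    def note_read(self, task_id: str, path: str) -> None:
        key = os.path.realpath(path)
        self._read_timestamps[task_id][key] = os.path.getmtime(path)

    def note_write(self, task_id: str, path: str) -> None:
        # After the agent's own write, refresh the stored timestamp so a
        # second consecutive write does not trigger a false warning.
        self.note_read(task_id, path)

    def stale_warning(self, task_id: str, path: str):
        """Return a warning string if the file changed since this task read it."""
        recorded = self._read_timestamps[task_id].get(os.path.realpath(path))
        if recorded is None:
            return None  # never read by this task: no warning
        try:
            current = os.path.getmtime(path)
        except OSError:
            return None  # file is gone; nothing to compare against
        if current != recorded:
            return "File changed since it was last read; consider re-reading."
        return None
```

The warning is returned rather than raised because, per the commit message, the write still proceeds and the warning is only a soft signal.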
Teknium	e3f8347be3	feat(file_tools): harden read_file with size guard, dedup, and device blocking (#4315 ) * feat(file_tools): harden read_file with size guard, dedup, and device blocking Three improvements to read_file_tool to reduce wasted context tokens and prevent process hangs: 1. Character-count guard: reads that produce more than 100K characters (≈25-35K tokens across tokenisers) are rejected with an error that tells the model to use offset+limit for a smaller range. The effective cap is min(file_size, 100K) so small files that happen to have long lines aren't over-penalised. Large truncated files also get a hint nudging toward targeted reads. 2. File-read deduplication: when the same (path, offset, limit) is read a second time and the file hasn't been modified (mtime unchanged), return a lightweight stub instead of re-sending the full content. Writes and patches naturally change mtime, so post-edit reads always return fresh content. The dedup cache is cleared on context compression — after compression the original read content is summarised away, so the model needs the full content again. 3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin etc. are rejected before any I/O to prevent process hangs from infinite-output or blocking-input devices. Tests: 17 new tests covering all three features plus the dedup-reset- on-compression integration. All 52 file-read tests pass (35 existing + 17 new). Full tool suite (2124 tests) passes with 0 failures. * feat: make file_read_max_chars configurable, add docs Add file_read_max_chars to DEFAULT_CONFIG (default 100K). read_file_tool reads this on first call and caches for the process lifetime. Users on large-context models can raise it; users on small local models can lower it. Also adds a 'File Read Safety' section to the configuration docs explaining the char limit, dedup behavior, and example values.	2026-03-31 12:53:19 -07:00
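Two of the three read_file guards above, device path blocking and read deduplication, can be sketched as follows. All names are hypothetical, and the blocked set contains only the three paths the commit message names; the real list is longer ("etc.") and is not reproduced here.

```python
import os
import posixpath

# Only the device paths named in the commit message; the full list is unknown.
BLOCKED_DEVICE_PATHS = {"/dev/zero", "/dev/random", "/dev/stdin"}


def is_blocked_device(path: str) -> bool:
    """Reject known infinite-output or blocking-input devices before any I/O."""
    return posixpath.normpath(path) in BLOCKED_DEVICE_PATHS


class ReadDedupCache:
    """Remember (path, offset, limit) reads and the mtime seen at that time."""

    def __init__(self):
        self._seen = {}

    def is_repeat(self, path: str, offset: int, limit: int) -> bool:
        """True if this exact read was served before and the file is unchanged."""
        key = (os.path.realpath(path), offset, limit)
        mtime = os.path.getmtime(path)
        if self._seen.get(key) == mtime:
            return True  # caller can return a lightweight stub instead
        self._seen[key] = mtime
        return False

    def clear(self) -> None:
        # Called on context compression, when earlier read content has
        # been summarised away and must be sent in full again.
        self._seen.clear()
```

Keying on mtime means that writes and patches invalidate the cache without extra bookkeeping, which is the behaviour the commit message describes.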
Teknium	7f78deebe7	fix: apply same path traversal checks to config-based credential files _load_config_files() had the same hermes_home / item pattern without containment checks. While config.yaml is user-controlled (lower threat than skill frontmatter), defense in depth prevents exploitation via config injection or copy-paste mistakes.	2026-03-31 12:16:37 -07:00
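A containment check for the `hermes_home / item` pattern mentioned above might look like this. It is a sketch only: `resolve_inside` is a hypothetical name, and the checks inside `_load_config_files()` may differ.

```python
from pathlib import Path
from typing import Optional


def resolve_inside(base: Path, item: str) -> Optional[Path]:
    """Join `item` onto `base` and return it only if it stays under `base`.

    Returns None for traversal attempts such as "../x" and for absolute
    paths, which would otherwise replace `base` entirely when joined.
    """
    base_resolved = base.resolve()
    candidate = (base_resolved / item).resolve()
    if candidate.is_relative_to(base_resolved):  # Python 3.9+
        return candidate
    return None


if __name__ == "__main__":
    home = Path("/tmp/hermes_home_example")  # illustrative directory
    print(resolve_inside(home, "creds/token.json"))
    print(resolve_inside(home, "../../etc/passwd"))
```

Resolving both paths before comparing them is what catches `..` segments and symlinks that would escape the base directory.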


382 commits