diff --git a/SECURITY.md b/SECURITY.md new file mode 100644 index 000000000..3cede2885 --- /dev/null +++ b/SECURITY.md @@ -0,0 +1,84 @@ +# Hermes Agent Security Policy + +This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project. + +## 1. Vulnerability Reporting + +Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities. + +### Required Submission Details +- **Title & Severity:** Concise description and CVSS score/rating. +- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`). +- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version. +- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release. +- **Impact:** Explanation of what trust boundary was crossed. + +--- + +## 2. Trust Model + +The core assumption is that Hermes is a **personal agent** with one trusted operator. + +### Operator & Session Trust +- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level. +- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries. +- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing. + +### Dangerous Command Approval +The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`: +- `"on"` (default) — prompts the user to approve dangerous commands. +- `"auto"` — auto-approves after a configurable delay. +- `"off"` — disables the gate entirely (break-glass; see Section 3). + +### Output Redaction +`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations. + +### Skills vs. MCP Servers +- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands. +- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning. + +### Code Execution Sandbox +The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls. + +### Subagents +- **No recursive delegation:** The `delegate_task` tool is disabled for child agents. +- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected. +- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation. + +--- + +## 3. Out of Scope (Non-Vulnerabilities) + +The following scenarios are **not** considered security breaches: +- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox. +- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection. +- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files). +- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability. +- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production. +- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system). + +--- + +## 4. Deployment Hardening & Best Practices + +### Filesystem & Network +- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads. +- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs. +- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment. + +### Skills & Supply Chain +- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal. +- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned. +- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns. + +### Credential Storage +- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control. +- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases. + +--- + +## 5. Disclosure Process + +- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first. +- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com. +- **Credits:** Reporters are credited in release notes unless anonymity is requested. diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py index 4d2331548..479776428 100644 --- a/agent/auxiliary_client.py +++ b/agent/auxiliary_client.py @@ -1835,9 +1835,15 @@ def auxiliary_max_tokens_param(value: int) -> dict: # Every auxiliary LLM consumer should use these instead of manually # constructing clients and calling .chat.completions.create(). -# Client cache: (provider, async_mode, base_url, api_key) -> (client, default_model) +# Client cache: (provider, async_mode, base_url, api_key, api_mode, runtime_key) -> (client, default_model, loop) +# NOTE: loop identity is NOT part of the key. On async cache hits we check +# whether the cached loop is the *current* loop; if not, the stale entry is +# replaced in-place. This bounds cache growth to one entry per unique +# provider config rather than one per (config × event-loop), which previously +# caused unbounded fd accumulation in long-running gateway processes (#10200). _client_cache: Dict[tuple, tuple] = {} _client_cache_lock = threading.Lock() +_CLIENT_CACHE_MAX_SIZE = 64 # safety belt — evict oldest when exceeded def neuter_async_httpx_del() -> None: @@ -1970,39 +1976,49 @@ def _get_cached_client( Async clients (AsyncOpenAI) use httpx.AsyncClient internally, which binds to the event loop that was current when the client was created. Using such a client on a *different* loop causes deadlocks or - RuntimeError. To prevent cross-loop issues (especially in gateway - mode where _run_async() may spawn fresh loops in worker threads), the - cache key for async clients includes the current event loop's identity - so each loop gets its own client instance. + RuntimeError. To prevent cross-loop issues, the cache validates on + every async hit that the cached loop is the *current, open* loop. + If the loop changed (e.g. a new gateway worker-thread loop), the stale + entry is replaced in-place rather than creating an additional entry. + + This keeps cache size bounded to one entry per unique provider config, + preventing the fd-exhaustion that previously occurred in long-running + gateways where recycled worker threads created unbounded entries (#10200). """ - # Include loop identity for async clients to prevent cross-loop reuse. - # httpx.AsyncClient (inside AsyncOpenAI) is bound to the loop where it - # was created — reusing it on a different loop causes deadlocks (#2681). - loop_id = 0 + # Resolve the current event loop for async clients so we can validate + # cached entries. Loop identity is NOT in the cache key — instead we + # check at hit time whether the cached loop is still current and open. + # This prevents unbounded cache growth from recycled worker-thread loops + # while still guaranteeing we never reuse a client on the wrong loop + # (which causes deadlocks, see #2681). current_loop = None if async_mode: try: import asyncio as _aio current_loop = _aio.get_event_loop() - loop_id = id(current_loop) except RuntimeError: pass runtime = _normalize_main_runtime(main_runtime) runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else () - cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", loop_id, runtime_key) + cache_key = (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key) with _client_cache_lock: if cache_key in _client_cache: cached_client, cached_default, cached_loop = _client_cache[cache_key] if async_mode: - # A cached async client whose loop has been closed will raise - # "Event loop is closed" when httpx tries to clean up its - # transport. Discard the stale client and create a fresh one. - if cached_loop is not None and cached_loop.is_closed(): - _force_close_async_httpx(cached_client) - del _client_cache[cache_key] - else: + # Validate: the cached client must be bound to the CURRENT, + # OPEN loop. If the loop changed or was closed, the httpx + # transport inside is dead — force-close and replace. + loop_ok = ( + cached_loop is not None + and cached_loop is current_loop + and not cached_loop.is_closed() + ) + if loop_ok: effective = _compat_model(cached_client, model, cached_default) return cached_client, effective + # Stale — evict and fall through to create a new client. + _force_close_async_httpx(cached_client) + del _client_cache[cache_key] else: effective = _compat_model(cached_client, model, cached_default) return cached_client, effective @@ -2022,6 +2038,12 @@ def _get_cached_client( bound_loop = current_loop with _client_cache_lock: if cache_key not in _client_cache: + # Safety belt: if the cache has grown beyond the max, evict + # the oldest entries (FIFO — dict preserves insertion order). + while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE: + evict_key, evict_entry = next(iter(_client_cache.items())) + _force_close_async_httpx(evict_entry[0]) + del _client_cache[evict_key] _client_cache[cache_key] = (client, default_model, bound_loop) else: client, default_model, _ = _client_cache[cache_key] diff --git a/cron/jobs.py b/cron/jobs.py index 47e0b66ef..06d782888 100644 --- a/cron/jobs.py +++ b/cron/jobs.py @@ -501,6 +501,12 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]] if schedule_changed: updated_schedule = updated["schedule"] + # The API may pass schedule as a raw string (e.g. "every 10m") + # instead of a pre-parsed dict. Normalize it the same way + # create_job() does so downstream code can call .get() safely. + if isinstance(updated_schedule, str): + updated_schedule = parse_schedule(updated_schedule) + updated["schedule"] = updated_schedule updated["schedule_display"] = updates.get( "schedule_display", updated_schedule.get("display", updated.get("schedule_display")), diff --git a/gateway/platforms/discord.py b/gateway/platforms/discord.py index 2d2ea93f9..091b15f61 100644 --- a/gateway/platforms/discord.py +++ b/gateway/platforms/discord.py @@ -1802,6 +1802,76 @@ class DiscordAdapter(BasePlatformAdapter): async def slash_btw(interaction: discord.Interaction, question: str): await self._run_simple_slash(interaction, f"/btw {question}") + # ── Auto-register any gateway-available commands not yet on the tree ── + # This ensures new commands added to COMMAND_REGISTRY in + # hermes_cli/commands.py automatically appear as Discord slash + # commands without needing a manual entry here. + try: + from hermes_cli.commands import COMMAND_REGISTRY, _is_gateway_available, _resolve_config_gates + + already_registered = set() + try: + already_registered = {cmd.name for cmd in tree.get_commands()} + except Exception: + pass + + config_overrides = _resolve_config_gates() + + for cmd_def in COMMAND_REGISTRY: + if not _is_gateway_available(cmd_def, config_overrides): + continue + # Discord command names: lowercase, hyphens OK, max 32 chars. + discord_name = cmd_def.name.lower()[:32] + if discord_name in already_registered: + continue + # Skip aliases that overlap with already-registered names + # (aliases for explicitly registered commands are handled above). + desc = (cmd_def.description or f"Run /{cmd_def.name}")[:100] + has_args = bool(cmd_def.args_hint) + + if has_args: + # Command takes optional arguments — create handler with + # an optional ``args`` string parameter. + def _make_args_handler(_name: str, _hint: str): + @discord.app_commands.describe(args=f"Arguments: {_hint}"[:100]) + async def _handler(interaction: discord.Interaction, args: str = ""): + await self._run_simple_slash( + interaction, f"/{_name} {args}".strip() + ) + _handler.__name__ = f"auto_slash_{_name.replace('-', '_')}" + return _handler + + handler = _make_args_handler(cmd_def.name, cmd_def.args_hint) + else: + # Parameterless command. + def _make_simple_handler(_name: str): + async def _handler(interaction: discord.Interaction): + await self._run_simple_slash(interaction, f"/{_name}") + _handler.__name__ = f"auto_slash_{_name.replace('-', '_')}" + return _handler + + handler = _make_simple_handler(cmd_def.name) + + auto_cmd = discord.app_commands.Command( + name=discord_name, + description=desc, + callback=handler, + ) + try: + tree.add_command(auto_cmd) + already_registered.add(discord_name) + except Exception: + # Silently skip commands that fail registration (e.g. + # name conflict with a subcommand group). + pass + + logger.debug( + "Discord auto-registered %d commands from COMMAND_REGISTRY", + len(already_registered), + ) + except Exception as e: + logger.warning("Discord auto-register from COMMAND_REGISTRY failed: %s", e) + # Register skills under a single /skill command group with category # subcommand groups. This uses 1 top-level slot instead of N, # supporting up to 25 categories × 25 skills = 625 skills. diff --git a/gateway/platforms/helpers.py b/gateway/platforms/helpers.py index c834dd89c..18d97fcb7 100644 --- a/gateway/platforms/helpers.py +++ b/gateway/platforms/helpers.py @@ -49,7 +49,10 @@ class MessageDeduplicator: return False now = time.time() if msg_id in self._seen: - return True + if now - self._seen[msg_id] < self._ttl: + return True + # Entry has expired — remove it and treat as new + del self._seen[msg_id] self._seen[msg_id] = now if len(self._seen) > self._max_size: cutoff = now - self._ttl diff --git a/gateway/run.py b/gateway/run.py index 0d8bbe5fb..d9a2bfd2f 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -3881,6 +3881,18 @@ class GatewayRunner: pass response = agent_result.get("final_response") or "" + + # Convert the agent's internal "(empty)" sentinel into a + # user-friendly message. "(empty)" means the model failed to + # produce visible content after exhausting all retries (nudge, + # prefill, empty-retry, fallback). Sending the raw sentinel + # looks like a bug; a short explanation is more helpful. + if response == "(empty)": + response = ( + "⚠️ The model returned no response after processing tool " + "results. This can happen with some models — try again or " + "rephrase your question." + ) agent_messages = agent_result.get("messages", []) _response_time = time.time() - _msg_start_time _api_calls = agent_result.get("api_calls", 0) @@ -6900,11 +6912,17 @@ class GatewayRunner: }) async def _handle_debug_command(self, event: MessageEvent) -> str: - """Handle /debug — upload debug report + logs and return paste URLs.""" + """Handle /debug — upload debug report (summary only) and return paste URLs. + + Gateway uploads ONLY the summary report (system info + log tails), + NOT full log files, to protect conversation privacy. Users who need + full log uploads should use ``hermes debug share`` from the CLI. + """ import asyncio from hermes_cli.debug import ( - _capture_dump, collect_debug_report, _read_full_log, - upload_to_pastebin, + _capture_dump, collect_debug_report, + upload_to_pastebin, _schedule_auto_delete, + _GATEWAY_PRIVACY_NOTICE, ) loop = asyncio.get_running_loop() @@ -6913,43 +6931,25 @@ class GatewayRunner: def _collect_and_upload(): dump_text = _capture_dump() report = collect_debug_report(log_lines=200, dump_text=dump_text) - agent_log = _read_full_log("agent") - gateway_log = _read_full_log("gateway") - - if agent_log: - agent_log = dump_text + "\n\n--- full agent.log ---\n" + agent_log - if gateway_log: - gateway_log = dump_text + "\n\n--- full gateway.log ---\n" + gateway_log urls = {} - failures = [] - try: urls["Report"] = upload_to_pastebin(report) except Exception as exc: return f"✗ Failed to upload debug report: {exc}" - if agent_log: - try: - urls["agent.log"] = upload_to_pastebin(agent_log) - except Exception: - failures.append("agent.log") + # Schedule auto-deletion after 1 hour + _schedule_auto_delete(list(urls.values())) - if gateway_log: - try: - urls["gateway.log"] = upload_to_pastebin(gateway_log) - except Exception: - failures.append("gateway.log") - - lines = ["**Debug report uploaded:**", ""] + lines = [_GATEWAY_PRIVACY_NOTICE, "", "**Debug report uploaded:**", ""] label_width = max(len(k) for k in urls) for label, url in urls.items(): lines.append(f"`{label:<{label_width}}` {url}") - if failures: - lines.append(f"\n_(failed to upload: {', '.join(failures)})_") - - lines.append("\nShare these links with the Hermes team for support.") + lines.append("") + lines.append("⏱ Pastes will auto-delete in 1 hour.") + lines.append("For full log uploads, use `hermes debug share` from the CLI.") + lines.append("Share these links with the Hermes team for support.") return "\n".join(lines) return await loop.run_in_executor(None, _collect_and_upload) @@ -9509,9 +9509,19 @@ class GatewayRunner: # BUT: never suppress delivery when the agent failed — the error # message is new content the user hasn't seen, and it must reach # them even if streaming had sent earlier partial output. + # + # Also never suppress when the final response is "(empty)" — this + # means the model failed to produce content after tool calls (common + # with mimo-v2-pro, GLM-5, etc.). The stream consumer may have + # sent intermediate text ("Let me search for that…") alongside the + # tool call, setting already_sent=True, but that text is NOT the + # final answer. Suppressing delivery here leaves the user staring + # at silence. (#10xxx — "agent stops after web search") _sc = stream_consumer_holder[0] if _sc and isinstance(response, dict) and not response.get("failed"): - if ( + _final = response.get("final_response") or "" + _is_empty_sentinel = not _final or _final == "(empty)" + if not _is_empty_sentinel and ( getattr(_sc, "final_response_sent", False) or getattr(_sc, "already_sent", False) ): @@ -9822,9 +9832,9 @@ def main(): config = None if args.config: - import json + import yaml with open(args.config, encoding="utf-8") as f: - data = json.load(f) + data = yaml.safe_load(f) config = GatewayConfig.from_dict(data) # Run the gateway - exit with code 1 if no platforms connected, diff --git a/gateway/session_context.py b/gateway/session_context.py index b9fdcdfaf..7f8aca3eb 100644 --- a/gateway/session_context.py +++ b/gateway/session_context.py @@ -37,18 +37,24 @@ needs to replace the import + call site: """ from contextvars import ContextVar +from typing import Any + +# Sentinel to distinguish "never set in this context" from "explicitly set to empty". +# When a contextvar holds _UNSET, we fall back to os.environ (CLI/cron compat). +# When it holds "" (after clear_session_vars resets it), we return "" — no fallback. +_UNSET: Any = object() # --------------------------------------------------------------------------- # Per-task session variables # --------------------------------------------------------------------------- -_SESSION_PLATFORM: ContextVar[str] = ContextVar("HERMES_SESSION_PLATFORM", default="") -_SESSION_CHAT_ID: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_ID", default="") -_SESSION_CHAT_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_CHAT_NAME", default="") -_SESSION_THREAD_ID: ContextVar[str] = ContextVar("HERMES_SESSION_THREAD_ID", default="") -_SESSION_USER_ID: ContextVar[str] = ContextVar("HERMES_SESSION_USER_ID", default="") -_SESSION_USER_NAME: ContextVar[str] = ContextVar("HERMES_SESSION_USER_NAME", default="") -_SESSION_KEY: ContextVar[str] = ContextVar("HERMES_SESSION_KEY", default="") +_SESSION_PLATFORM: ContextVar = ContextVar("HERMES_SESSION_PLATFORM", default=_UNSET) +_SESSION_CHAT_ID: ContextVar = ContextVar("HERMES_SESSION_CHAT_ID", default=_UNSET) +_SESSION_CHAT_NAME: ContextVar = ContextVar("HERMES_SESSION_CHAT_NAME", default=_UNSET) +_SESSION_THREAD_ID: ContextVar = ContextVar("HERMES_SESSION_THREAD_ID", default=_UNSET) +_SESSION_USER_ID: ContextVar = ContextVar("HERMES_SESSION_USER_ID", default=_UNSET) +_SESSION_USER_NAME: ContextVar = ContextVar("HERMES_SESSION_USER_NAME", default=_UNSET) +_SESSION_KEY: ContextVar = ContextVar("HERMES_SESSION_KEY", default=_UNSET) _VAR_MAP = { "HERMES_SESSION_PLATFORM": _SESSION_PLATFORM, @@ -91,10 +97,17 @@ def set_session_vars( def clear_session_vars(tokens: list) -> None: - """Restore session context variables to their pre-handler values.""" - if not tokens: - return - vars_in_order = [ + """Mark session context variables as explicitly cleared. + + Sets all variables to ``""`` so that ``get_session_env`` returns an empty + string instead of falling back to (potentially stale) ``os.environ`` + values. The *tokens* argument is accepted for API compatibility with + callers that saved the return value of ``set_session_vars``, but the + actual clearing uses ``var.set("")`` rather than ``var.reset(token)`` + to ensure the "explicitly cleared" state is distinguishable from + "never set" (which holds the ``_UNSET`` sentinel). + """ + for var in ( _SESSION_PLATFORM, _SESSION_CHAT_ID, _SESSION_CHAT_NAME, @@ -102,9 +115,8 @@ def clear_session_vars(tokens: list) -> None: _SESSION_USER_ID, _SESSION_USER_NAME, _SESSION_KEY, - ] - for var, token in zip(vars_in_order, tokens): - var.reset(token) + ): + var.set("") def get_session_env(name: str, default: str = "") -> str: @@ -113,8 +125,13 @@ def get_session_env(name: str, default: str = "") -> str: Drop-in replacement for ``os.getenv("HERMES_SESSION_*", default)``. Resolution order: - 1. Context variable (set by the gateway for concurrency-safe access) - 2. ``os.environ`` (used by CLI, cron scheduler, and tests) + 1. Context variable (set by the gateway for concurrency-safe access). + If the variable was explicitly set (even to ``""``) via + ``set_session_vars`` or ``clear_session_vars``, that value is + returned — **no fallback to os.environ**. + 2. ``os.environ`` (only when the context variable was never set in + this context — i.e. CLI, cron scheduler, and test processes that + don't use ``set_session_vars`` at all). 3. *default* """ import os @@ -122,7 +139,7 @@ def get_session_env(name: str, default: str = "") -> str: var = _VAR_MAP.get(name) if var is not None: value = var.get() - if value: + if value is not _UNSET: return value # Fall back to os.environ for CLI, cron, and test compatibility return os.getenv(name, default) diff --git a/hermes_cli/debug.py b/hermes_cli/debug.py index 3607db923..12cdb1ba6 100644 --- a/hermes_cli/debug.py +++ b/hermes_cli/debug.py @@ -27,6 +27,110 @@ _DPASTE_COM_URL = "https://dpaste.com/api/" # paste.rs caps at ~1 MB; we stay under that with headroom. _MAX_LOG_BYTES = 512_000 +# Auto-delete pastes after this many seconds (1 hour). +_AUTO_DELETE_SECONDS = 3600 + + +# --------------------------------------------------------------------------- +# Privacy / delete helpers +# --------------------------------------------------------------------------- + +_PRIVACY_NOTICE = """\ +⚠️ This will upload the following to a public paste service: + • System info (OS, Python version, Hermes version, provider, which API keys + are configured — NOT the actual keys) + • Recent log lines (agent.log, errors.log, gateway.log — may contain + conversation fragments and file paths) + • Full agent.log and gateway.log (up to 512 KB each — likely contains + conversation content, tool outputs, and file paths) + +Pastes auto-delete after 1 hour. +""" + +_GATEWAY_PRIVACY_NOTICE = ( + "⚠️ **Privacy notice:** This uploads system info + recent log tails " + "(may contain conversation fragments) to a public paste service. " + "Full logs are NOT included from the gateway — use `hermes debug share` " + "from the CLI for full log uploads.\n" + "Pastes auto-delete after 1 hour." +) + + +def _extract_paste_id(url: str) -> Optional[str]: + """Extract the paste ID from a paste.rs or dpaste.com URL. + + Returns the ID string, or None if the URL doesn't match a known service. + """ + url = url.strip().rstrip("/") + for prefix in ("https://paste.rs/", "http://paste.rs/"): + if url.startswith(prefix): + return url[len(prefix):] + return None + + +def delete_paste(url: str) -> bool: + """Delete a paste from paste.rs. Returns True on success. + + Only paste.rs supports unauthenticated DELETE. dpaste.com pastes + expire automatically but cannot be deleted via API. + """ + paste_id = _extract_paste_id(url) + if not paste_id: + raise ValueError( + f"Cannot delete: only paste.rs URLs are supported. Got: {url}" + ) + + target = f"{_PASTE_RS_URL}{paste_id}" + req = urllib.request.Request( + target, method="DELETE", + headers={"User-Agent": "hermes-agent/debug-share"}, + ) + with urllib.request.urlopen(req, timeout=30) as resp: + return 200 <= resp.status < 300 + + +def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SECONDS): + """Spawn a detached process to delete paste.rs pastes after *delay_seconds*. + + The child process is fully detached (``start_new_session=True``) so it + survives the parent exiting (important for CLI mode). Only paste.rs + URLs are attempted — dpaste.com pastes auto-expire on their own. + """ + import subprocess + + paste_rs_urls = [u for u in urls if _extract_paste_id(u)] + if not paste_rs_urls: + return + + # Build a tiny inline Python script. No imports beyond stdlib. + url_list = ", ".join(f'"{u}"' for u in paste_rs_urls) + script = ( + "import time, urllib.request; " + f"time.sleep({delay_seconds}); " + f"[urllib.request.urlopen(urllib.request.Request(u, method='DELETE', " + f"headers={{'User-Agent': 'hermes-agent/auto-delete'}}), timeout=15) " + f"for u in [{url_list}]]" + ) + + try: + subprocess.Popen( + [sys.executable, "-c", script], + start_new_session=True, + stdout=subprocess.DEVNULL, + stderr=subprocess.DEVNULL, + ) + except Exception: + pass # Best-effort; manual delete still available. + + +def _delete_hint(url: str) -> str: + """Return a one-liner delete command for the given paste URL.""" + paste_id = _extract_paste_id(url) + if paste_id: + return f"hermes debug delete {url}" + # dpaste.com — no API delete, expires on its own. + return "(auto-expires per dpaste.com policy)" + def _upload_paste_rs(content: str) -> str: """Upload to paste.rs. Returns the paste URL. @@ -250,6 +354,9 @@ def run_debug_share(args): expiry = getattr(args, "expire", 7) local_only = getattr(args, "local", False) + if not local_only: + print(_PRIVACY_NOTICE) + print("Collecting debug report...") # Capture dump once — prepended to every paste for context. @@ -315,22 +422,56 @@ def run_debug_share(args): if failures: print(f"\n (failed to upload: {', '.join(failures)})") + # Schedule auto-deletion after 1 hour + _schedule_auto_delete(list(urls.values())) + print(f"\n⏱ Pastes will auto-delete in 1 hour.") + + # Manual delete fallback + print(f"To delete now: hermes debug delete ") + print(f"\nShare these links with the Hermes team for support.") +def run_debug_delete(args): + """Delete one or more paste URLs uploaded by /debug.""" + urls = getattr(args, "urls", []) + if not urls: + print("Usage: hermes debug delete [ ...]") + print(" Deletes paste.rs pastes uploaded by 'hermes debug share'.") + return + + for url in urls: + try: + ok = delete_paste(url) + if ok: + print(f" ✓ Deleted: {url}") + else: + print(f" ✗ Failed to delete: {url} (unexpected response)") + except ValueError as exc: + print(f" ✗ {exc}") + except Exception as exc: + print(f" ✗ Could not delete {url}: {exc}") + + def run_debug(args): """Route debug subcommands.""" subcmd = getattr(args, "debug_command", None) if subcmd == "share": run_debug_share(args) + elif subcmd == "delete": + run_debug_delete(args) else: # Default: show help - print("Usage: hermes debug share [--lines N] [--expire N] [--local]") + print("Usage: hermes debug ") print() print("Commands:") print(" share Upload debug report to a paste service and print URL") + print(" delete Delete a previously uploaded paste") print() - print("Options:") + print("Options (share):") print(" --lines N Number of log lines to include (default: 200)") print(" --expire N Paste expiry in days (default: 7)") print(" --local Print report locally instead of uploading") + print() + print("Options (delete):") + print(" ... One or more paste URLs to delete") diff --git a/hermes_cli/main.py b/hermes_cli/main.py index 21d544d87..266dd4250 100644 --- a/hermes_cli/main.py +++ b/hermes_cli/main.py @@ -5356,6 +5356,7 @@ Examples: hermes debug share --lines 500 Include more log lines hermes debug share --expire 30 Keep paste for 30 days hermes debug share --local Print report locally (no upload) + hermes debug delete Delete a previously uploaded paste """, ) debug_sub = debug_parser.add_subparsers(dest="debug_command") @@ -5375,6 +5376,14 @@ Examples: "--local", action="store_true", help="Print the report locally instead of uploading", ) + delete_parser = debug_sub.add_parser( + "delete", + help="Delete a paste uploaded by 'hermes debug share'", + ) + delete_parser.add_argument( + "urls", nargs="*", default=[], + help="One or more paste URLs to delete (e.g. https://paste.rs/abc123)", + ) debug_parser.set_defaults(func=cmd_debug) # ========================================================================= diff --git a/hermes_cli/memory_setup.py b/hermes_cli/memory_setup.py index e6a61316a..88186b8ec 100644 --- a/hermes_cli/memory_setup.py +++ b/hermes_cli/memory_setup.py @@ -58,9 +58,11 @@ def _prompt(label: str, default: str | None = None, secret: bool = False) -> str def _install_dependencies(provider_name: str) -> None: """Install pip dependencies declared in plugin.yaml.""" import subprocess - from pathlib import Path as _Path + from plugins.memory import find_provider_dir - plugin_dir = _Path(__file__).parent.parent / "plugins" / "memory" / provider_name + plugin_dir = find_provider_dir(provider_name) + if not plugin_dir: + return yaml_path = plugin_dir / "plugin.yaml" if not yaml_path.exists(): return diff --git a/plugins/memory/__init__.py b/plugins/memory/__init__.py index cd583e6d8..0ae65a25d 100644 --- a/plugins/memory/__init__.py +++ b/plugins/memory/__init__.py @@ -1,18 +1,22 @@ """Memory provider plugin discovery. -Scans ``plugins/memory//`` directories for memory provider plugins. -Each subdirectory must contain ``__init__.py`` with a class implementing -the MemoryProvider ABC. +Scans two directories for memory provider plugins: -Memory providers are separate from the general plugin system — they live -in the repo and are always available without user installation. Only ONE -can be active at a time, selected via ``memory.provider`` in config.yaml. +1. Bundled providers: ``plugins/memory//`` (shipped with hermes-agent) +2. User-installed providers: ``$HERMES_HOME/plugins//`` + +Each subdirectory must contain ``__init__.py`` with a class implementing +the MemoryProvider ABC. On name collisions, bundled providers take +precedence. + +Only ONE provider can be active at a time, selected via +``memory.provider`` in config.yaml. Usage: from plugins.memory import discover_memory_providers, load_memory_provider available = discover_memory_providers() # [(name, desc, available), ...] - provider = load_memory_provider("openviking") # MemoryProvider instance + provider = load_memory_provider("mnemosyne") # MemoryProvider instance """ from __future__ import annotations @@ -29,24 +33,101 @@ logger = logging.getLogger(__name__) _MEMORY_PLUGINS_DIR = Path(__file__).parent +# --------------------------------------------------------------------------- +# Directory helpers +# --------------------------------------------------------------------------- + +def _get_user_plugins_dir() -> Optional[Path]: + """Return ``$HERMES_HOME/plugins/`` or None if unavailable.""" + try: + from hermes_constants import get_hermes_home + d = get_hermes_home() / "plugins" + return d if d.is_dir() else None + except Exception: + return None + + +def _is_memory_provider_dir(path: Path) -> bool: + """Heuristic: does *path* look like a memory provider plugin? + + Checks for ``register_memory_provider`` or ``MemoryProvider`` in the + ``__init__.py`` source. Cheap text scan — no import needed. + """ + init_file = path / "__init__.py" + if not init_file.exists(): + return False + try: + source = init_file.read_text(errors="replace")[:8192] + return "register_memory_provider" in source or "MemoryProvider" in source + except Exception: + return False + + +def _iter_provider_dirs() -> List[Tuple[str, Path]]: + """Yield ``(name, path)`` for all discovered provider directories. + + Scans bundled first, then user-installed. Bundled takes precedence + on name collisions (first-seen wins via ``seen`` set). + """ + seen: set = set() + dirs: List[Tuple[str, Path]] = [] + + # 1. Bundled providers (plugins/memory//) + if _MEMORY_PLUGINS_DIR.is_dir(): + for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()): + if not child.is_dir() or child.name.startswith(("_", ".")): + continue + if not (child / "__init__.py").exists(): + continue + seen.add(child.name) + dirs.append((child.name, child)) + + # 2. User-installed providers ($HERMES_HOME/plugins//) + user_dir = _get_user_plugins_dir() + if user_dir: + for child in sorted(user_dir.iterdir()): + if not child.is_dir() or child.name.startswith(("_", ".")): + continue + if child.name in seen: + continue # bundled takes precedence + if not _is_memory_provider_dir(child): + continue # skip non-memory plugins + dirs.append((child.name, child)) + + return dirs + + +def find_provider_dir(name: str) -> Optional[Path]: + """Resolve a provider name to its directory. + + Checks bundled first, then user-installed. + """ + # Bundled + bundled = _MEMORY_PLUGINS_DIR / name + if bundled.is_dir() and (bundled / "__init__.py").exists(): + return bundled + # User-installed + user_dir = _get_user_plugins_dir() + if user_dir: + user = user_dir / name + if user.is_dir() and _is_memory_provider_dir(user): + return user + return None + + +# --------------------------------------------------------------------------- +# Public API +# --------------------------------------------------------------------------- + def discover_memory_providers() -> List[Tuple[str, str, bool]]: - """Scan plugins/memory/ for available providers. + """Scan bundled and user-installed directories for available providers. Returns list of (name, description, is_available) tuples. - Does NOT import the providers — just reads plugin.yaml for metadata - and does a lightweight availability check. + Bundled providers take precedence on name collisions. """ results = [] - if not _MEMORY_PLUGINS_DIR.is_dir(): - return results - - for child in sorted(_MEMORY_PLUGINS_DIR.iterdir()): - if not child.is_dir() or child.name.startswith(("_", ".")): - continue - init_file = child / "__init__.py" - if not init_file.exists(): - continue + for name, child in _iter_provider_dirs(): # Read description from plugin.yaml if available desc = "" yaml_file = child / "plugin.yaml" @@ -70,7 +151,7 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]: except Exception: available = False - results.append((child.name, desc, available)) + results.append((name, desc, available)) return results @@ -78,11 +159,15 @@ def discover_memory_providers() -> List[Tuple[str, str, bool]]: def load_memory_provider(name: str) -> Optional["MemoryProvider"]: """Load and return a MemoryProvider instance by name. + Checks both bundled (``plugins/memory//``) and user-installed + (``$HERMES_HOME/plugins//``) directories. Bundled takes + precedence on name collisions. + Returns None if the provider is not found or fails to load. """ - provider_dir = _MEMORY_PLUGINS_DIR / name - if not provider_dir.is_dir(): - logger.debug("Memory provider '%s' not found in %s", name, _MEMORY_PLUGINS_DIR) + provider_dir = find_provider_dir(name) + if not provider_dir: + logger.debug("Memory provider '%s' not found in bundled or user plugins", name) return None try: @@ -104,7 +189,10 @@ def _load_provider_from_dir(provider_dir: Path) -> Optional["MemoryProvider"]: - A top-level class that extends MemoryProvider — we instantiate it """ name = provider_dir.name - module_name = f"plugins.memory.{name}" + # Use a separate namespace for user-installed plugins so they don't + # collide with bundled providers in sys.modules. + _is_bundled = _MEMORY_PLUGINS_DIR in provider_dir.parents or provider_dir.parent == _MEMORY_PLUGINS_DIR + module_name = f"plugins.memory.{name}" if _is_bundled else f"_hermes_user_memory.{name}" init_file = provider_dir / "__init__.py" if not init_file.exists(): @@ -257,15 +345,16 @@ def discover_plugin_cli_commands() -> List[dict]: return results # Only look at the active provider's directory - plugin_dir = _MEMORY_PLUGINS_DIR / active_provider - if not plugin_dir.is_dir(): + plugin_dir = find_provider_dir(active_provider) + if not plugin_dir: return results cli_file = plugin_dir / "cli.py" if not cli_file.exists(): return results - module_name = f"plugins.memory.{active_provider}.cli" + _is_bundled = _MEMORY_PLUGINS_DIR in plugin_dir.parents or plugin_dir.parent == _MEMORY_PLUGINS_DIR + module_name = f"plugins.memory.{active_provider}.cli" if _is_bundled else f"_hermes_user_memory.{active_provider}.cli" try: # Import the CLI module (lightweight — no SDK needed) if module_name in sys.modules: diff --git a/run_agent.py b/run_agent.py index bea01e670..bde3a680f 100644 --- a/run_agent.py +++ b/run_agent.py @@ -754,6 +754,7 @@ class AIAgent: self._interrupt_requested = False self._interrupt_message = None # Optional message that triggered interrupt self._execution_thread_id: int | None = None # Set at run_conversation() start + self._interrupt_thread_signal_pending = False self._client_lock = threading.RLock() # Subagent delegation state @@ -1223,14 +1224,27 @@ class AIAgent: logger.warning("Memory provider plugin init failed: %s", _mpe) self._memory_manager = None - # Inject memory provider tool schemas into the tool surface + # Inject memory provider tool schemas into the tool surface. + # Skip tools whose names already exist (plugins may register the + # same tools via ctx.register_tool(), which lands in self.tools + # through get_tool_definitions()). Duplicate function names cause + # 400 errors on providers that enforce unique names (e.g. Xiaomi + # MiMo via Nous Portal). if self._memory_manager and self.tools is not None: + _existing_tool_names = { + t.get("function", {}).get("name") + for t in self.tools + if isinstance(t, dict) + } for _schema in self._memory_manager.get_all_tool_schemas(): + _tname = _schema.get("name", "") + if _tname and _tname in _existing_tool_names: + continue # already registered via plugin path _wrapped = {"type": "function", "function": _schema} self.tools.append(_wrapped) - _tname = _schema.get("name", "") if _tname: self.valid_tool_names.add(_tname) + _existing_tool_names.add(_tname) # Skills config: nudge interval for skill creation reminders self._skill_nudge_interval = 10 @@ -2949,7 +2963,15 @@ class AIAgent: # Signal all tools to abort any in-flight operations immediately. # Scope the interrupt to this agent's execution thread so other # agents running in the same process (gateway) are not affected. - _set_interrupt(True, self._execution_thread_id) + if self._execution_thread_id is not None: + _set_interrupt(True, self._execution_thread_id) + self._interrupt_thread_signal_pending = False + else: + # The interrupt arrived before run_conversation() finished + # binding the agent to its execution thread. Defer the tool-level + # interrupt signal until startup completes instead of targeting + # the caller thread by mistake. + self._interrupt_thread_signal_pending = True # Propagate interrupt to any running child agents (subagent delegation) with self._active_children_lock: children_copy = list(self._active_children) @@ -2965,7 +2987,9 @@ class AIAgent: """Clear any pending interrupt request and the per-thread tool interrupt signal.""" self._interrupt_requested = False self._interrupt_message = None - _set_interrupt(False, self._execution_thread_id) + self._interrupt_thread_signal_pending = False + if self._execution_thread_id is not None: + _set_interrupt(False, self._execution_thread_id) def _touch_activity(self, desc: str) -> None: """Update the last-activity timestamp and description (thread-safe).""" @@ -5523,9 +5547,27 @@ class AIAgent: t = threading.Thread(target=_call, daemon=True) t.start() + _last_heartbeat = time.time() + _HEARTBEAT_INTERVAL = 30.0 # seconds between gateway activity touches while t.is_alive(): t.join(timeout=0.3) + # Periodic heartbeat: touch the agent's activity tracker so the + # gateway's inactivity monitor knows we're alive while waiting + # for stream chunks. Without this, long thinking pauses (e.g. + # reasoning models) or slow prefill on local providers (Ollama) + # trigger false inactivity timeouts. The _call thread touches + # activity on each chunk, but the gap between API call start + # and first chunk can exceed the gateway timeout — especially + # when the stale-stream timeout is disabled (local providers). + _hb_now = time.time() + if _hb_now - _last_heartbeat >= _HEARTBEAT_INTERVAL: + _last_heartbeat = _hb_now + _waiting_secs = int(_hb_now - last_chunk_time["t"]) + self._touch_activity( + f"waiting for stream response ({_waiting_secs}s, no chunks yet)" + ) + # Detect stale streams: connections kept alive by SSE pings # but delivering no real chunks. Kill the client so the # inner retry loop can start a fresh connection. @@ -7142,8 +7184,22 @@ class AIAgent: # Each slot holds (function_name, function_args, function_result, duration, error_flag) results = [None] * num_tools + # Touch activity before launching workers so the gateway knows + # we're executing tools (not stuck). + self._current_tool = tool_names_str + self._touch_activity(f"executing {num_tools} tools concurrently: {tool_names_str}") + def _run_tool(index, tool_call, function_name, function_args): """Worker function executed in a thread.""" + # Set the activity callback on THIS worker thread so + # _wait_for_process (terminal commands) can fire heartbeats. + # The callback is thread-local; the main thread's callback + # is invisible to worker threads. + try: + from tools.environments.base import set_activity_callback + set_activity_callback(self._touch_activity) + except Exception: + pass start = time.time() try: result = self._invoke_tool(function_name, function_args, effective_task_id, tool_call.id) @@ -7173,8 +7229,26 @@ class AIAgent: f = executor.submit(_run_tool, i, tc, name, args) futures.append(f) - # Wait for all to complete (exceptions are captured inside _run_tool) - concurrent.futures.wait(futures) + # Wait for all to complete with periodic heartbeats so the + # gateway's inactivity monitor doesn't kill us during long + # concurrent tool batches. + _conc_start = time.time() + while True: + done, not_done = concurrent.futures.wait( + futures, timeout=30.0, + ) + if not not_done: + break + _conc_elapsed = int(time.time() - _conc_start) + _still_running = [ + parsed_calls[futures.index(f)][1] + for f in not_done + if f in futures + ] + self._touch_activity( + f"concurrent tools running ({_conc_elapsed}s, " + f"{len(not_done)} remaining: {', '.join(_still_running[:3])})" + ) finally: if spinner: # Build a summary message for the spinner stop @@ -7406,6 +7480,16 @@ class AIAgent: old_text=function_args.get("old_text"), store=self._memory_store, ) + # Bridge: notify external memory provider of built-in memory writes + if self._memory_manager and function_args.get("action") in ("add", "replace"): + try: + self._memory_manager.on_memory_write( + function_args.get("action", ""), + target, + function_args.get("content", ""), + ) + except Exception: + pass tool_duration = time.time() - tool_start_time if self._should_emit_quiet_tool_messages(): self._vprint(f" {_get_cute_tool_message_impl('memory', function_args, tool_duration, result=function_result)}") @@ -7882,6 +7966,7 @@ class AIAgent: self._thinking_prefill_retries = 0 self._post_tool_empty_retried = False self._last_content_with_tools = None + self._last_content_tools_all_housekeeping = False self._mute_post_response = False self._unicode_sanitization_passes = 0 @@ -8069,6 +8154,7 @@ class AIAgent: self._empty_content_retries = 0 self._thinking_prefill_retries = 0 self._last_content_with_tools = None + self._last_content_tools_all_housekeeping = False self._mute_post_response = False # Re-estimate after compression _preflight_tokens = estimate_request_tokens_rough( @@ -8128,11 +8214,19 @@ class AIAgent: # Record the execution thread so interrupt()/clear_interrupt() can # scope the tool-level interrupt signal to THIS agent's thread only. - # Must be set before clear_interrupt() which uses it. + # Must be set before any thread-scoped interrupt syncing. self._execution_thread_id = threading.current_thread().ident - # Clear any stale interrupt state at start - self.clear_interrupt() + # Always clear stale per-thread state from a previous turn. If an + # interrupt arrived before startup finished, preserve it and bind it + # to this execution thread now instead of dropping it on the floor. + _set_interrupt(False, self._execution_thread_id) + if self._interrupt_requested: + _set_interrupt(True, self._execution_thread_id) + self._interrupt_thread_signal_pending = False + else: + self._interrupt_message = None + self._interrupt_thread_signal_pending = False # External memory provider: prefetch once before the tool loop. # Reuse the cached result on every iteration to avoid re-calling @@ -10131,6 +10225,7 @@ class AIAgent: tc.function.name in _HOUSEKEEPING_TOOLS for tc in assistant_message.tool_calls ) + self._last_content_tools_all_housekeeping = _all_housekeeping if _all_housekeeping and self._has_stream_consumers(): self._mute_post_response = True elif self.quiet_mode: @@ -10320,15 +10415,22 @@ class AIAgent: break # If the previous turn already delivered real content alongside - # tool calls (e.g. "You're welcome!" + memory save), the model - # has nothing more to say. Use the earlier content immediately - # instead of wasting API calls on retries that won't help. + # HOUSEKEEPING tool calls (e.g. "You're welcome!" + memory save), + # the model has nothing more to say. Use the earlier content + # immediately instead of wasting API calls on retries. + # NOTE: Only use this shortcut when ALL tools in that turn were + # housekeeping (memory, todo, etc.). When substantive tools + # were called (terminal, search_files, etc.), the content was + # likely mid-task narration ("I'll scan the directory...") and + # the empty follow-up means the model choked — let the + # post-tool nudge below handle that instead of exiting early. fallback = getattr(self, '_last_content_with_tools', None) - if fallback: + if fallback and getattr(self, '_last_content_tools_all_housekeeping', False): _turn_exit_reason = "fallback_prior_turn_content" logger.info("Empty follow-up after tool calls — using prior turn content as final response") self._emit_status("↻ Empty response after tool calls — using earlier content as final answer") self._last_content_with_tools = None + self._last_content_tools_all_housekeeping = False self._empty_content_retries = 0 # Do NOT modify the assistant message content — the # old code injected "Calling the X tools..." which @@ -10339,13 +10441,18 @@ class AIAgent: break # ── Post-tool-call empty response nudge ─────────── - # The model returned empty after executing tool calls - # but there's no prior-turn content to fall back on. + # The model returned empty after executing tool calls. + # This covers two cases: + # (a) No prior-turn content at all — model went silent + # (b) Prior turn had content + SUBSTANTIVE tools (the + # fallback above was skipped because the content + # was mid-task narration, not a final answer) # Instead of giving up, nudge the model to continue by # appending a user-level hint. This is the #9400 case: - # weaker models (GLM-5, etc.) sometimes return empty - # after tool results instead of continuing to the next - # step. One retry with a nudge usually fixes it. + # weaker models (mimo-v2-pro, GLM-5, etc.) sometimes + # return empty after tool results instead of continuing + # to the next step. One retry with a nudge usually + # fixes it. _prior_was_tool = any( m.get("role") == "tool" for m in messages[-5:] # check recent messages @@ -10355,6 +10462,10 @@ class AIAgent: and not getattr(self, "_post_tool_empty_retried", False) ): self._post_tool_empty_retried = True + # Clear stale narration so it doesn't resurface + # on a later empty response after the nudge. + self._last_content_with_tools = None + self._last_content_tools_all_housekeeping = False logger.info( "Empty response after tool calls — nudging model " "to continue processing" diff --git a/tests/agent/test_memory_provider.py b/tests/agent/test_memory_provider.py index 505f40bd5..db2a70c2f 100644 --- a/tests/agent/test_memory_provider.py +++ b/tests/agent/test_memory_provider.py @@ -396,6 +396,108 @@ class TestPluginMemoryDiscovery: assert load_memory_provider("nonexistent_provider") is None +class TestUserInstalledProviderDiscovery: + """Memory providers installed to $HERMES_HOME/plugins/ should be found. + + Regression test for issues #4956 and #9099: load_memory_provider() and + discover_memory_providers() only scanned the bundled plugins/memory/ + directory, ignoring user-installed plugins. + """ + + def _make_user_memory_plugin(self, tmp_path, name="myprovider"): + """Create a minimal user memory provider plugin.""" + plugin_dir = tmp_path / "plugins" / name + plugin_dir.mkdir(parents=True) + (plugin_dir / "__init__.py").write_text( + "from agent.memory_provider import MemoryProvider\n" + "class MyProvider(MemoryProvider):\n" + f" @property\n" + f" def name(self): return {name!r}\n" + " def is_available(self): return True\n" + " def initialize(self, **kw): pass\n" + " def sync_turn(self, *a, **kw): pass\n" + " def get_tool_schemas(self): return []\n" + " def handle_tool_call(self, *a, **kw): return '{}'\n" + ) + (plugin_dir / "plugin.yaml").write_text( + f"name: {name}\ndescription: Test user provider\n" + ) + return plugin_dir + + def test_discover_finds_user_plugins(self, tmp_path, monkeypatch): + """discover_memory_providers() includes user-installed plugins.""" + from plugins.memory import discover_memory_providers, _get_user_plugins_dir + self._make_user_memory_plugin(tmp_path, "myexternal") + monkeypatch.setattr( + "plugins.memory._get_user_plugins_dir", + lambda: tmp_path / "plugins", + ) + providers = discover_memory_providers() + names = [n for n, _, _ in providers] + assert "myexternal" in names + assert "holographic" in names # bundled still found + + def test_load_user_plugin(self, tmp_path, monkeypatch): + """load_memory_provider() can load from $HERMES_HOME/plugins/.""" + from plugins.memory import load_memory_provider + self._make_user_memory_plugin(tmp_path, "myexternal") + monkeypatch.setattr( + "plugins.memory._get_user_plugins_dir", + lambda: tmp_path / "plugins", + ) + p = load_memory_provider("myexternal") + assert p is not None + assert p.name == "myexternal" + assert p.is_available() + + def test_bundled_takes_precedence(self, tmp_path, monkeypatch): + """Bundled provider wins when user plugin has the same name.""" + from plugins.memory import load_memory_provider, discover_memory_providers + # Create user plugin named "holographic" (same as bundled) + plugin_dir = tmp_path / "plugins" / "holographic" + plugin_dir.mkdir(parents=True) + (plugin_dir / "__init__.py").write_text( + "from agent.memory_provider import MemoryProvider\n" + "class Fake(MemoryProvider):\n" + " @property\n" + " def name(self): return 'holographic-FAKE'\n" + " def is_available(self): return True\n" + " def initialize(self, **kw): pass\n" + " def sync_turn(self, *a, **kw): pass\n" + " def get_tool_schemas(self): return []\n" + " def handle_tool_call(self, *a, **kw): return '{}'\n" + ) + monkeypatch.setattr( + "plugins.memory._get_user_plugins_dir", + lambda: tmp_path / "plugins", + ) + # Load should return bundled (name "holographic"), not user (name "holographic-FAKE") + p = load_memory_provider("holographic") + assert p is not None + assert p.name == "holographic" # bundled wins + + # discover should not duplicate + providers = discover_memory_providers() + holo_count = sum(1 for n, _, _ in providers if n == "holographic") + assert holo_count == 1 + + def test_non_memory_user_plugins_excluded(self, tmp_path, monkeypatch): + """User plugins that don't reference MemoryProvider are skipped.""" + from plugins.memory import discover_memory_providers + plugin_dir = tmp_path / "plugins" / "notmemory" + plugin_dir.mkdir(parents=True) + (plugin_dir / "__init__.py").write_text( + "def register(ctx):\n ctx.register_tool('foo', 'bar', {}, lambda: None)\n" + ) + monkeypatch.setattr( + "plugins.memory._get_user_plugins_dir", + lambda: tmp_path / "plugins", + ) + providers = discover_memory_providers() + names = [n for n, _, _ in providers] + assert "notmemory" not in names + + # --------------------------------------------------------------------------- # Sequential dispatch routing tests # --------------------------------------------------------------------------- @@ -736,3 +838,104 @@ class TestCommitMemorySessionRouting: mgr.add_provider(bad) mgr.on_session_end([]) # must not raise + + +# --------------------------------------------------------------------------- +# on_memory_write bridge — must fire from both concurrent AND sequential paths +# --------------------------------------------------------------------------- + + +class TestOnMemoryWriteBridge: + """Verify that MemoryManager.on_memory_write is called when built-in + memory writes happen. This is a regression test for #10174 where the + sequential tool execution path (_execute_tool_calls_sequential) was + missing the bridge call, so single memory tool calls never notified + external memory providers. + """ + + def test_on_memory_write_add(self): + """on_memory_write fires for 'add' actions.""" + mgr = MemoryManager() + p = FakeMemoryProvider("ext") + mgr.add_provider(p) + + mgr.on_memory_write("add", "memory", "new fact") + assert p.memory_writes == [("add", "memory", "new fact")] + + def test_on_memory_write_replace(self): + """on_memory_write fires for 'replace' actions.""" + mgr = MemoryManager() + p = FakeMemoryProvider("ext") + mgr.add_provider(p) + + mgr.on_memory_write("replace", "user", "updated pref") + assert p.memory_writes == [("replace", "user", "updated pref")] + + def test_on_memory_write_remove_not_bridged(self): + """The bridge intentionally skips 'remove' — only add/replace notify.""" + # This tests the contract that run_agent.py checks: + # function_args.get("action") in ("add", "replace") + mgr = MemoryManager() + p = FakeMemoryProvider("ext") + mgr.add_provider(p) + + # Manager itself doesn't filter — run_agent.py does. + # But providers should handle remove gracefully. + mgr.on_memory_write("remove", "memory", "old fact") + assert p.memory_writes == [("remove", "memory", "old fact")] + + def test_memory_manager_tool_injection_deduplicates(self): + """Memory manager tools already in self.tools (from plugin registry) + must not be appended again. Duplicate function names cause 400 errors + on providers that enforce unique names (e.g. Xiaomi MiMo via Nous Portal). + + Regression test for: duplicate mnemosyne_recall / mnemosyne_remember / + mnemosyne_stats in tools array → 400 from Nous Portal. + """ + mgr = MemoryManager() + p = FakeMemoryProvider("ext", tools=[ + {"name": "ext_recall", "description": "Recall", "parameters": {}}, + {"name": "ext_remember", "description": "Remember", "parameters": {}}, + ]) + mgr.add_provider(p) + + # Simulate self.tools already containing one of the plugin tools + # (as if it was registered via ctx.register_tool → get_tool_definitions) + existing_tools = [ + {"type": "function", "function": {"name": "ext_recall", "description": "Recall (from registry)", "parameters": {}}}, + {"type": "function", "function": {"name": "web_search", "description": "Search", "parameters": {}}}, + ] + + # Apply the same dedup logic from run_agent.py __init__ + _existing_names = { + t.get("function", {}).get("name") + for t in existing_tools + if isinstance(t, dict) + } + for _schema in mgr.get_all_tool_schemas(): + _tname = _schema.get("name", "") + if _tname and _tname in _existing_names: + continue + existing_tools.append({"type": "function", "function": _schema}) + if _tname: + _existing_names.add(_tname) + + # ext_recall should NOT be duplicated; ext_remember should be added + tool_names = [t["function"]["name"] for t in existing_tools] + assert tool_names.count("ext_recall") == 1, f"ext_recall duplicated: {tool_names}" + assert tool_names.count("ext_remember") == 1 + assert tool_names.count("web_search") == 1 + assert len(existing_tools) == 3 # web_search + ext_recall + ext_remember + + def test_on_memory_write_tolerates_provider_failure(self): + """If a provider's on_memory_write raises, others still get notified.""" + mgr = MemoryManager() + bad = FakeMemoryProvider("builtin") + bad.on_memory_write = MagicMock(side_effect=RuntimeError("boom")) + good = FakeMemoryProvider("good") + mgr.add_provider(bad) + mgr.add_provider(good) + + mgr.on_memory_write("add", "user", "test") + # Good provider still received the call despite bad provider crashing + assert good.memory_writes == [("add", "user", "test")] diff --git a/tests/gateway/test_discord_slash_commands.py b/tests/gateway/test_discord_slash_commands.py index c1c3c1df1..c2f2866eb 100644 --- a/tests/gateway/test_discord_slash_commands.py +++ b/tests/gateway/test_discord_slash_commands.py @@ -134,6 +134,57 @@ async def test_registers_native_restart_slash_command(adapter): ) +# ------------------------------------------------------------------ +# Auto-registration from COMMAND_REGISTRY +# ------------------------------------------------------------------ + + +@pytest.mark.asyncio +async def test_auto_registers_missing_gateway_commands(adapter): + """Commands in COMMAND_REGISTRY that aren't explicitly registered should + be auto-registered by the dynamic catch-all block.""" + adapter._run_simple_slash = AsyncMock() + adapter._register_slash_commands() + + tree_names = set(adapter._client.tree.commands.keys()) + + # These commands are gateway-available but were not in the original + # hardcoded registration list — they should be auto-registered. + expected_auto = {"debug", "yolo", "reload", "profile"} + for name in expected_auto: + assert name in tree_names, f"/{name} should be auto-registered on Discord" + + +@pytest.mark.asyncio +async def test_auto_registered_command_dispatches_correctly(adapter): + """Auto-registered commands should dispatch via _run_simple_slash.""" + adapter._run_simple_slash = AsyncMock() + adapter._register_slash_commands() + + # /debug has no args — test parameterless dispatch + debug_cmd = adapter._client.tree.commands["debug"] + interaction = SimpleNamespace() + adapter._run_simple_slash.reset_mock() + await debug_cmd.callback(interaction) + adapter._run_simple_slash.assert_awaited_once_with(interaction, "/debug") + + +@pytest.mark.asyncio +async def test_auto_registered_command_with_args(adapter): + """Auto-registered commands with args_hint should accept an optional args param.""" + adapter._run_simple_slash = AsyncMock() + adapter._register_slash_commands() + + # /branch has args_hint="[name]" — test dispatch with args + branch_cmd = adapter._client.tree.commands["branch"] + interaction = SimpleNamespace() + adapter._run_simple_slash.reset_mock() + await branch_cmd.callback(interaction, args="my-branch") + adapter._run_simple_slash.assert_awaited_once_with( + interaction, "/branch my-branch" + ) + + # ------------------------------------------------------------------ # _handle_thread_create_slash — success, session dispatch, failure # ------------------------------------------------------------------ diff --git a/tests/gateway/test_duplicate_reply_suppression.py b/tests/gateway/test_duplicate_reply_suppression.py index 5a0ea02f3..d8298db83 100644 --- a/tests/gateway/test_duplicate_reply_suppression.py +++ b/tests/gateway/test_duplicate_reply_suppression.py @@ -232,9 +232,72 @@ class TestAlreadySentWithoutResponsePreviewed: # =================================================================== -# Test 3: run.py queued-message path — _already_streamed detection +# Test 2b: run.py — empty response never suppressed (#10xxx) # =================================================================== +class TestEmptyResponseNotSuppressed: + """When the model returns '(empty)' after tool calls (e.g. mimo-v2-pro + going silent after web_search), the gateway must NOT suppress delivery + even if the stream consumer sent intermediate text earlier. + + Without this fix, the user sees partial streaming text ('Let me search + for that') and then silence — the '(empty)' sentinel is swallowed by + already_sent=True.""" + + def _make_mock_stream_consumer(self, already_sent=False, final_response_sent=False): + return SimpleNamespace( + already_sent=already_sent, + final_response_sent=final_response_sent, + ) + + def _apply_suppression_logic(self, response, sc): + """Reproduce the fixed logic from gateway/run.py return path.""" + if sc and isinstance(response, dict) and not response.get("failed"): + _final = response.get("final_response") or "" + _is_empty_sentinel = not _final or _final == "(empty)" + if not _is_empty_sentinel and ( + getattr(sc, "final_response_sent", False) + or getattr(sc, "already_sent", False) + ): + response["already_sent"] = True + + def test_empty_sentinel_not_suppressed_with_already_sent(self): + """'(empty)' final_response should NOT be suppressed even when + streaming sent intermediate content.""" + sc = self._make_mock_stream_consumer(already_sent=True, final_response_sent=True) + response = {"final_response": "(empty)"} + self._apply_suppression_logic(response, sc) + assert "already_sent" not in response + + def test_empty_string_not_suppressed_with_already_sent(self): + """Empty string final_response should NOT be suppressed.""" + sc = self._make_mock_stream_consumer(already_sent=True, final_response_sent=True) + response = {"final_response": ""} + self._apply_suppression_logic(response, sc) + assert "already_sent" not in response + + def test_none_response_not_suppressed_with_already_sent(self): + """None final_response should NOT be suppressed.""" + sc = self._make_mock_stream_consumer(already_sent=True, final_response_sent=True) + response = {"final_response": None} + self._apply_suppression_logic(response, sc) + assert "already_sent" not in response + + def test_real_response_still_suppressed_with_already_sent(self): + """Normal non-empty response should still be suppressed when + streaming delivered content.""" + sc = self._make_mock_stream_consumer(already_sent=True, final_response_sent=False) + response = {"final_response": "Here are the search results..."} + self._apply_suppression_logic(response, sc) + assert response.get("already_sent") is True + + def test_failed_empty_response_never_suppressed(self): + """Failed responses are never suppressed regardless of content.""" + sc = self._make_mock_stream_consumer(already_sent=True, final_response_sent=True) + response = {"final_response": "(empty)", "failed": True} + self._apply_suppression_logic(response, sc) + assert "already_sent" not in response + class TestQueuedMessageAlreadyStreamed: """The queued-message path should detect that the first response was already streamed (already_sent=True) even without response_previewed.""" diff --git a/tests/gateway/test_message_deduplicator.py b/tests/gateway/test_message_deduplicator.py new file mode 100644 index 000000000..59fe7e394 --- /dev/null +++ b/tests/gateway/test_message_deduplicator.py @@ -0,0 +1,89 @@ +"""Tests for MessageDeduplicator TTL enforcement (#10306). + +Previously, is_duplicate() returned True for any previously seen ID without +checking its age — expired entries were only purged when cache size exceeded +max_size. Normal workloads never overflowed, so messages stayed "duplicate" +forever. + +The fix checks TTL at query time: if the entry's timestamp plus TTL is in +the past, the entry is treated as expired and the message is allowed through. +""" + +import time +from unittest.mock import patch + +from gateway.platforms.helpers import MessageDeduplicator + + +class TestMessageDeduplicatorTTL: + """TTL-based expiration must work regardless of cache size.""" + + def test_duplicate_within_ttl(self): + """Same message within TTL window is duplicate.""" + dedup = MessageDeduplicator(ttl_seconds=60) + assert dedup.is_duplicate("msg-1") is False + assert dedup.is_duplicate("msg-1") is True + + def test_not_duplicate_after_ttl_expires(self): + """Same message AFTER TTL expires should NOT be duplicate.""" + dedup = MessageDeduplicator(ttl_seconds=5) + assert dedup.is_duplicate("msg-1") is False + + # Fast-forward time past TTL + dedup._seen["msg-1"] = time.time() - 10 # 10s ago, TTL is 5s + assert dedup.is_duplicate("msg-1") is False, \ + "Expired entry should not be treated as duplicate" + + def test_expired_entry_gets_refreshed(self): + """After an expired entry is allowed through, it should be re-tracked.""" + dedup = MessageDeduplicator(ttl_seconds=5) + assert dedup.is_duplicate("msg-1") is False + + # Expire the entry + dedup._seen["msg-1"] = time.time() - 10 + + # Should be allowed through (expired) + assert dedup.is_duplicate("msg-1") is False + # Now should be duplicate again (freshly tracked) + assert dedup.is_duplicate("msg-1") is True + + def test_different_messages_not_confused(self): + """Different message IDs are independent.""" + dedup = MessageDeduplicator(ttl_seconds=60) + assert dedup.is_duplicate("msg-1") is False + assert dedup.is_duplicate("msg-2") is False + assert dedup.is_duplicate("msg-1") is True + assert dedup.is_duplicate("msg-2") is True + + def test_empty_id_never_duplicate(self): + """Empty/None message IDs are never treated as duplicate.""" + dedup = MessageDeduplicator(ttl_seconds=60) + assert dedup.is_duplicate("") is False + assert dedup.is_duplicate("") is False + + def test_max_size_eviction_prunes_expired(self): + """Cache pruning on overflow removes expired entries.""" + dedup = MessageDeduplicator(max_size=5, ttl_seconds=60) + # Add 6 entries, with the first 3 expired + now = time.time() + for i in range(3): + dedup._seen[f"old-{i}"] = now - 120 # expired (2 min ago, TTL 60s) + for i in range(3): + dedup.is_duplicate(f"new-{i}") + # Now we have 6 entries. Next insert triggers pruning. + dedup.is_duplicate("trigger") + # The 3 expired entries should be gone, leaving 4 fresh ones + assert len(dedup._seen) == 4 + assert "old-0" not in dedup._seen + assert "new-0" in dedup._seen + + def test_ttl_zero_means_no_dedup(self): + """With TTL=0, all entries expire immediately.""" + dedup = MessageDeduplicator(ttl_seconds=0) + assert dedup.is_duplicate("msg-1") is False + # Entry was just added at time.time(), and TTL is 0, + # so now - seen_time >= 0 = ttl, meaning it's expired + # But time.time() might be the exact same float, so + # the check is `now - ts < ttl` which is `0 < 0` = False + # This means TTL=0 effectively disables dedup + assert dedup.is_duplicate("msg-1") is False diff --git a/tests/gateway/test_session_env.py b/tests/gateway/test_session_env.py index 5a643a1ef..85899e2fd 100644 --- a/tests/gateway/test_session_env.py +++ b/tests/gateway/test_session_env.py @@ -1,6 +1,8 @@ import asyncio import os +import pytest + from gateway.config import Platform from gateway.run import GatewayRunner from gateway.session import SessionContext, SessionSource @@ -8,9 +10,26 @@ from gateway.session_context import ( get_session_env, set_session_vars, clear_session_vars, + _VAR_MAP, + _UNSET, ) +@pytest.fixture(autouse=True) +def _reset_contextvars(): + """Reset all session contextvars to _UNSET between tests. + + In production each asyncio.Task gets a fresh context copy where the + defaults are _UNSET. In tests all functions share the same thread + context, so a clear_session_vars() from test A (which sets vars to "") + would leak into test B. This fixture ensures each test starts clean. + """ + yield + for var in _VAR_MAP.values(): + # Can't use var.reset() without a token; just set back to sentinel. + var.set(_UNSET) + + def test_set_session_env_sets_contextvars(monkeypatch): """_set_session_env should populate contextvars, not os.environ.""" runner = object.__new__(GatewayRunner) @@ -98,9 +117,11 @@ def test_get_session_env_falls_back_to_os_environ(monkeypatch): tokens = set_session_vars(platform="telegram") assert get_session_env("HERMES_SESSION_PLATFORM") == "telegram" - # Restore — should fall back to os.environ again + # After clear — should return "" (explicitly cleared), NOT fall back + # to os.environ. This is the fix for #10304: stale os.environ values + # must not leak through after a gateway session is cleaned up. clear_session_vars(tokens) - assert get_session_env("HERMES_SESSION_PLATFORM") == "discord" + assert get_session_env("HERMES_SESSION_PLATFORM") == "" def test_get_session_env_default_when_nothing_set(monkeypatch): @@ -164,9 +185,9 @@ def test_session_key_falls_back_to_os_environ(monkeypatch): tokens = set_session_vars(session_key="ctx-session-456") assert get_session_env("HERMES_SESSION_KEY") == "ctx-session-456" - # Restore — should fall back to os.environ + # After clear — should return "" (explicitly cleared), not os.environ (#10304) clear_session_vars(tokens) - assert get_session_env("HERMES_SESSION_KEY") == "env-session-123" + assert get_session_env("HERMES_SESSION_KEY") == "" def test_set_session_env_includes_session_key(): diff --git a/tests/hermes_cli/test_debug.py b/tests/hermes_cli/test_debug.py index f733c8ab6..864a64160 100644 --- a/tests/hermes_cli/test_debug.py +++ b/tests/hermes_cli/test_debug.py @@ -428,7 +428,9 @@ class TestRunDebug: run_debug(args) out = capsys.readouterr().out - assert "hermes debug share" in out + assert "hermes debug" in out + assert "share" in out + assert "delete" in out def test_share_subcommand_routes(self, hermes_home): from hermes_cli.debug import run_debug @@ -459,3 +461,187 @@ class TestArgparseIntegration: args = MagicMock() args.debug_command = None cmd_debug(args) + + +# --------------------------------------------------------------------------- +# Delete / auto-delete +# --------------------------------------------------------------------------- + +class TestExtractPasteId: + def test_paste_rs_url(self): + from hermes_cli.debug import _extract_paste_id + assert _extract_paste_id("https://paste.rs/abc123") == "abc123" + + def test_paste_rs_trailing_slash(self): + from hermes_cli.debug import _extract_paste_id + assert _extract_paste_id("https://paste.rs/abc123/") == "abc123" + + def test_http_variant(self): + from hermes_cli.debug import _extract_paste_id + assert _extract_paste_id("http://paste.rs/xyz") == "xyz" + + def test_non_paste_rs_returns_none(self): + from hermes_cli.debug import _extract_paste_id + assert _extract_paste_id("https://dpaste.com/ABCDEF") is None + + def test_empty_returns_none(self): + from hermes_cli.debug import _extract_paste_id + assert _extract_paste_id("") is None + + +class TestDeletePaste: + def test_delete_sends_delete_request(self): + from hermes_cli.debug import delete_paste + + mock_resp = MagicMock() + mock_resp.status = 200 + mock_resp.__enter__ = lambda s: s + mock_resp.__exit__ = MagicMock(return_value=False) + + with patch("hermes_cli.debug.urllib.request.urlopen", + return_value=mock_resp) as mock_open: + result = delete_paste("https://paste.rs/abc123") + + assert result is True + req = mock_open.call_args[0][0] + assert req.method == "DELETE" + assert "paste.rs/abc123" in req.full_url + + def test_delete_rejects_non_paste_rs(self): + from hermes_cli.debug import delete_paste + + with pytest.raises(ValueError, match="only paste.rs"): + delete_paste("https://dpaste.com/something") + + +class TestScheduleAutoDelete: + def test_spawns_detached_process(self): + from hermes_cli.debug import _schedule_auto_delete + + with patch("subprocess.Popen") as mock_popen: + _schedule_auto_delete( + ["https://paste.rs/abc", "https://paste.rs/def"], + delay_seconds=10, + ) + + mock_popen.assert_called_once() + call_args = mock_popen.call_args + # Verify detached + assert call_args[1]["start_new_session"] is True + # Verify the script references both URLs + script = call_args[0][0][2] # [python, -c, script] + assert "paste.rs/abc" in script + assert "paste.rs/def" in script + assert "time.sleep(10)" in script + + def test_skips_non_paste_rs_urls(self): + from hermes_cli.debug import _schedule_auto_delete + + with patch("subprocess.Popen") as mock_popen: + _schedule_auto_delete(["https://dpaste.com/something"]) + + mock_popen.assert_not_called() + + def test_handles_popen_failure_gracefully(self): + from hermes_cli.debug import _schedule_auto_delete + + with patch("subprocess.Popen", + side_effect=OSError("no such file")): + # Should not raise + _schedule_auto_delete(["https://paste.rs/abc"]) + + +class TestRunDebugDelete: + def test_deletes_valid_url(self, capsys): + from hermes_cli.debug import run_debug_delete + + args = MagicMock() + args.urls = ["https://paste.rs/abc"] + + with patch("hermes_cli.debug.delete_paste", return_value=True): + run_debug_delete(args) + + out = capsys.readouterr().out + assert "Deleted" in out + assert "paste.rs/abc" in out + + def test_handles_delete_failure(self, capsys): + from hermes_cli.debug import run_debug_delete + + args = MagicMock() + args.urls = ["https://paste.rs/abc"] + + with patch("hermes_cli.debug.delete_paste", + side_effect=Exception("network error")): + run_debug_delete(args) + + out = capsys.readouterr().out + assert "Could not delete" in out + + def test_no_urls_shows_usage(self, capsys): + from hermes_cli.debug import run_debug_delete + + args = MagicMock() + args.urls = [] + + run_debug_delete(args) + + out = capsys.readouterr().out + assert "Usage" in out + + +class TestShareIncludesAutoDelete: + """Verify that run_debug_share schedules auto-deletion and prints TTL.""" + + def test_share_schedules_auto_delete(self, hermes_home, capsys): + from hermes_cli.debug import run_debug_share + + args = MagicMock() + args.lines = 50 + args.expire = 7 + args.local = False + + with patch("hermes_cli.dump.run_dump"), \ + patch("hermes_cli.debug.upload_to_pastebin", + return_value="https://paste.rs/test1"), \ + patch("hermes_cli.debug._schedule_auto_delete") as mock_sched: + run_debug_share(args) + + # auto-delete was scheduled with the uploaded URLs + mock_sched.assert_called_once() + urls_arg = mock_sched.call_args[0][0] + assert "https://paste.rs/test1" in urls_arg + + out = capsys.readouterr().out + assert "auto-delete" in out + + def test_share_shows_privacy_notice(self, hermes_home, capsys): + from hermes_cli.debug import run_debug_share + + args = MagicMock() + args.lines = 50 + args.expire = 7 + args.local = False + + with patch("hermes_cli.dump.run_dump"), \ + patch("hermes_cli.debug.upload_to_pastebin", + return_value="https://paste.rs/test"), \ + patch("hermes_cli.debug._schedule_auto_delete"): + run_debug_share(args) + + out = capsys.readouterr().out + assert "public paste service" in out + + def test_local_no_privacy_notice(self, hermes_home, capsys): + from hermes_cli.debug import run_debug_share + + args = MagicMock() + args.lines = 50 + args.expire = 7 + args.local = True + + with patch("hermes_cli.dump.run_dump"): + run_debug_share(args) + + out = capsys.readouterr().out + assert "public paste service" not in out diff --git a/tests/run_agent/test_async_httpx_del_neuter.py b/tests/run_agent/test_async_httpx_del_neuter.py index ce8e20e70..960df7084 100644 --- a/tests/run_agent/test_async_httpx_del_neuter.py +++ b/tests/run_agent/test_async_httpx_del_neuter.py @@ -103,7 +103,7 @@ class TestCleanupStaleAsyncClients: mock_client._client = MagicMock() mock_client._client.is_closed = False - key = ("test_stale", True, "", "", id(loop)) + key = ("test_stale", True, "", "", "", ()) with _client_cache_lock: _client_cache[key] = (mock_client, "test-model", loop) @@ -127,7 +127,7 @@ class TestCleanupStaleAsyncClients: loop = asyncio.new_event_loop() # NOT closed mock_client = MagicMock() - key = ("test_live", True, "", "", id(loop)) + key = ("test_live", True, "", "", "", ()) with _client_cache_lock: _client_cache[key] = (mock_client, "test-model", loop) @@ -149,7 +149,7 @@ class TestCleanupStaleAsyncClients: ) mock_client = MagicMock() - key = ("test_sync", False, "", "", 0) + key = ("test_sync", False, "", "", "", ()) with _client_cache_lock: _client_cache[key] = (mock_client, "test-model", None) @@ -160,3 +160,131 @@ class TestCleanupStaleAsyncClients: finally: with _client_cache_lock: _client_cache.pop(key, None) + + +# --------------------------------------------------------------------------- +# Cache bounded growth (#10200) +# --------------------------------------------------------------------------- + +class TestClientCacheBoundedGrowth: + """Verify the cache stays bounded when loops change (fix for #10200). + + Previously, loop_id was part of the cache key, so every new event loop + created a new entry for the same provider config. Now loop identity is + validated at hit time and stale entries are replaced in-place. + """ + + def test_same_key_replaces_stale_loop_entry(self): + """When the loop changes, the old entry should be replaced, not duplicated.""" + from agent.auxiliary_client import ( + _client_cache, + _client_cache_lock, + _get_cached_client, + ) + + key = ("test_replace", True, "", "", "", ()) + + # Simulate a stale entry from a closed loop + old_loop = asyncio.new_event_loop() + old_loop.close() + old_client = MagicMock() + old_client._client = MagicMock() + old_client._client.is_closed = False + + with _client_cache_lock: + _client_cache[key] = (old_client, "old-model", old_loop) + + try: + # Now call _get_cached_client — should detect stale loop and evict + with patch("agent.auxiliary_client.resolve_provider_client") as mock_resolve: + mock_resolve.return_value = (MagicMock(), "new-model") + client, model = _get_cached_client( + "test_replace", async_mode=True, + ) + # The old entry should have been replaced + with _client_cache_lock: + assert key in _client_cache, "Key should still exist (replaced)" + entry = _client_cache[key] + assert entry[1] == "new-model", "Should have the new model" + finally: + with _client_cache_lock: + _client_cache.pop(key, None) + + def test_different_loops_do_not_grow_cache(self): + """Multiple event loops for the same provider should NOT create multiple entries.""" + from agent.auxiliary_client import ( + _client_cache, + _client_cache_lock, + ) + + key = ("test_no_grow", True, "", "", "", ()) + + loops = [] + try: + for i in range(5): + loop = asyncio.new_event_loop() + loops.append(loop) + mock_client = MagicMock() + mock_client._client = MagicMock() + mock_client._client.is_closed = False + + # Close previous loop entries (simulating worker thread recycling) + if i > 0: + loops[i - 1].close() + + with _client_cache_lock: + # Simulate what _get_cached_client does: replace on loop mismatch + if key in _client_cache: + old_entry = _client_cache[key] + del _client_cache[key] + _client_cache[key] = (mock_client, f"model-{i}", loop) + + # Only one entry should exist for this key + with _client_cache_lock: + count = sum(1 for k in _client_cache if k == key) + assert count == 1, f"Expected 1 entry, got {count}" + finally: + for loop in loops: + if not loop.is_closed(): + loop.close() + with _client_cache_lock: + _client_cache.pop(key, None) + + def test_max_cache_size_eviction(self): + """Cache should not exceed _CLIENT_CACHE_MAX_SIZE.""" + from agent.auxiliary_client import ( + _client_cache, + _client_cache_lock, + _CLIENT_CACHE_MAX_SIZE, + ) + + # Save existing cache state + with _client_cache_lock: + saved = dict(_client_cache) + _client_cache.clear() + + try: + # Fill to max + 5 + for i in range(_CLIENT_CACHE_MAX_SIZE + 5): + mock_client = MagicMock() + mock_client._client = MagicMock() + mock_client._client.is_closed = False + key = (f"evict_test_{i}", False, "", "", "", ()) + with _client_cache_lock: + # Inline the eviction logic (same as _get_cached_client) + while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE: + evict_key = next(iter(_client_cache)) + del _client_cache[evict_key] + _client_cache[key] = (mock_client, f"model-{i}", None) + + with _client_cache_lock: + assert len(_client_cache) <= _CLIENT_CACHE_MAX_SIZE, \ + f"Cache size {len(_client_cache)} exceeds max {_CLIENT_CACHE_MAX_SIZE}" + # The earliest entries should have been evicted + assert ("evict_test_0", False, "", "", "", ()) not in _client_cache + # The latest entries should be present + assert (f"evict_test_{_CLIENT_CACHE_MAX_SIZE + 4}", False, "", "", "", ()) in _client_cache + finally: + with _client_cache_lock: + _client_cache.clear() + _client_cache.update(saved) diff --git a/tests/run_agent/test_interrupt_propagation.py b/tests/run_agent/test_interrupt_propagation.py index a746efdac..ed1f21bfa 100644 --- a/tests/run_agent/test_interrupt_propagation.py +++ b/tests/run_agent/test_interrupt_propagation.py @@ -28,7 +28,8 @@ class TestInterruptPropagationToChild(unittest.TestCase): agent = AIAgent.__new__(AIAgent) agent._interrupt_requested = False agent._interrupt_message = None - agent._execution_thread_id = None # defaults to current thread in set_interrupt + agent._execution_thread_id = None + agent._interrupt_thread_signal_pending = False agent._active_children = [] agent._active_children_lock = threading.Lock() agent.quiet_mode = True @@ -46,15 +47,17 @@ class TestInterruptPropagationToChild(unittest.TestCase): assert parent._interrupt_requested is True assert child._interrupt_requested is True assert child._interrupt_message == "new user message" - assert is_interrupted() is True + assert is_interrupted() is False + assert parent._interrupt_thread_signal_pending is True def test_child_clear_interrupt_at_start_clears_thread(self): """child.clear_interrupt() at start of run_conversation clears the - per-thread interrupt flag for the current thread. + bound execution thread's interrupt flag. """ child = self._make_bare_agent() child._interrupt_requested = True child._interrupt_message = "msg" + child._execution_thread_id = threading.current_thread().ident # Interrupt for current thread is set set_interrupt(True) @@ -128,6 +131,36 @@ class TestInterruptPropagationToChild(unittest.TestCase): child_thread.join(timeout=1) set_interrupt(False) + def test_prestart_interrupt_binds_to_execution_thread(self): + """An interrupt that arrives before startup should bind to the agent thread.""" + agent = self._make_bare_agent() + barrier = threading.Barrier(2) + result = {} + + agent.interrupt("stop before start") + assert agent._interrupt_requested is True + assert agent._interrupt_thread_signal_pending is True + assert is_interrupted() is False + + def run_thread(): + from tools.interrupt import set_interrupt as _set_interrupt_for_test + + agent._execution_thread_id = threading.current_thread().ident + _set_interrupt_for_test(False, agent._execution_thread_id) + if agent._interrupt_requested: + _set_interrupt_for_test(True, agent._execution_thread_id) + agent._interrupt_thread_signal_pending = False + barrier.wait(timeout=5) + result["thread_interrupted"] = is_interrupted() + + t = threading.Thread(target=run_thread) + t.start() + barrier.wait(timeout=5) + t.join(timeout=2) + + assert result["thread_interrupted"] is True + assert agent._interrupt_thread_signal_pending is False + class TestPerThreadInterruptIsolation(unittest.TestCase): """Verify that interrupting one agent does NOT affect another agent's thread. diff --git a/tests/tools/test_browser_camofox.py b/tests/tools/test_browser_camofox.py index af36f7809..81d69967d 100644 --- a/tests/tools/test_browser_camofox.py +++ b/tests/tools/test_browser_camofox.py @@ -37,6 +37,18 @@ class TestCamofoxMode: monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377") assert is_camofox_mode() is True + def test_cdp_override_takes_priority(self, monkeypatch): + """When BROWSER_CDP_URL is set (via /browser connect), CDP takes priority over Camofox.""" + monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377") + monkeypatch.setenv("BROWSER_CDP_URL", "http://127.0.0.1:9222") + assert is_camofox_mode() is False + + def test_cdp_override_blank_does_not_disable_camofox(self, monkeypatch): + """Empty/whitespace BROWSER_CDP_URL should not suppress Camofox.""" + monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377") + monkeypatch.setenv("BROWSER_CDP_URL", " ") + assert is_camofox_mode() is True + def test_health_check_unreachable(self, monkeypatch): monkeypatch.setenv("CAMOFOX_URL", "http://localhost:19999") assert check_camofox_available() is False diff --git a/tests/tools/test_managed_modal_environment.py b/tests/tools/test_managed_modal_environment.py index 1d7241e0b..d36418336 100644 --- a/tests/tools/test_managed_modal_environment.py +++ b/tests/tools/test_managed_modal_environment.py @@ -296,7 +296,7 @@ def test_managed_modal_execute_times_out_and_cancels(monkeypatch): modal_common = sys.modules["tools.environments.modal_utils"] calls = [] - monotonic_values = iter([0.0, 12.5]) + monotonic_values = iter([0.0, 0.0, 0.0, 12.5, 12.5]) def fake_request(method, url, headers=None, json=None, timeout=None): calls.append((method, url, json, timeout)) diff --git a/tests/tools/test_session_search.py b/tests/tools/test_session_search.py index 852ac7b9e..f5d75bb91 100644 --- a/tests/tools/test_session_search.py +++ b/tests/tools/test_session_search.py @@ -290,6 +290,63 @@ class TestSessionSearch: assert result["results"] == [] assert result["sessions_searched"] == 0 + def test_limit_none_coerced_to_default(self): + """Model sends limit=null → should fall back to 3, not TypeError.""" + from unittest.mock import MagicMock + from tools.session_search_tool import session_search + + mock_db = MagicMock() + mock_db.search_messages.return_value = [] + + result = json.loads(session_search( + query="test", db=mock_db, limit=None, + )) + assert result["success"] is True + + def test_limit_type_object_coerced_to_default(self): + """Model sends limit as a type object → should fall back to 3, not TypeError.""" + from unittest.mock import MagicMock + from tools.session_search_tool import session_search + + mock_db = MagicMock() + mock_db.search_messages.return_value = [] + + result = json.loads(session_search( + query="test", db=mock_db, limit=int, + )) + assert result["success"] is True + + def test_limit_string_coerced(self): + """Model sends limit as string '2' → should coerce to int.""" + from unittest.mock import MagicMock + from tools.session_search_tool import session_search + + mock_db = MagicMock() + mock_db.search_messages.return_value = [] + + result = json.loads(session_search( + query="test", db=mock_db, limit="2", + )) + assert result["success"] is True + + def test_limit_clamped_to_range(self): + """Negative or zero limit should be clamped to 1.""" + from unittest.mock import MagicMock + from tools.session_search_tool import session_search + + mock_db = MagicMock() + mock_db.search_messages.return_value = [] + + result = json.loads(session_search( + query="test", db=mock_db, limit=-5, + )) + assert result["success"] is True + + result = json.loads(session_search( + query="test", db=mock_db, limit=0, + )) + assert result["success"] is True + def test_current_root_session_excludes_child_lineage(self): """Delegation child hits should be excluded when they resolve to the current root session.""" from unittest.mock import MagicMock diff --git a/tools/browser_camofox.py b/tools/browser_camofox.py index fbd1c962b..88f486f19 100644 --- a/tools/browser_camofox.py +++ b/tools/browser_camofox.py @@ -54,7 +54,15 @@ def get_camofox_url() -> str: def is_camofox_mode() -> bool: - """True when Camofox backend is configured.""" + """True when Camofox backend is configured and no CDP override is active. + + When the user has explicitly connected to a live Chrome instance via + ``/browser connect`` (which sets ``BROWSER_CDP_URL``), the CDP connection + takes priority over Camofox so the browser tools operate on the real + browser instead of being silently routed to the Camofox backend. + """ + if os.getenv("BROWSER_CDP_URL", "").strip(): + return False return bool(get_camofox_url()) diff --git a/tools/environments/modal_utils.py b/tools/environments/modal_utils.py index 0db819471..161aad261 100644 --- a/tools/environments/modal_utils.py +++ b/tools/environments/modal_utils.py @@ -105,6 +105,10 @@ class BaseModalExecutionEnvironment(BaseEnvironment): if self._client_timeout_grace_seconds is not None: deadline = time.monotonic() + prepared.timeout + self._client_timeout_grace_seconds + _last_activity_touch = time.monotonic() + _modal_exec_start = time.monotonic() + _ACTIVITY_INTERVAL = 10.0 # match _wait_for_process cadence + while True: if is_interrupted(): try: @@ -128,6 +132,22 @@ class BaseModalExecutionEnvironment(BaseEnvironment): pass return self._timeout_result_for_modal(prepared.timeout) + # Periodic activity touch so the gateway knows we're alive + _now = time.monotonic() + if _now - _last_activity_touch >= _ACTIVITY_INTERVAL: + _last_activity_touch = _now + try: + from tools.environments.base import _get_activity_callback + _cb = _get_activity_callback() + except Exception: + _cb = None + if _cb: + try: + _elapsed = int(_now - _modal_exec_start) + _cb(f"modal command running ({_elapsed}s elapsed)") + except Exception: + pass + time.sleep(self._poll_interval_seconds) def _before_execute(self) -> None: diff --git a/tools/mcp_tool.py b/tools/mcp_tool.py index 50655fa38..5f4505224 100644 --- a/tools/mcp_tool.py +++ b/tools/mcp_tool.py @@ -506,7 +506,7 @@ class SamplingHandler: "type": "function", "function": { "name": tu.name, - "arguments": json.dumps(tu.input) if isinstance(tu.input, dict) else str(tu.input), + "arguments": json.dumps(tu.input, ensure_ascii=False) if isinstance(tu.input, dict) else str(tu.input), }, }) msg_dict: dict = {"role": msg.role, "tool_calls": tc_list} @@ -1274,7 +1274,7 @@ def _interrupted_call_result() -> str: """Standardized JSON error for a user-interrupted MCP tool call.""" return json.dumps({ "error": "MCP call interrupted: user sent a new message" - }) + }, ensure_ascii=False) # --------------------------------------------------------------------------- @@ -1361,7 +1361,7 @@ def _make_tool_handler(server_name: str, tool_name: str, tool_timeout: float): if not server or not server.session: return json.dumps({ "error": f"MCP server '{server_name}' is not connected" - }) + }, ensure_ascii=False) async def _call(): result = await server.session.call_tool(tool_name, arguments=args) @@ -1375,7 +1375,7 @@ def _make_tool_handler(server_name: str, tool_name: str, tool_timeout: float): "error": _sanitize_error( error_text or "MCP tool returned an error" ) - }) + }, ensure_ascii=False) # Collect text from content blocks parts: List[str] = [] @@ -1394,9 +1394,9 @@ def _make_tool_handler(server_name: str, tool_name: str, tool_timeout: float): return json.dumps({ "result": text_result, "structuredContent": structured, - }) - return json.dumps({"result": structured}) - return json.dumps({"result": text_result}) + }, ensure_ascii=False) + return json.dumps({"result": structured}, ensure_ascii=False) + return json.dumps({"result": text_result}, ensure_ascii=False) try: return _run_on_mcp_loop(_call(), timeout=tool_timeout) @@ -1411,7 +1411,7 @@ def _make_tool_handler(server_name: str, tool_name: str, tool_timeout: float): "error": _sanitize_error( f"MCP call failed: {type(exc).__name__}: {exc}" ) - }) + }, ensure_ascii=False) return _handler @@ -1425,7 +1425,7 @@ def _make_list_resources_handler(server_name: str, tool_timeout: float): if not server or not server.session: return json.dumps({ "error": f"MCP server '{server_name}' is not connected" - }) + }, ensure_ascii=False) async def _call(): result = await server.session.list_resources() @@ -1441,7 +1441,7 @@ def _make_list_resources_handler(server_name: str, tool_timeout: float): if hasattr(r, "mimeType") and r.mimeType: entry["mimeType"] = r.mimeType resources.append(entry) - return json.dumps({"resources": resources}) + return json.dumps({"resources": resources}, ensure_ascii=False) try: return _run_on_mcp_loop(_call(), timeout=tool_timeout) @@ -1455,7 +1455,7 @@ def _make_list_resources_handler(server_name: str, tool_timeout: float): "error": _sanitize_error( f"MCP call failed: {type(exc).__name__}: {exc}" ) - }) + }, ensure_ascii=False) return _handler @@ -1471,7 +1471,7 @@ def _make_read_resource_handler(server_name: str, tool_timeout: float): if not server or not server.session: return json.dumps({ "error": f"MCP server '{server_name}' is not connected" - }) + }, ensure_ascii=False) uri = args.get("uri") if not uri: @@ -1487,7 +1487,7 @@ def _make_read_resource_handler(server_name: str, tool_timeout: float): parts.append(block.text) elif hasattr(block, "blob"): parts.append(f"[binary data, {len(block.blob)} bytes]") - return json.dumps({"result": "\n".join(parts) if parts else ""}) + return json.dumps({"result": "\n".join(parts) if parts else ""}, ensure_ascii=False) try: return _run_on_mcp_loop(_call(), timeout=tool_timeout) @@ -1501,7 +1501,7 @@ def _make_read_resource_handler(server_name: str, tool_timeout: float): "error": _sanitize_error( f"MCP call failed: {type(exc).__name__}: {exc}" ) - }) + }, ensure_ascii=False) return _handler @@ -1515,7 +1515,7 @@ def _make_list_prompts_handler(server_name: str, tool_timeout: float): if not server or not server.session: return json.dumps({ "error": f"MCP server '{server_name}' is not connected" - }) + }, ensure_ascii=False) async def _call(): result = await server.session.list_prompts() @@ -1536,7 +1536,7 @@ def _make_list_prompts_handler(server_name: str, tool_timeout: float): for a in p.arguments ] prompts.append(entry) - return json.dumps({"prompts": prompts}) + return json.dumps({"prompts": prompts}, ensure_ascii=False) try: return _run_on_mcp_loop(_call(), timeout=tool_timeout) @@ -1550,7 +1550,7 @@ def _make_list_prompts_handler(server_name: str, tool_timeout: float): "error": _sanitize_error( f"MCP call failed: {type(exc).__name__}: {exc}" ) - }) + }, ensure_ascii=False) return _handler @@ -1566,7 +1566,7 @@ def _make_get_prompt_handler(server_name: str, tool_timeout: float): if not server or not server.session: return json.dumps({ "error": f"MCP server '{server_name}' is not connected" - }) + }, ensure_ascii=False) name = args.get("name") if not name: @@ -1593,7 +1593,7 @@ def _make_get_prompt_handler(server_name: str, tool_timeout: float): resp = {"messages": messages} if hasattr(result, "description") and result.description: resp["description"] = result.description - return json.dumps(resp) + return json.dumps(resp, ensure_ascii=False) try: return _run_on_mcp_loop(_call(), timeout=tool_timeout) @@ -1607,7 +1607,7 @@ def _make_get_prompt_handler(server_name: str, tool_timeout: float): "error": _sanitize_error( f"MCP call failed: {type(exc).__name__}: {exc}" ) - }) + }, ensure_ascii=False) return _handler diff --git a/tools/session_search_tool.py b/tools/session_search_tool.py index 9be73a04a..1398bdfff 100644 --- a/tools/session_search_tool.py +++ b/tools/session_search_tool.py @@ -310,7 +310,15 @@ def session_search( if db is None: return tool_error("Session database not available.", success=False) - limit = min(limit, 5) # Cap at 5 sessions to avoid excessive LLM calls + # Defensive: models (especially open-source) may send non-int limit values + # (None when JSON null, string "int", or even a type object). Coerce to a + # safe integer before any arithmetic/comparison to prevent TypeError. + if not isinstance(limit, int): + try: + limit = int(limit) + except (TypeError, ValueError): + limit = 3 + limit = max(1, min(limit, 5)) # Clamp to [1, 5] # Recent sessions mode: when query is empty, return metadata for recent sessions. # No LLM calls — just DB queries for titles, previews, timestamps.