From 97acd66b4c58c7945f573a6efd6059e781eb4f8f Mon Sep 17 00:00:00 2001
From: Teknium <127238744+teknium1@users.noreply.github.com>
Date: Sat, 2 May 2026 01:29:57 -0700
Subject: [PATCH 01/61] fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* fix(curator): authoritative absorbed_into declarations on skill delete

Closes #18671.

The classification pipeline that feeds cron-ref rewriting used to infer consolidation vs pruning from two brittle signals: the curator model's post-hoc YAML summary block, and a substring heuristic scanning other tool calls for the removed skill's name. Both miss in real consolidations — the model forgets the YAML under reasoning pressure, and the heuristic misses when the umbrella's patch content describes the absorbed behavior abstractly instead of naming the old slug.

When both miss, the skill falls through to 'no-evidence fallback' pruned, and #18253's cron rewriter drops the cron ref entirely instead of mapping it to the umbrella. Same observable symptom as pre-#18253: 'Skill(s) not found and skipped' at the next cron run.

The fix makes the model declare intent at the moment of deletion. skill_manage(action='delete') now accepts absorbed_into:

- absorbed_into set to the absorbing skill's name -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML

The curator reconciler reads these declarations off llm_meta.tool_calls BEFORE either the YAML block or the substring heuristic. Declaration wins. Fallback logic stays intact for backward compat with any caller (human or older curator conversation) that doesn't populate the arg.

Changes

- tools/skill_manager_tool.py: add absorbed_into param to skill_manage + _delete_skill. Validate target exists when non-empty. Reject self-reference (absorbed_into naming the skill being deleted).
  Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks tool calls for skill_manage(delete) with the arg. _reconcile_classification accepts absorbed_declarations= and treats them as authoritative. Curator prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid target, empty string, nonexistent target, self-reference, whitespace, backward compat, dispatcher plumbing). 11 new curator tests covering the extractor + authoritative reconciler path + mixed-legacy-and-declared runs.

Validation

- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing all 3. Model emits NO YAML block. Heuristic misses (patch prose doesn't name old slugs). Delete calls carry absorbed_into. Result: both PR skills correctly classified 'consolidated' + cron rewritten ['pr-review-format', 'pr-review-checklist', 'stale-junk'] -> ['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML -> routed via existing 'model' source, cron still rewritten correctly.

* feat(curator): capture + restore cron skill links across snapshot/rollback

Before this, rolling back a curator run restored the skills tree but cron jobs still pointed at the umbrella skills the curator had rewritten them to. The user would see their old narrow skills back on disk but their cron jobs still configured with the merged umbrella — not actually 'back to how it was'.

Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json alongside the skills tarball, as cron-jobs.json. The manifest gets a new 'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI confirm dialog) can surface what's in the snapshot.
If jobs.json is missing/unreadable/malformed, snapshot proceeds without cron data — the skills backup is the core guarantee; cron is additive.

Rollback side: after the skills extract succeeds, the new _restore_cron_skill_links() reconciles the backed-up jobs into the live jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and only on jobs matched by id. Everything else about a cron job — schedule, last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live state the user or scheduler has modified since the snapshot; overwriting it would regress unrelated activity.

Reconciliation rules:

- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it after snapshot; their choice is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds, reports 'not captured' (older pre-feature snapshots keep working).

Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the scheduler uses, so rollback doesn't race tick().

Also:

- hermes_cli/curator.py: rollback confirm dialog now shows 'cron jobs: N (will be restored for skill-link fields only)' when the snapshot has cron data, or 'not in snapshot ({reason})' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause summarizing the reconciliation outcome.

Tests

- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json captured-as-raw, full rollback-restores-skills-and-cron, rollback touches only skill fields, rollback skips user-deleted jobs, rollback leaves user-created jobs untouched, rollback still works with pre-feature snapshot that has no cron-jobs.json, standalone unit test on _restore_cron_skill_links exercising the full report shape.

Validation

- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before: ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage']. After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id, name, prompt) preserved across the round trip. --- agent/curator.py | 109 +++++++- agent/curator_backup.py | 261 ++++++++++++++++++- hermes_cli/curator.py | 14 +- tests/agent/test_curator_backup.py | 278 +++++++++++++++++++++ tests/agent/test_curator_classification.py | 263 +++++++++++++++++++ tests/tools/test_skill_manager_tool.py | 70 ++++++ tools/skill_manager_tool.py | 65 ++++- 7 files changed, 1049 insertions(+), 11 deletions(-) diff --git a/agent/curator.py b/agent/curator.py index 2eebe10ef5..cce3d8c103 100644 --- a/agent/curator.py +++ b/agent/curator.py @@ -387,6 +387,11 @@ CURATOR_REVIEW_PROMPT = ( " - skill_manage action=write_file — add a references/, templates/, " "or scripts/ file under an existing skill (the skill must already " "exist)\n" + " - skill_manage action=delete — archive a skill. MUST pass " + "`absorbed_into=` when you've merged its content into another " + "skill, or `absorbed_into=\"\"` when you're truly pruning with no " + "forwarding target. This drives cron-job skill-reference migration — " + "guessing from your YAML summary after the fact is fragile.\n" " - terminal — mv a sibling into the archive " "OR move its content into a support subfile\n\n" "'keep' is a legitimate decision ONLY when the skill is already a " @@ -637,15 +642,76 @@ def _parse_structured_summary( return out +def _extract_absorbed_into_declarations( + tool_calls: List[Dict[str, Any]], +) -> Dict[str, Dict[str, Any]]: + """Walk this run's tool calls and extract model-declared absorption targets. + + The curator prompt requires every ``skill_manage(action='delete')`` call + to pass ``absorbed_into=`` when consolidating, or + ``absorbed_into=""`` when truly pruning. 
This is the single authoritative + signal for classification — the model's own declaration at the moment of + deletion, which beats both post-hoc YAML summary parsing and substring + heuristics on other tool calls. + + Returns ``{skill_name: {"into": "" | "", "declared": True}}``. + Entries with ``into == ""`` are explicit prunings. + Skills without a ``skill_manage(delete)`` call, or with one that omitted + ``absorbed_into``, are not in the returned dict — caller falls back to + the existing heuristic/YAML logic for those (backward compat with older + curator runs and any callers that don't populate the arg). + """ + out: Dict[str, Dict[str, Any]] = {} + for tc in tool_calls or []: + if not isinstance(tc, dict): + continue + if tc.get("name") != "skill_manage": + continue + raw = tc.get("arguments") or "" + args: Dict[str, Any] = {} + if isinstance(raw, dict): + args = raw + elif isinstance(raw, str): + try: + args = json.loads(raw) + except Exception: + continue + if not isinstance(args, dict): + continue + if args.get("action") != "delete": + continue + name = args.get("name") + if not isinstance(name, str) or not name.strip(): + continue + # absorbed_into must be present (even empty string is meaningful); + # missing key means the model didn't declare intent. + if "absorbed_into" not in args: + continue + target = args.get("absorbed_into") + if target is None: + continue + if not isinstance(target, str): + continue + out[name.strip()] = {"into": target.strip(), "declared": True} + return out + + def _reconcile_classification( removed: List[str], heuristic: Dict[str, List[Dict[str, Any]]], model_block: Dict[str, List[Dict[str, str]]], destinations: Set[str], + absorbed_declarations: Optional[Dict[str, Dict[str, Any]]] = None, ) -> Dict[str, List[Dict[str, Any]]]: """Merge heuristic (tool-call evidence) with the model's structured block. 
- Rules: + Rules (evaluated in order; first match wins): + - **Model-declared `absorbed_into` at delete time is authoritative.** Any + entry in ``absorbed_declarations`` beats every other signal. This is + the model telling us directly, at the moment of deletion, what it did. + ``into != ""`` and target exists → consolidated. ``into == ""`` → + pruned. ``into != ""`` but target doesn't exist → hallucination; fall + through to the usual signals. - Model-declared consolidation wins when its ``into`` target exists in ``destinations`` (survived or newly-created). This gives the model authority over intent + rationale. @@ -666,6 +732,8 @@ def _reconcile_classification( model_cons = {e["from"]: e for e in model_block.get("consolidations", [])} model_pruned = {e["name"]: e for e in model_block.get("prunings", [])} + declared = absorbed_declarations or {} + consolidated: List[Dict[str, Any]] = [] pruned: List[Dict[str, Any]] = [] @@ -673,6 +741,36 @@ def _reconcile_classification( mc = model_cons.get(name) mp = model_pruned.get(name) hc = heur_cons.get(name) + dec = declared.get(name) + + # Authoritative: model declared `absorbed_into` at the delete call. + if dec is not None: + into_claim = dec.get("into", "") + if into_claim and into_claim in destinations: + entry: Dict[str, Any] = { + "name": name, + "into": into_claim, + "source": "absorbed_into (model-declared at delete)", + "reason": (mc.get("reason") or "") if mc else "", + } + if hc and hc.get("evidence"): + entry["evidence"] = hc["evidence"] + consolidated.append(entry) + continue + if into_claim == "": + # Explicit prune declaration + pruned.append({ + "name": name, + "source": "absorbed_into=\"\" (model-declared prune)", + "reason": (mp.get("reason") or "") if mp else "", + }) + continue + # into_claim is non-empty but target doesn't exist: the model + # named a nonexistent umbrella at delete time. 
The tool already + # rejects this at the skill_manage layer, so we shouldn't see it + # in practice — but if it slips through (e.g. the umbrella was + # deleted LATER in the same run), fall through to the usual + # signals rather than trusting a broken reference. # Model says consolidated — trust it if the destination is real. if mc and mc.get("into") in destinations: @@ -808,11 +906,20 @@ def _write_run_report( ) model_block = _parse_structured_summary(llm_meta.get("final", "") or "") destinations = set(after_names) | set(added or []) + # Authoritative signal: extract per-delete `absorbed_into` declarations + # from this run's tool calls. These beat both the YAML summary block and + # the substring heuristic — the model is telling us directly, at the + # moment of deletion, whether each archived skill was consolidated + # (into=) or pruned (into=""). + absorbed_declarations = _extract_absorbed_into_declarations( + llm_meta.get("tool_calls", []) or [] + ) classification = _reconcile_classification( removed=removed, heuristic=heuristic, model_block=model_block, destinations=destinations, + absorbed_declarations=absorbed_declarations, ) consolidated = classification["consolidated"] pruned = classification["pruned"] diff --git a/agent/curator_backup.py b/agent/curator_backup.py index 268de64f41..fe74920521 100644 --- a/agent/curator_backup.py +++ b/agent/curator_backup.py @@ -21,6 +21,18 @@ It DOES include: pointer — otherwise the curator would immediately re-fire on the next tick) - ``.bundled_manifest`` (so protection markers stay consistent) + +Alongside the skills tarball, each snapshot also captures a copy of +``~/.hermes/cron/jobs.json`` as ``cron-jobs.json`` when it exists. Cron +jobs reference skills by name in their ``skills``/``skill`` fields; the +curator's consolidation pass rewrites those in place via +``cron.jobs.rewrite_skill_refs()``. 
Without capturing the pre-run state, +rolling back the skills tree would leave cron jobs pointing at the +umbrella skills even though the narrow skills they were originally +configured with have been restored. We store the whole jobs.json for +fidelity but rollback only touches the ``skills``/``skill`` fields — the +rest (schedule, next_run_at, enabled, prompt, etc.) is live state and +we leave it alone. """ from __future__ import annotations @@ -63,6 +75,60 @@ def _skills_dir() -> Path: return get_hermes_home() / "skills" +def _cron_jobs_file() -> Path: + """Source path for the live cron jobs store (``~/.hermes/cron/jobs.json``).""" + return get_hermes_home() / "cron" / "jobs.json" + + +CRON_JOBS_FILENAME = "cron-jobs.json" + + +def _backup_cron_jobs_into(dest: Path) -> Dict[str, Any]: + """Copy the live cron jobs.json into ``dest`` as ``cron-jobs.json``. + + Returns a small dict describing what was captured so the caller can + fold it into the manifest. Never raises — if the cron file is missing + or unreadable, the return dict has ``backed_up=False`` and the reason, + and the snapshot proceeds without cron data (the snapshot is still + useful for rolling back skills). + """ + src = _cron_jobs_file() + info: Dict[str, Any] = {"backed_up": False, "jobs_count": 0} + if not src.exists(): + info["reason"] = "no cron/jobs.json present" + return info + try: + raw = src.read_text(encoding="utf-8") + except OSError as e: + logger.debug("Failed to read cron/jobs.json for backup: %s", e) + info["reason"] = f"read error: {e}" + return info + # Count jobs as a nice diagnostic — but don't fail the snapshot if the + # file is unparseable; just store the raw text and let rollback deal + # with it (or not, if it's corrupted). jobs.json wraps the list as + # `{"jobs": [...], "updated_at": ...}` — we count via that shape, and + # fall back to bare-list shape just in case the format ever changes. 
+ try: + parsed = json.loads(raw) + if isinstance(parsed, dict): + inner = parsed.get("jobs") + if isinstance(inner, list): + info["jobs_count"] = len(inner) + elif isinstance(parsed, list): + info["jobs_count"] = len(parsed) + except (json.JSONDecodeError, TypeError): + info["jobs_count"] = 0 + info["parse_warning"] = "jobs.json was not valid JSON at snapshot time" + try: + (dest / CRON_JOBS_FILENAME).write_text(raw, encoding="utf-8") + except OSError as e: + logger.debug("Failed to write cron backup file: %s", e) + info["reason"] = f"write error: {e}" + return info + info["backed_up"] = True + return info + + def _utc_id(now: Optional[datetime] = None) -> str: """UTC ISO-ish filesystem-safe timestamp: ``2026-05-01T13-05-42Z``.""" if now is None: @@ -116,7 +182,8 @@ def _count_skill_files(base: Path) -> int: def _write_manifest(dest: Path, reason: str, archive_path: Path, - skills_counted: int) -> None: + skills_counted: int, + cron_info: Optional[Dict[str, Any]] = None) -> None: manifest = { "id": dest.name, "reason": reason, @@ -125,6 +192,15 @@ def _write_manifest(dest: Path, reason: str, archive_path: Path, "archive_bytes": archive_path.stat().st_size, "skill_files": skills_counted, } + if cron_info is not None: + manifest["cron_jobs"] = { + "backed_up": bool(cron_info.get("backed_up", False)), + "jobs_count": int(cron_info.get("jobs_count", 0)), + } + if not cron_info.get("backed_up"): + manifest["cron_jobs"]["reason"] = cron_info.get("reason", "not captured") + if cron_info.get("parse_warning"): + manifest["cron_jobs"]["parse_warning"] = cron_info["parse_warning"] (dest / "manifest.json").write_text( json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8" ) @@ -181,7 +257,14 @@ def snapshot_skills(reason: str = "manual") -> Optional[Path]: # arcname: store paths relative to skills/ so extraction # drops cleanly back into the skills dir. 
tf.add(str(entry), arcname=entry.name, recursive=True) - _write_manifest(dest, reason, archive, _count_skill_files(skills)) + # Capture cron/jobs.json alongside the tarball. Never fails the + # snapshot — the skills side is the core guarantee; cron is + # additive. We still record in the manifest whether it was + # captured so rollback can surface "no cron data in this snapshot". + cron_info = _backup_cron_jobs_into(dest) + _write_manifest(dest, reason, archive, + _count_skill_files(skills), + cron_info=cron_info) except (OSError, tarfile.TarError) as e: logger.debug("Curator snapshot failed: %s", e, exc_info=True) # Clean up partial snapshot @@ -298,6 +381,149 @@ def _resolve_backup(backup_id: Optional[str]) -> Optional[Path]: return candidates[0] if candidates else None +def _restore_cron_skill_links(snapshot_dir: Path) -> Dict[str, Any]: + """Reconcile backed-up cron skill links into the live ``cron/jobs.json``. + + We do NOT overwrite the whole cron file. Only the ``skills`` and + ``skill`` fields are restored, and only on jobs that still exist in the + current file (matched by ``id``). Everything else about the job — + schedule, next_run_at, last_run_at, enabled, prompt, workdir, hooks — + is live state that the user/scheduler has modified since the snapshot; + overwriting it would regress unrelated cron activity. + + Rules: + - Jobs present in backup AND live, with differing skills → skills restored. + - Jobs present in backup AND live, with matching skills → no-op. + - Jobs present in backup but gone from live (user deleted the job + after the snapshot) → skipped, noted in the return report. + - Jobs present in live but not in backup (user created a new cron + job after the snapshot) → left untouched. + + Never raises; failures are captured in the return dict. Writes through + ``cron.jobs`` to pick up the same lock + atomic-write path that tick() + uses, so we don't race the scheduler. 
+ """ + report: Dict[str, Any] = { + "attempted": False, + "restored": [], + "skipped_missing": [], + "unchanged": 0, + "error": None, + } + backup_file = snapshot_dir / CRON_JOBS_FILENAME + if not backup_file.exists(): + report["error"] = f"snapshot has no {CRON_JOBS_FILENAME}" + return report + + try: + backup_text = backup_file.read_text(encoding="utf-8") + backup_parsed = json.loads(backup_text) + except (OSError, json.JSONDecodeError) as e: + report["error"] = f"failed to load backed-up jobs: {e}" + return report + # jobs.json on disk is `{"jobs": [...], "updated_at": ...}`; accept both + # that shape and a bare list for forward compat. + if isinstance(backup_parsed, dict): + backup_jobs = backup_parsed.get("jobs") + elif isinstance(backup_parsed, list): + backup_jobs = backup_parsed + else: + backup_jobs = None + if not isinstance(backup_jobs, list): + report["error"] = "backed-up cron-jobs.json has no jobs list" + return report + + # Build a lookup of the backed-up skill state keyed by job id. + # We only need the two skill-ish fields (legacy single and modern list). + backup_by_id: Dict[str, Dict[str, Any]] = {} + for job in backup_jobs: + if not isinstance(job, dict): + continue + jid = job.get("id") + if not isinstance(jid, str) or not jid: + continue + backup_by_id[jid] = { + "skills": job.get("skills"), + "skill": job.get("skill"), + "name": job.get("name") or jid, + } + + if not backup_by_id: + report["attempted"] = True # we tried but there was nothing to do + return report + + # Load and rewrite the live jobs under the scheduler's lock. 
+ try: + from cron.jobs import load_jobs, save_jobs, _jobs_file_lock + except ImportError as e: + report["error"] = f"cron module unavailable: {e}" + return report + + report["attempted"] = True + try: + with _jobs_file_lock: + live_jobs = load_jobs() + changed = False + + live_ids = set() + for live in live_jobs: + if not isinstance(live, dict): + continue + jid = live.get("id") + if not isinstance(jid, str) or not jid: + continue + live_ids.add(jid) + + backup = backup_by_id.get(jid) + if backup is None: + continue # live job didn't exist at snapshot time + + cur_skills = live.get("skills") + cur_skill = live.get("skill") + bkp_skills = backup.get("skills") + bkp_skill = backup.get("skill") + + if cur_skills == bkp_skills and cur_skill == bkp_skill: + report["unchanged"] += 1 + continue + + # Restore. Preserve absence (don't force the key to appear + # if the backup didn't have it either). + if bkp_skills is None: + live.pop("skills", None) + else: + live["skills"] = bkp_skills + if bkp_skill is None: + live.pop("skill", None) + else: + live["skill"] = bkp_skill + + report["restored"].append({ + "job_id": jid, + "job_name": backup.get("name") or jid, + "from": {"skills": cur_skills, "skill": cur_skill}, + "to": {"skills": bkp_skills, "skill": bkp_skill}, + }) + changed = True + + # Jobs in backup but not in live = user deleted them after snapshot + for jid, backup in backup_by_id.items(): + if jid not in live_ids: + report["skipped_missing"].append({ + "job_id": jid, + "job_name": backup.get("name") or jid, + }) + + if changed: + save_jobs(live_jobs) + except Exception as e: # noqa: BLE001 — rollback must not die mid-restore + logger.debug("Cron skill-link restore failed: %s", e, exc_info=True) + report["error"] = f"restore failed mid-flight: {e}" + + return report + + + def rollback(backup_id: Optional[str] = None) -> Tuple[bool, str, Optional[Path]]: """Restore ``~/.hermes/skills/`` from a snapshot. 
@@ -408,8 +634,35 @@ def rollback(backup_id: Optional[str] = None) -> Tuple[bool, str, Optional[Path] except OSError: pass - logger.info("Curator rollback: restored from %s", target.name) - return (True, f"restored from snapshot {target.name}", target) + # Reconcile cron skill-links. Surgical: only the skills/skill fields + # on jobs matched by id. Everything else in jobs.json is live state + # (schedule, next_run_at, enabled, prompt, etc.) and we leave it + # alone. Failures here don't fail the overall rollback — the skills + # tree is already restored, which is the main guarantee. + cron_report = _restore_cron_skill_links(target) + + summary_bits = [f"restored from snapshot {target.name}"] + if cron_report.get("attempted"): + restored_n = len(cron_report.get("restored") or []) + skipped_n = len(cron_report.get("skipped_missing") or []) + if cron_report.get("error"): + summary_bits.append(f"cron links: error — {cron_report['error']}") + elif restored_n == 0 and skipped_n == 0 and cron_report.get("unchanged", 0) == 0: + # Attempted but nothing matched — empty snapshot or no overlapping ids. 
+ pass + else: + parts = [] + if restored_n: + parts.append(f"{restored_n} job(s) had skill links restored") + if skipped_n: + parts.append(f"{skipped_n} backed-up job(s) no longer exist (skipped)") + if cron_report.get("unchanged"): + parts.append(f"{cron_report['unchanged']} already matched") + summary_bits.append("cron links: " + ", ".join(parts)) + + logger.info("Curator rollback: restored from %s (cron_report=%s)", + target.name, cron_report) + return (True, "; ".join(summary_bits), target) # --------------------------------------------------------------------------- diff --git a/hermes_cli/curator.py b/hermes_cli/curator.py index b6646d7299..df69aa7d5d 100644 --- a/hermes_cli/curator.py +++ b/hermes_cli/curator.py @@ -302,9 +302,21 @@ def _cmd_rollback(args) -> int: print(f" reason: {manifest.get('reason', '?')}") print(f" created_at: {manifest.get('created_at', '?')}") print(f" skill files: {manifest.get('skill_files', '?')}") + cron = manifest.get("cron_jobs") or {} + if isinstance(cron, dict): + if cron.get("backed_up"): + print( + f" cron jobs: {cron.get('jobs_count', 0)} " + f"(will be restored for skill-link fields only)" + ) + else: + reason = cron.get("reason", "not captured") + print(f" cron jobs: not in snapshot ({reason})") print( "\nThis will replace the current ~/.hermes/skills/ tree (a safety " - "snapshot of the current state is taken first so this is undoable)." + "snapshot of the current state is taken first so this is undoable). " + "Cron jobs that still exist will have their skills/skill fields " + "restored from the snapshot; all other cron fields are left alone." 
) if not getattr(args, "yes", False): diff --git a/tests/agent/test_curator_backup.py b/tests/agent/test_curator_backup.py index 1d906ed745..b375f98688 100644 --- a/tests/agent/test_curator_backup.py +++ b/tests/agent/test_curator_backup.py @@ -314,3 +314,281 @@ def test_dry_run_skips_snapshot(backup_env, monkeypatch): assert not any(r.get("reason") == "pre-curator-run" for r in rows), ( "dry-run must not create a pre-run snapshot" ) + + +# --------------------------------------------------------------------------- +# cron-jobs backup + rollback (the part issue #18671's follow-up adds) +# --------------------------------------------------------------------------- + + +def _write_cron_jobs(home: Path, jobs: list) -> Path: + """Write a synthetic cron/jobs.json under HERMES_HOME. Returns the path. + Mirrors cron.jobs.save_jobs() wrapper shape: `{"jobs": [...], "updated_at": ...}`. + """ + cron_dir = home / "cron" + cron_dir.mkdir(parents=True, exist_ok=True) + path = cron_dir / "jobs.json" + path.write_text( + json.dumps({"jobs": jobs, "updated_at": "2026-05-01T00:00:00Z"}, indent=2), + encoding="utf-8", + ) + return path + + +def _reload_cron_jobs(home: Path): + """Reload cron.jobs so its module-level HERMES_DIR picks up the tmp HOME.""" + import hermes_constants + importlib.reload(hermes_constants) + if "cron.jobs" in sys.modules: + import cron.jobs as _cj + importlib.reload(_cj) + else: + import cron.jobs as _cj # noqa: F401 + import cron.jobs as cj + return cj + + +def test_snapshot_includes_cron_jobs(backup_env): + """With a cron/jobs.json present, snapshot writes cron-jobs.json and records it in manifest.""" + cb = backup_env["cb"] + _write_skill(backup_env["skills"], "alpha") + _write_cron_jobs(backup_env["home"], [ + {"id": "job-a", "name": "a", "schedule": "every 1h", "skills": ["alpha"]}, + {"id": "job-b", "name": "b", "schedule": "every 2h", "skill": "alpha"}, + ]) + + snap = cb.snapshot_skills(reason="test") + assert snap is not None + assert (snap / 
cb.CRON_JOBS_FILENAME).exists() + + mf = json.loads((snap / "manifest.json").read_text(encoding="utf-8")) + assert mf["cron_jobs"]["backed_up"] is True + assert mf["cron_jobs"]["jobs_count"] == 2 + + +def test_snapshot_without_cron_jobs_file_still_succeeds(backup_env): + """No cron/jobs.json on disk → snapshot succeeds, manifest records absence.""" + cb = backup_env["cb"] + _write_skill(backup_env["skills"], "alpha") + # Deliberately do not create ~/.hermes/cron/jobs.json + + snap = cb.snapshot_skills(reason="test") + assert snap is not None + assert not (snap / cb.CRON_JOBS_FILENAME).exists() + + mf = json.loads((snap / "manifest.json").read_text(encoding="utf-8")) + assert mf["cron_jobs"]["backed_up"] is False + assert "cron/jobs.json" in mf["cron_jobs"]["reason"] + + +def test_snapshot_cron_jobs_malformed_json_still_captured(backup_env): + """Malformed jobs.json is still copied to the snapshot (fidelity over + validation); the manifest notes the parse warning.""" + cb = backup_env["cb"] + _write_skill(backup_env["skills"], "alpha") + (backup_env["home"] / "cron").mkdir() + (backup_env["home"] / "cron" / "jobs.json").write_text("{oh no", encoding="utf-8") + + snap = cb.snapshot_skills(reason="test") + assert snap is not None + # Raw file was copied even though we couldn't parse it + assert (snap / cb.CRON_JOBS_FILENAME).read_text() == "{oh no" + + mf = json.loads((snap / "manifest.json").read_text(encoding="utf-8")) + assert mf["cron_jobs"]["backed_up"] is True + assert mf["cron_jobs"]["jobs_count"] == 0 + assert "parse_warning" in mf["cron_jobs"] + + +def test_rollback_restores_cron_skill_links(backup_env): + """End-to-end: snapshot with job [alpha,beta], curator-style in-place + rewrite to [umbrella], then rollback → skills restored to [alpha,beta].""" + cb = backup_env["cb"] + home = backup_env["home"] + _write_skill(backup_env["skills"], "alpha") + _write_skill(backup_env["skills"], "beta") + _write_skill(backup_env["skills"], "umbrella") + + cj = 
_reload_cron_jobs(home) + cj.create_job(name="weekly", prompt="p", schedule="every 7d", + skills=["alpha", "beta"]) + + snap = cb.snapshot_skills(reason="pre-curator-run") + assert snap is not None + + # Simulate the curator's in-place cron rewrite after consolidation + cj.rewrite_skill_refs( + consolidated={"alpha": "umbrella", "beta": "umbrella"}, + pruned=[], + ) + live_after_curator = cj.load_jobs() + assert live_after_curator[0]["skills"] == ["umbrella"] + + # Now roll back + ok, msg, _ = cb.rollback(backup_id=snap.name) + assert ok, msg + assert "cron links" in msg + + live_after_rollback = cj.load_jobs() + # skills restored; legacy `skill` mirror follows first element + assert live_after_rollback[0]["skills"] == ["alpha", "beta"] + + +def test_rollback_only_touches_skill_fields(backup_env): + """Every field other than skills/skill must remain untouched across rollback. + Schedule, enabled, prompt, timestamps — all live state, hands off.""" + cb = backup_env["cb"] + home = backup_env["home"] + _write_skill(backup_env["skills"], "alpha") + + # Hand-rolled jobs.json with varied fields (no real create_job — we want + # exact field control). + _write_cron_jobs(home, [{ + "id": "stable-id", + "name": "original-name", + "prompt": "original prompt", + "schedule": "every 1h", + "skills": ["alpha"], + "enabled": True, + "last_run_at": "2026-04-01T00:00:00Z", + }]) + snap = cb.snapshot_skills(reason="pre-curator-run") + assert snap is not None + + # User/scheduler activity AFTER the snapshot: rename the job, change + # the schedule, update timestamps, and (curator) rewrite the skills list. 
+ cj = _reload_cron_jobs(home) + jobs = cj.load_jobs() + jobs[0]["name"] = "renamed-since-snapshot" + jobs[0]["schedule"] = "every 30m" + jobs[0]["last_run_at"] = "2026-05-01T12:00:00Z" + jobs[0]["skills"] = ["umbrella"] # pretend curator did this + cj.save_jobs(jobs) + + ok, _, _ = cb.rollback(backup_id=snap.name) + assert ok + + after = cj.load_jobs() + job = after[0] + # skills: restored + assert job["skills"] == ["alpha"] + # everything else: untouched (live state preserved) + assert job["name"] == "renamed-since-snapshot" + assert job["schedule"] == "every 30m" + assert job["last_run_at"] == "2026-05-01T12:00:00Z" + assert job["prompt"] == "original prompt" + + +def test_rollback_skips_jobs_the_user_deleted(backup_env): + """If the user deleted a cron job after the snapshot, rollback must + NOT resurrect it — the user's delete is a later, explicit choice.""" + cb = backup_env["cb"] + home = backup_env["home"] + _write_skill(backup_env["skills"], "alpha") + + _write_cron_jobs(home, [ + {"id": "keep-me", "name": "keep", "schedule": "every 1h", "skills": ["alpha"]}, + {"id": "delete-me", "name": "gone", "schedule": "every 1h", "skills": ["alpha"]}, + ]) + snap = cb.snapshot_skills(reason="pre-curator-run") + + # User deletes one job after the snapshot + cj = _reload_cron_jobs(home) + cj.save_jobs([j for j in cj.load_jobs() if j["id"] != "delete-me"]) + + ok, _, _ = cb.rollback(backup_id=snap.name) + assert ok + + live_after = cj.load_jobs() + live_ids = {j["id"] for j in live_after} + assert "keep-me" in live_ids + assert "delete-me" not in live_ids # not resurrected + + +def test_rollback_leaves_new_jobs_untouched(backup_env): + """Jobs created AFTER the snapshot must pass through rollback unchanged.""" + cb = backup_env["cb"] + home = backup_env["home"] + _write_skill(backup_env["skills"], "alpha") + _write_cron_jobs(home, [ + {"id": "original", "name": "o", "schedule": "every 1h", "skills": ["alpha"]}, + ]) + snap = cb.snapshot_skills(reason="pre-curator-run") 
+ + cj = _reload_cron_jobs(home) + jobs = cj.load_jobs() + jobs.append({"id": "new-after-snapshot", "name": "new", + "schedule": "every 15m", "skills": ["brand-new-skill"]}) + cj.save_jobs(jobs) + + ok, _, _ = cb.rollback(backup_id=snap.name) + assert ok + + live = cj.load_jobs() + by_id = {j["id"]: j for j in live} + assert "new-after-snapshot" in by_id + # New job's fields completely preserved + assert by_id["new-after-snapshot"]["skills"] == ["brand-new-skill"] + assert by_id["new-after-snapshot"]["schedule"] == "every 15m" + + +def test_rollback_with_snapshot_missing_cron_succeeds(backup_env): + """Older snapshots (created before this feature shipped) have no + cron-jobs.json. Rollback must still restore the skills tree and not + error out.""" + cb = backup_env["cb"] + home = backup_env["home"] + _write_skill(backup_env["skills"], "alpha") + + # No cron/jobs.json at snapshot time — simulates a pre-feature snapshot + snap = cb.snapshot_skills(reason="test") + assert snap is not None + assert not (snap / cb.CRON_JOBS_FILENAME).exists() + + # Later the user created a cron job + _write_cron_jobs(home, [ + {"id": "later-job", "name": "l", "schedule": "every 1h", "skills": ["x"]}, + ]) + + ok, msg, _ = cb.rollback(backup_id=snap.name) + # Main rollback still succeeds; cron report notes the missing file. + assert ok, msg + # Jobs.json untouched (nothing to restore from) + cj = _reload_cron_jobs(home) + jobs = cj.load_jobs() + assert jobs[0]["id"] == "later-job" + assert jobs[0]["skills"] == ["x"] + + +def test_restore_cron_skill_links_standalone(backup_env): + """Unit-level test on _restore_cron_skill_links without the full rollback. 
+ Verifies the report structure carefully.""" + cb = backup_env["cb"] + home = backup_env["home"] + + # Prime a snapshot dir manually with cron-jobs.json + backups_dir = home / "skills" / ".curator_backups" / "fake-id" + backups_dir.mkdir(parents=True) + (backups_dir / cb.CRON_JOBS_FILENAME).write_text(json.dumps([ + {"id": "job-1", "name": "one", "skills": ["narrow-a", "narrow-b"]}, + {"id": "job-2", "name": "two", "skill": "legacy-single"}, + {"id": "job-gone", "name": "deleted", "skills": ["whatever"]}, + ]), encoding="utf-8") + + # Live jobs: job-1 got rewritten, job-2 unchanged, job-gone deleted + _write_cron_jobs(home, [ + {"id": "job-1", "name": "one", "skills": ["umbrella"], "schedule": "every 1h"}, + {"id": "job-2", "name": "two", "skill": "legacy-single", "schedule": "every 1h"}, + {"id": "job-new", "name": "new", "skills": ["x"], "schedule": "every 1h"}, + ]) + _reload_cron_jobs(home) + + report = cb._restore_cron_skill_links(backups_dir) + assert report["attempted"] is True + assert report["error"] is None + assert report["unchanged"] == 1 # job-2 matched + assert len(report["restored"]) == 1 # job-1 got restored + assert report["restored"][0]["job_id"] == "job-1" + assert report["restored"][0]["to"]["skills"] == ["narrow-a", "narrow-b"] + assert len(report["skipped_missing"]) == 1 + assert report["skipped_missing"][0]["job_id"] == "job-gone" diff --git a/tests/agent/test_curator_classification.py b/tests/agent/test_curator_classification.py index edf7394faf..031d66529b 100644 --- a/tests/agent/test_curator_classification.py +++ b/tests/agent/test_curator_classification.py @@ -548,3 +548,266 @@ def test_reconcile_model_block_visible_in_full_report(curator_env): md = (run_dir / "REPORT.md").read_text() assert "duplicate content, now a subsection" in md assert "pre-curator junk" in md + + +# --------------------------------------------------------------------------- +# _extract_absorbed_into_declarations — authoritative signal from delete calls +# 
--------------------------------------------------------------------------- + + +def test_extract_absorbed_into_picks_up_consolidation(curator_env): + """Delete call with absorbed_into=<umbrella> yields a declaration.""" + declarations = curator_env._extract_absorbed_into_declarations([ + { + "name": "skill_manage", + "arguments": json.dumps({ + "action": "delete", + "name": "narrow-skill", + "absorbed_into": "umbrella", + }), + }, + ]) + assert declarations == { + "narrow-skill": {"into": "umbrella", "declared": True}, + } + + +def test_extract_absorbed_into_empty_string_is_explicit_prune(curator_env): + """absorbed_into='' is recorded as an explicit prune declaration.""" + declarations = curator_env._extract_absorbed_into_declarations([ + { + "name": "skill_manage", + "arguments": json.dumps({ + "action": "delete", + "name": "stale", + "absorbed_into": "", + }), + }, + ]) + assert declarations == {"stale": {"into": "", "declared": True}} + + +def test_extract_absorbed_into_missing_arg_ignored(curator_env): + """Delete call without absorbed_into is skipped — fallback to heuristic.""" + declarations = curator_env._extract_absorbed_into_declarations([ + { + "name": "skill_manage", + "arguments": json.dumps({ + "action": "delete", + "name": "legacy-skill", + }), + }, + ]) + assert declarations == {} + + +def test_extract_absorbed_into_ignores_non_delete_actions(curator_env): + """Patch, create, write_file etc. 
must not leak into declarations.""" + declarations = curator_env._extract_absorbed_into_declarations([ + { + "name": "skill_manage", + "arguments": json.dumps({ + "action": "patch", + "name": "umbrella", + "old_string": "...", + "new_string": "...", + "absorbed_into": "something", # bogus on non-delete, must be ignored + }), + }, + ]) + assert declarations == {} + + +def test_extract_absorbed_into_accepts_dict_arguments(curator_env): + """arguments can arrive as a dict (defensive path) — still works.""" + declarations = curator_env._extract_absorbed_into_declarations([ + { + "name": "skill_manage", + "arguments": { + "action": "delete", + "name": "narrow", + "absorbed_into": "umbrella", + }, + }, + ]) + assert declarations == {"narrow": {"into": "umbrella", "declared": True}} + + +def test_extract_absorbed_into_strips_whitespace(curator_env): + declarations = curator_env._extract_absorbed_into_declarations([ + { + "name": "skill_manage", + "arguments": json.dumps({ + "action": "delete", + "name": " narrow ", + "absorbed_into": " umbrella ", + }), + }, + ]) + assert declarations == {"narrow": {"into": "umbrella", "declared": True}} + + +def test_extract_absorbed_into_ignores_non_skill_manage_calls(curator_env): + declarations = curator_env._extract_absorbed_into_declarations([ + {"name": "terminal", "arguments": json.dumps({"command": "ls"})}, + {"name": "read_file", "arguments": json.dumps({"path": "/tmp/x"})}, + ]) + assert declarations == {} + + +def test_extract_absorbed_into_handles_malformed_arguments(curator_env): + """Garbage JSON in arguments must not crash the extractor.""" + declarations = curator_env._extract_absorbed_into_declarations([ + {"name": "skill_manage", "arguments": "{not json"}, + {"name": "skill_manage", "arguments": None}, + {"name": "skill_manage"}, # no arguments key at all + ]) + assert declarations == {} + + +# --------------------------------------------------------------------------- +# _reconcile_classification with absorbed_into 
declarations (authoritative) +# --------------------------------------------------------------------------- + + +def test_reconcile_absorbed_into_beats_everything_else(curator_env): + """Model declared absorbed_into at delete; YAML/heuristic disagree — declaration wins. + + This is the exact #18671 regression: the model forgets to emit the YAML + summary block, the heuristic's substring match misses because the + umbrella's patch content doesn't literally contain the old skill's + slug. Previously this fell through to 'no-evidence fallback' prune, + which dropped the cron ref instead of rewriting. With absorbed_into + declared, the model tells us directly. + """ + out = curator_env._reconcile_classification( + removed=["pr-review-format"], + heuristic={"consolidated": [], "pruned": [{"name": "pr-review-format"}]}, + model_block={"consolidations": [], "prunings": []}, # model forgot YAML block + destinations={"hermes-agent-dev"}, + absorbed_declarations={ + "pr-review-format": {"into": "hermes-agent-dev", "declared": True}, + }, + ) + assert len(out["consolidated"]) == 1 + assert out["pruned"] == [] + e = out["consolidated"][0] + assert e["name"] == "pr-review-format" + assert e["into"] == "hermes-agent-dev" + assert "absorbed_into" in e["source"] + + +def test_reconcile_absorbed_into_empty_is_explicit_prune(curator_env): + """absorbed_into='' takes precedence and routes to pruned, not fallback.""" + out = curator_env._reconcile_classification( + removed=["stale"], + heuristic={"consolidated": [], "pruned": [{"name": "stale"}]}, + model_block={"consolidations": [], "prunings": []}, + destinations=set(), + absorbed_declarations={ + "stale": {"into": "", "declared": True}, + }, + ) + assert out["consolidated"] == [] + assert len(out["pruned"]) == 1 + assert "model-declared prune" in out["pruned"][0]["source"] + + +def test_reconcile_absorbed_into_nonexistent_target_falls_through(curator_env): + """If the declared umbrella doesn't exist in destinations, fall through to 
+ heuristic/YAML logic. Shouldn't happen in practice (the tool validates at + delete time) but the reconciler is defensive.""" + out = curator_env._reconcile_classification( + removed=["thing"], + heuristic={ + "consolidated": [{"name": "thing", "into": "real-umbrella", "evidence": "..."}], + "pruned": [], + }, + model_block={"consolidations": [], "prunings": []}, + destinations={"real-umbrella"}, + absorbed_declarations={ + "thing": {"into": "ghost-umbrella", "declared": True}, + }, + ) + assert len(out["consolidated"]) == 1 + assert out["consolidated"][0]["into"] == "real-umbrella" + assert "tool-call audit" in out["consolidated"][0]["source"] + + +def test_reconcile_declaration_preserves_yaml_reason(curator_env): + """When the model both declared absorbed_into AND emitted YAML with reason, + the reason carries through so REPORT.md still has it.""" + out = curator_env._reconcile_classification( + removed=["narrow"], + heuristic={"consolidated": [], "pruned": []}, + model_block={ + "consolidations": [{ + "from": "narrow", + "into": "umbrella", + "reason": "duplicate of umbrella's main content", + }], + "prunings": [], + }, + destinations={"umbrella"}, + absorbed_declarations={ + "narrow": {"into": "umbrella", "declared": True}, + }, + ) + assert len(out["consolidated"]) == 1 + e = out["consolidated"][0] + assert e["into"] == "umbrella" + assert "absorbed_into" in e["source"] + assert e["reason"] == "duplicate of umbrella's main content" + + +def test_reconcile_without_declarations_preserves_legacy_behavior(curator_env): + """Backward compat: no absorbed_declarations arg → all existing logic intact.""" + out = curator_env._reconcile_classification( + removed=["thing"], + heuristic={ + "consolidated": [{"name": "thing", "into": "umbrella", "evidence": "..."}], + "pruned": [], + }, + model_block={"consolidations": [], "prunings": []}, + destinations={"umbrella"}, + # no absorbed_declarations — defaults to None → behaves identically to pre-change + ) + assert 
len(out["consolidated"]) == 1 + assert out["consolidated"][0]["into"] == "umbrella" + + +def test_reconcile_mixed_declarations_and_legacy_calls(curator_env): + """Real-world run: some deletes declared absorbed_into, some didn't. + Declared ones use the authoritative path; others fall through to YAML/heuristic. + """ + out = curator_env._reconcile_classification( + removed=["declared-cons", "declared-prune", "legacy-cons", "legacy-prune"], + heuristic={ + "consolidated": [ + {"name": "legacy-cons", "into": "umbrella-a", "evidence": "..."}, + ], + "pruned": [{"name": "legacy-prune"}], + }, + model_block={"consolidations": [], "prunings": []}, + destinations={"umbrella-a", "umbrella-b"}, + absorbed_declarations={ + "declared-cons": {"into": "umbrella-b", "declared": True}, + "declared-prune": {"into": "", "declared": True}, + }, + ) + cons_by_name = {e["name"]: e for e in out["consolidated"]} + pruned_by_name = {e["name"]: e for e in out["pruned"]} + + assert "declared-cons" in cons_by_name + assert cons_by_name["declared-cons"]["into"] == "umbrella-b" + assert "absorbed_into" in cons_by_name["declared-cons"]["source"] + + assert "legacy-cons" in cons_by_name + assert cons_by_name["legacy-cons"]["into"] == "umbrella-a" + assert "tool-call audit" in cons_by_name["legacy-cons"]["source"] + + assert "declared-prune" in pruned_by_name + assert "model-declared prune" in pruned_by_name["declared-prune"]["source"] + + assert "legacy-prune" in pruned_by_name + assert "no-evidence fallback" in pruned_by_name["legacy-prune"]["source"] diff --git a/tests/tools/test_skill_manager_tool.py b/tests/tools/test_skill_manager_tool.py index 00eaf51ea0..004924b9f4 100644 --- a/tests/tools/test_skill_manager_tool.py +++ b/tests/tools/test_skill_manager_tool.py @@ -371,6 +371,57 @@ class TestDeleteSkill: _delete_skill("my-skill") assert not (tmp_path / "devops").exists() + def test_delete_with_absorbed_into_valid_target(self, tmp_path): + with _skill_dir(tmp_path): + 
_create_skill("umbrella", VALID_SKILL_CONTENT) + _create_skill("narrow", VALID_SKILL_CONTENT) + result = _delete_skill("narrow", absorbed_into="umbrella") + assert result["success"] is True + assert "absorbed into 'umbrella'" in result["message"] + assert not (tmp_path / "narrow").exists() + assert (tmp_path / "umbrella").exists() + + def test_delete_with_absorbed_into_empty_string_means_pruned(self, tmp_path): + with _skill_dir(tmp_path): + _create_skill("stale-skill", VALID_SKILL_CONTENT) + result = _delete_skill("stale-skill", absorbed_into="") + assert result["success"] is True + # Empty absorbed_into is explicit prune — no "absorbed into" suffix in message + assert "absorbed into" not in result["message"] + + def test_delete_with_absorbed_into_nonexistent_target_rejected(self, tmp_path): + with _skill_dir(tmp_path): + _create_skill("narrow", VALID_SKILL_CONTENT) + result = _delete_skill("narrow", absorbed_into="ghost-umbrella") + assert result["success"] is False + assert "does not exist" in result["error"] + # Skill must NOT have been deleted on validation failure + assert (tmp_path / "narrow").exists() + + def test_delete_with_absorbed_into_equals_self_rejected(self, tmp_path): + with _skill_dir(tmp_path): + _create_skill("narrow", VALID_SKILL_CONTENT) + result = _delete_skill("narrow", absorbed_into="narrow") + assert result["success"] is False + assert "cannot equal" in result["error"] + assert (tmp_path / "narrow").exists() + + def test_delete_with_absorbed_into_whitespace_only_treated_as_prune(self, tmp_path): + # Leading/trailing whitespace only: .strip() → "" → pruned path + with _skill_dir(tmp_path): + _create_skill("narrow", VALID_SKILL_CONTENT) + result = _delete_skill("narrow", absorbed_into=" ") + assert result["success"] is True + assert "absorbed into" not in result["message"] + + def test_delete_without_absorbed_into_backward_compat(self, tmp_path): + # Legacy callers that don't pass the arg still work — the curator + # reconciler falls back to 
its heuristic+YAML logic for such deletes. + with _skill_dir(tmp_path): + _create_skill("my-skill", VALID_SKILL_CONTENT) + result = _delete_skill("my-skill") + assert result["success"] is True + # --------------------------------------------------------------------------- # write_file / remove_file @@ -485,6 +536,25 @@ class TestSkillManageDispatcher: result = json.loads(raw) assert result["success"] is True + def test_delete_via_dispatcher_threads_absorbed_into(self, tmp_path): + # Dispatcher must plumb absorbed_into through to _delete_skill so the + # validation + message suffix paths are exercised end-to-end. + with _skill_dir(tmp_path): + skill_manage(action="create", name="umbrella", content=VALID_SKILL_CONTENT) + skill_manage(action="create", name="narrow", content=VALID_SKILL_CONTENT) + raw = skill_manage(action="delete", name="narrow", absorbed_into="umbrella") + result = json.loads(raw) + assert result["success"] is True + assert "absorbed into 'umbrella'" in result["message"] + + def test_delete_via_dispatcher_rejects_missing_absorbed_target(self, tmp_path): + with _skill_dir(tmp_path): + skill_manage(action="create", name="narrow", content=VALID_SKILL_CONTENT) + raw = skill_manage(action="delete", name="narrow", absorbed_into="ghost") + result = json.loads(raw) + assert result["success"] is False + assert "does not exist" in result["error"] + class TestSecurityScanGate: """_security_scan_skill is gated by skills.guard_agent_created config flag.""" diff --git a/tools/skill_manager_tool.py b/tools/skill_manager_tool.py index e1b9a5f055..d8d44f1a8b 100644 --- a/tools/skill_manager_tool.py +++ b/tools/skill_manager_tool.py @@ -560,8 +560,18 @@ def _patch_skill( } -def _delete_skill(name: str) -> Dict[str, Any]: - """Delete a skill.""" +def _delete_skill(name: str, absorbed_into: Optional[str] = None) -> Dict[str, Any]: + """Delete a skill. 
+ + ``absorbed_into`` declares intent: + - ``None`` / missing → caller didn't declare (legacy / non-curator path); + accepted for backward compat but logs a warning because the curator + classification pipeline can't tell consolidation from pruning without it. + - ``""`` (empty) → explicit "truly pruned, no forwarding target". + - ``"<umbrella>"`` → content was absorbed into that umbrella; the + target must exist on disk. Validated here so the model can't claim an + umbrella that doesn't exist. + """ existing = _find_skill(name) if not existing: return {"success": False, "error": f"Skill '{name}' not found."} @@ -570,6 +580,24 @@ def _delete_skill(name: str) -> Dict[str, Any]: if pinned_err: return {"success": False, "error": pinned_err} + # Validate absorbed_into target when declared non-empty + if absorbed_into is not None and isinstance(absorbed_into, str) and absorbed_into.strip(): + target_name = absorbed_into.strip() + if target_name == name: + return { + "success": False, + "error": f"absorbed_into='{target_name}' cannot equal the skill being deleted.", + } + target = _find_skill(target_name) + if not target: + return { + "success": False, + "error": ( + f"absorbed_into='{target_name}' does not exist. " + f"Create or patch the umbrella skill first, then retry the delete." + ), + } + skill_dir = existing["path"] skills_root = _containing_skills_root(skill_dir) shutil.rmtree(skill_dir) @@ -579,9 +607,13 @@ def _delete_skill(name: str) -> Dict[str, Any]: if parent != skills_root and parent.exists() and not any(parent.iterdir()): parent.rmdir() + message = f"Skill '{name}' deleted." + if absorbed_into is not None and isinstance(absorbed_into, str) and absorbed_into.strip(): + message += f" Content absorbed into '{absorbed_into.strip()}'." 
+ return { "success": True, - "message": f"Skill '{name}' deleted.", + "message": message, } @@ -702,6 +734,7 @@ def skill_manage( old_string: str = None, new_string: str = None, replace_all: bool = False, + absorbed_into: str = None, ) -> str: """ Manage user-created skills. Dispatches to the appropriate action handler. @@ -726,7 +759,7 @@ def skill_manage( result = _patch_skill(name, old_string, new_string, file_path, replace_all) elif action == "delete": - result = _delete_skill(name) + result = _delete_skill(name, absorbed_into=absorbed_into) elif action == "write_file": if not file_path: @@ -778,6 +811,13 @@ SKILL_MANAGE_SCHEMA = { "patch (old_string/new_string — preferred for fixes), " "edit (full SKILL.md rewrite — major overhauls only), " "delete, write_file, remove_file.\n\n" + "On delete, pass `absorbed_into=<umbrella>` when you're merging this " + "skill's content into another one, or `absorbed_into=\"\"` when you're " + "pruning it with no forwarding target. This lets the curator tell " + "consolidation from pruning without guessing, so downstream consumers " + "(cron jobs that reference the old skill name, etc.) get updated " + "correctly. The target you name in `absorbed_into` must already " + "exist — create/patch the umbrella first, then delete.\n\n" "Create when: complex task succeeded (5+ calls), errors overcome, " "user-corrected approach worked, non-trivial workflow discovered, " "or user asks you to remember a procedure.\n" @@ -855,6 +895,20 @@ SKILL_MANAGE_SCHEMA = { "type": "string", "description": "Content for the file. Required for 'write_file'." }, + "absorbed_into": { + "type": "string", + "description": ( + "For 'delete' only — declares intent so the curator can " + "tell consolidation from pruning without guessing. " + "Pass the umbrella skill name when this skill's content " + "was merged into another (the target must already exist). " + "Pass an empty string when the skill is truly stale and " + "being pruned with no forwarding target. 
Omitting the arg " + "on delete is supported for backward compatibility but " + "downstream tooling (e.g. cron-job skill reference " + "rewriting) will have to guess at intent." + ) + }, }, "required": ["action", "name"], }, @@ -877,6 +931,7 @@ registry.register( file_content=args.get("file_content"), old_string=args.get("old_string"), new_string=args.get("new_string"), - replace_all=args.get("replace_all", False)), + replace_all=args.get("replace_all", False), + absorbed_into=args.get("absorbed_into")), emoji="📝", ) From c73594fe4196b5ee331d25f86774e66ad0f67a69 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 01:36:53 -0700 Subject: [PATCH 02/61] fix(skills): rescan skill_commands cache when platform scope changes (#18739) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The process-global `_skill_commands` dict in agent/skill_commands.py was seeded by whichever platform scanned first, and `get_skill_commands()` only rescanned when the cache was empty. In a long-lived gateway process serving multiple platforms (Telegram + Discord + Slack), the first platform's `skills.platform_disabled` view was silently inherited by the others — so a skill disabled for Telegram would also disappear from Discord's slash menu, and vice versa. Track the platform scope the cache was populated for (`_skill_commands_platform`) and rescan in `get_skill_commands()` when the currently-active platform no longer matches. Platform resolution uses the same precedence as `_is_skill_disabled`: `HERMES_PLATFORM` env var then `HERMES_SESSION_PLATFORM` from the gateway session context. Fixes #14536 Salvages #14570 by LeonSGP43. 
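
For illustration only, the rescan rule described above can be sketched in isolation. This is a simplified stand-in, not the code in this patch: `_cache`, `_cache_platform`, `_current_platform`, `_scan` and `get_commands` are hypothetical names, the skill list and per-platform disabled sets are hard-coded, and only `HERMES_PLATFORM` is consulted (the real resolver also falls back to the gateway session context).

```python
import os
from typing import Dict, Optional

# Hypothetical module-level cache plus the platform scope it was built for.
_cache: Dict[str, dict] = {}
_cache_platform: Optional[str] = None

# Assumed fixture data: which skills each platform has disabled.
_DISABLED = {"telegram": {"telegram-only"}, "discord": {"discord-only"}}
_ALL_SKILLS = ("shared", "telegram-only", "discord-only")


def _current_platform() -> Optional[str]:
    """Resolve the active platform scope (env var only in this sketch)."""
    return os.getenv("HERMES_PLATFORM") or None


def _scan() -> Dict[str, dict]:
    """Rebuild the cache and remember which platform it was filtered for."""
    global _cache, _cache_platform
    _cache_platform = _current_platform()
    disabled = _DISABLED.get(_cache_platform or "", set())
    _cache = {f"/{name}": {"name": name} for name in _ALL_SKILLS if name not in disabled}
    return _cache


def get_commands() -> Dict[str, dict]:
    """Return cached commands, rescanning if empty or built for another platform."""
    if not _cache or _cache_platform != _current_platform():
        _scan()
    return _cache
```

Recording the scope next to the cache keeps the check cheap: a call from the same platform is a dict lookup plus one comparison, and only a platform switch pays for a rescan.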
Co-authored-by: LeonSGP --- agent/skill_commands.py | 41 +++++++++++++++++++++-- tests/agent/test_skill_commands.py | 52 ++++++++++++++++++++++++++++++ 2 files changed, 90 insertions(+), 3 deletions(-) diff --git a/agent/skill_commands.py b/agent/skill_commands.py index ad1f03824d..0276d5fc9a 100644 --- a/agent/skill_commands.py +++ b/agent/skill_commands.py @@ -6,6 +6,7 @@ can invoke skills via /skill-name commands. import json import logging +import os import re from pathlib import Path from typing import Any, Dict, Optional @@ -20,10 +21,35 @@ from agent.skill_preprocessing import ( logger = logging.getLogger(__name__) _skill_commands: Dict[str, Dict[str, Any]] = {} +_skill_commands_platform: Optional[str] = None # Patterns for sanitizing skill names into clean hyphen-separated slugs. _SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]") _SKILL_MULTI_HYPHEN = re.compile(r"-{2,}") + +def _resolve_skill_commands_platform() -> Optional[str]: + """Return the current platform scope used for disabled-skill filtering. + + Used to detect when the active platform has shifted so + :func:`get_skill_commands` can drop a stale cache that was populated + for a different platform's ``skills.platform_disabled`` view (#14536). + + Resolves from (in order) ``HERMES_PLATFORM`` env var and + ``HERMES_SESSION_PLATFORM`` from the gateway session context. Returns + ``None`` when no platform scope is active (e.g. classic CLI, RL + rollouts, standalone scripts). 
+ """ + try: + from gateway.session_context import get_session_env + + resolved_platform = ( + os.getenv("HERMES_PLATFORM") + or get_session_env("HERMES_SESSION_PLATFORM") + ) + except Exception: + resolved_platform = os.getenv("HERMES_PLATFORM") + return resolved_platform or None + def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tuple[dict[str, Any], Path | None, str] | None: """Load a skill by name/path and return (loaded_payload, skill_dir, display_name).""" raw_identifier = (skill_identifier or "").strip() @@ -218,7 +244,8 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]: Returns: Dict mapping "/skill-name" to {name, description, skill_md_path, skill_dir}. """ - global _skill_commands + global _skill_commands, _skill_commands_platform + _skill_commands_platform = _resolve_skill_commands_platform() _skill_commands = {} try: from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform, _get_disabled_skill_names @@ -278,8 +305,16 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]: def get_skill_commands() -> Dict[str, Dict[str, Any]]: - """Return the current skill commands mapping (scan first if empty).""" - if not _skill_commands: + """Return the current skill commands mapping (scan first if empty). + + Rescans when the active platform scope changes (e.g. a gateway + process serving Telegram and Discord concurrently) so each platform + sees its own ``skills.platform_disabled`` view (#14536). 
+ """ + if ( + not _skill_commands + or _skill_commands_platform != _resolve_skill_commands_platform() + ): scan_skill_commands() return _skill_commands diff --git a/tests/agent/test_skill_commands.py b/tests/agent/test_skill_commands.py index 6879baed82..bdea17385c 100644 --- a/tests/agent/test_skill_commands.py +++ b/tests/agent/test_skill_commands.py @@ -125,6 +125,58 @@ class TestScanSkillCommands: assert "/knowledge-brain" in result assert result["/knowledge-brain"]["name"] == "knowledge-brain" + def test_get_skill_commands_rescans_when_platform_scope_changes(self, tmp_path): + """Platform-specific disabled-skill caches must not leak across platforms. + + Regression test for #14536: a gateway process serving Telegram + and Discord concurrently would seed the process-global cache + with whichever platform scanned first, and subsequent + ``get_skill_commands()`` calls from the other platform silently + inherited that filter. + """ + import agent.skill_commands as sc_mod + from agent.skill_commands import get_skill_commands + + def _disabled_skills(): + platform = os.getenv("HERMES_PLATFORM") + if platform == "telegram": + return {"telegram-only"} + if platform == "discord": + return {"discord-only"} + return set() + + with ( + patch("tools.skills_tool.SKILLS_DIR", tmp_path), + patch("tools.skills_tool._get_disabled_skill_names", side_effect=_disabled_skills), + patch.object(sc_mod, "_skill_commands", {}), + patch.object(sc_mod, "_skill_commands_platform", None), + ): + _make_skill(tmp_path, "shared") + _make_skill(tmp_path, "telegram-only") + _make_skill(tmp_path, "discord-only") + + with patch.dict(os.environ, {"HERMES_PLATFORM": "telegram"}): + telegram_commands = dict(get_skill_commands()) + + assert "/shared" in telegram_commands + assert "/discord-only" in telegram_commands + assert "/telegram-only" not in telegram_commands + + with patch.dict(os.environ, {"HERMES_PLATFORM": "discord"}): + discord_commands = dict(get_skill_commands()) + + assert "/shared" 
in discord_commands + assert "/telegram-only" in discord_commands + assert "/discord-only" not in discord_commands + + # Switching back to telegram must also rescan — not re-serve + # the discord view that was just cached. + with patch.dict(os.environ, {"HERMES_PLATFORM": "telegram"}): + telegram_again = dict(get_skill_commands()) + + assert "/telegram-only" not in telegram_again + assert "/discord-only" in telegram_again + def test_special_chars_stripped_from_cmd_key(self, tmp_path): """Skill names with +, /, or other special chars produce clean cmd keys.""" From e2cea6eeba36e8d6b96c0ed08bc4514b5c07c464 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 01:36:57 -0700 Subject: [PATCH 03/61] fix(gateway): include external_dirs skills in Telegram/Discord slash commands (#18741) Skills configured through `skills.external_dirs` in config.yaml were visible via `hermes skills list`, `get_skill_commands()`, and the agent's `/skill-name` dispatch, but silently excluded from the Telegram and Discord slash-command menus. The filter in `_collect_gateway_skill_entries` only accepted skills whose `skill_md_path` started with `SKILLS_DIR`, so anything under an external directory fell through. Widen the accepted-prefix set to include all configured external dirs alongside the local skills dir. Every prefix is now slash-terminated so `/my-skills` cannot also admit `/my-skills-extra`. Also guard against empty `skill_md_path` values so they can't accidentally match. Fixes #8110 Salvages #8790 by luyao618. 
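
For illustration only, the slash-terminated prefix check described above can be sketched as follows. This is a standalone approximation, not the code in this patch: `allowed_prefixes` and `is_under_allowed_dir` are hypothetical helper names, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def allowed_prefixes(dirs: Iterable[PurePosixPath]) -> List[str]:
    """Turn each directory into a prefix that always ends in exactly one '/'."""
    return [str(d).rstrip("/") + "/" for d in dirs]


def is_under_allowed_dir(skill_md_path: str, prefixes: List[str]) -> bool:
    """True only when the path sits inside one of the allowed directories."""
    if not skill_md_path:
        # An empty path must never match by accident.
        return False
    return any(skill_md_path.startswith(prefix) for prefix in prefixes)


prefixes = allowed_prefixes(
    [PurePosixPath("/tmp/skills"), PurePosixPath("/tmp/my-skills")]
)
inside = is_under_allowed_dir("/tmp/my-skills/morning-briefing/SKILL.md", prefixes)
lookalike = is_under_allowed_dir("/tmp/my-skills-extra/lookalike/SKILL.md", prefixes)
```

Without the trailing slash, `"/tmp/my-skills-extra/...".startswith("/tmp/my-skills")` would be true, which is the sibling-directory leak the trailing-slash rule guards against.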
Co-authored-by: Yao <34041715+luyao618@users.noreply.github.com> --- hermes_cli/commands.py | 17 +++++++- tests/hermes_cli/test_commands.py | 67 +++++++++++++++++++++++++++++++ 2 files changed, 82 insertions(+), 2 deletions(-) diff --git a/hermes_cli/commands.py b/hermes_cli/commands.py index 41b1dad500..2175661ba9 100644 --- a/hermes_cli/commands.py +++ b/hermes_cli/commands.py @@ -611,13 +611,26 @@ def _collect_gateway_skill_entries( try: from agent.skill_commands import get_skill_commands from tools.skills_tool import SKILLS_DIR + from agent.skill_utils import get_external_skills_dirs _skills_dir = str(SKILLS_DIR.resolve()) - _hub_dir = str((SKILLS_DIR / ".hub").resolve()) + _hub_dir = str((SKILLS_DIR / ".hub").resolve()).rstrip("/") + "/" + # Build set of allowed directory prefixes: local skills dir + any + # user-configured ``skills.external_dirs``. Ensure each prefix ends + # with ``/`` so ``/my-skills`` does not also match ``/my-skills-extra``. + # Without this widening, external skills are visible in + # ``hermes skills list`` and the agent's ``/skill-name`` dispatch but + # silently excluded from gateway slash menus (#8110). 
+ _allowed_prefixes = [_skills_dir.rstrip("/") + "/"] + _allowed_prefixes.extend( + str(d).rstrip("/") + "/" for d in get_external_skills_dirs() + ) skill_cmds = get_skill_commands() for cmd_key in sorted(skill_cmds): info = skill_cmds[cmd_key] skill_path = info.get("skill_md_path", "") - if not skill_path.startswith(_skills_dir): + if not skill_path: + continue + if not any(skill_path.startswith(prefix) for prefix in _allowed_prefixes): continue if skill_path.startswith(_hub_dir): continue diff --git a/tests/hermes_cli/test_commands.py b/tests/hermes_cli/test_commands.py index a35adbe4cc..296143a032 100644 --- a/tests/hermes_cli/test_commands.py +++ b/tests/hermes_cli/test_commands.py @@ -899,6 +899,73 @@ class TestTelegramMenuCommands: assert "my_enabled_skill" in menu_names assert "my_disabled_skill" not in menu_names + def test_external_dir_skills_included_in_telegram_menu(self, tmp_path, monkeypatch): + """External skills (``skills.external_dirs``) must appear in the Telegram menu. + + Regression test for #8110 — external skills were visible to the + agent and CLI but silently excluded from gateway slash menus + because ``_collect_gateway_skill_entries`` only accepted skills + whose path started with ``SKILLS_DIR``. + + Also verifies the trailing-slash boundary: a directory that + simply shares a prefix with a configured ``external_dirs`` entry + (``/tmp/my-skills-extra`` vs ``/tmp/my-skills``) must NOT be + admitted. 
+ """ + from unittest.mock import patch + + local_dir = tmp_path / "skills" + local_dir.mkdir() + external_dir = tmp_path / "my-skills" + external_dir.mkdir() + lookalike_dir = tmp_path / "my-skills-extra" + lookalike_dir.mkdir() + + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + (tmp_path / "config.yaml").write_text( + f"skills:\n external_dirs:\n - {external_dir}\n" + ) + + fake_cmds = { + "/local-one": { + "name": "local-one", + "description": "Local", + "skill_md_path": f"{local_dir}/local-one/SKILL.md", + "skill_dir": f"{local_dir}/local-one", + }, + "/morning-briefing": { + "name": "morning-briefing", + "description": "External skill", + "skill_md_path": f"{external_dir}/morning-briefing/SKILL.md", + "skill_dir": f"{external_dir}/morning-briefing", + }, + "/lookalike-skill": { + "name": "lookalike-skill", + "description": "Lives in a sibling dir that shares a prefix", + "skill_md_path": f"{lookalike_dir}/lookalike-skill/SKILL.md", + "skill_dir": f"{lookalike_dir}/lookalike-skill", + }, + } + + with ( + patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds), + patch("tools.skills_tool.SKILLS_DIR", local_dir), + patch( + "agent.skill_utils.get_external_skills_dirs", + return_value=[external_dir], + ), + ): + menu, _ = telegram_menu_commands(max_commands=100) + + menu_names = {n for n, _ in menu} + assert "local_one" in menu_names, "local skill must appear" + assert "morning_briefing" in menu_names, ( + "external skill from skills.external_dirs must appear (fixes #8110)" + ) + assert "lookalike_skill" not in menu_names, ( + "prefix-match sibling directories must not be admitted" + ) + def test_special_chars_in_skill_names_sanitized(self, tmp_path, monkeypatch): """Skills with +, /, or other special chars produce valid Telegram names.""" from unittest.mock import patch From c5e3a6fb5bb33477d639219de14922caedda98ef Mon Sep 17 00:00:00 2001 From: CoreyNoDream Date: Sat, 2 May 2026 15:14:03 +0800 Subject: [PATCH 04/61] fix(cli): decode .env as 
UTF-8 to avoid GBK crash on Windows Path.read_text() uses the system locale by default. On Windows CN/JP/KR locales (GBK/CP932/CP949), reading a UTF-8 .env raises UnicodeDecodeError as soon as it contains any non-ASCII byte (e.g. an em dash). Pin encoding="utf-8" on every .env read in hermes_cli to match how the rest of the codebase (load_dotenv at doctor.py:26) already decodes it. Adds a regression test that monkeypatches Path.read_text to simulate a GBK locale and asserts 'hermes doctor' no longer raises. Refs #18637 --- hermes_cli/doctor.py | 7 +++-- hermes_cli/main.py | 2 +- hermes_cli/memory_setup.py | 2 +- tests/hermes_cli/test_doctor.py | 51 +++++++++++++++++++++++++++++++++ 4 files changed, 58 insertions(+), 4 deletions(-) diff --git a/hermes_cli/doctor.py b/hermes_cli/doctor.py index f0822bdce8..122ed141cc 100644 --- a/hermes_cli/doctor.py +++ b/hermes_cli/doctor.py @@ -263,8 +263,11 @@ def run_doctor(args): if env_path.exists(): check_ok(f"{_DHH}/.env file exists") - # Check for common issues - content = env_path.read_text() + # Check for common issues. Pin encoding to UTF-8 because .env files are + # written as UTF-8 everywhere in the codebase, while Path.read_text() + # defaults to the system locale — which crashes on non-UTF-8 Windows + # locales (e.g. GBK) as soon as the file contains any non-ASCII byte. 
+ content = env_path.read_text(encoding="utf-8") if _has_provider_env_config(content): check_ok("API key or custom endpoint configured") else: diff --git a/hermes_cli/main.py b/hermes_cli/main.py index 856d85c636..ed8c24c8fa 100644 --- a/hermes_cli/main.py +++ b/hermes_cli/main.py @@ -289,7 +289,7 @@ def _has_any_provider_configured() -> bool: env_file = get_env_path() if env_file.exists(): try: - for line in env_file.read_text().splitlines(): + for line in env_file.read_text(encoding="utf-8").splitlines(): line = line.strip() if line.startswith("#") or "=" not in line: continue diff --git a/hermes_cli/memory_setup.py b/hermes_cli/memory_setup.py index 88186b8ec6..158f80a766 100644 --- a/hermes_cli/memory_setup.py +++ b/hermes_cli/memory_setup.py @@ -361,7 +361,7 @@ def _write_env_vars(env_path: Path, env_writes: dict) -> None: existing_lines = [] if env_path.exists(): - existing_lines = env_path.read_text().splitlines() + existing_lines = env_path.read_text(encoding="utf-8").splitlines() updated_keys = set() new_lines = [] diff --git a/tests/hermes_cli/test_doctor.py b/tests/hermes_cli/test_doctor.py index 5fafcb81f6..4a5981c07a 100644 --- a/tests/hermes_cli/test_doctor.py +++ b/tests/hermes_cli/test_doctor.py @@ -51,6 +51,57 @@ class TestProviderEnvDetection: assert not _has_provider_env_config(content) +class TestDoctorEnvFileEncoding: + """Regression for #18637 (bug 3): `hermes doctor` crashed on Windows + Chinese locale (GBK) because `.env` was read with Path.read_text() which + defaults to the system locale encoding, not UTF-8.""" + + def test_doctor_reads_env_as_utf8_even_when_locale_is_not_utf8( + self, monkeypatch, tmp_path + ): + import pathlib + + hermes_home = tmp_path / ".hermes" + hermes_home.mkdir() + # Write a UTF-8 .env containing an em dash (U+2014 = e2 80 94). 
The + # 0x94 byte is exactly the one the issue reporter hit: it's invalid + # as a GBK trailing byte in this position, so locale-default reads + # raise UnicodeDecodeError on Chinese Windows. + env_path = hermes_home / ".env" + env_path.write_text( + "OPENAI_API_KEY=sk-test # em-dash here — should not crash\n", + encoding="utf-8", + ) + + monkeypatch.setattr(doctor_mod, "HERMES_HOME", hermes_home) + + orig_read_text = pathlib.Path.read_text + + def gbk_like_read_text(self, encoding=None, errors=None, **kwargs): + # Simulate a GBK locale: refuse to decode this specific UTF-8 + # .env unless the caller pins encoding="utf-8". + if self == env_path and encoding != "utf-8": + raise UnicodeDecodeError( + "gbk", b"\x94", 0, 1, "illegal multibyte sequence" + ) + return orig_read_text(self, encoding=encoding, errors=errors, **kwargs) + + monkeypatch.setattr(pathlib.Path, "read_text", gbk_like_read_text) + + # Short-circuit the expensive tool-availability probe — we only + # need doctor to reach the .env read without crashing. + fake_model_tools = types.SimpleNamespace( + check_tool_availability=lambda *a, **kw: (_ for _ in ()).throw(SystemExit(0)), + TOOLSET_REQUIREMENTS={}, + ) + monkeypatch.setitem(sys.modules, "model_tools", fake_model_tools) + + # Run doctor. If the .env read still uses locale encoding, this + # raises UnicodeDecodeError and the test fails. + with pytest.raises(SystemExit): + doctor_mod.run_doctor(Namespace(fix=False)) + + class TestDoctorToolAvailabilityOverrides: def test_marks_honcho_available_when_configured(self, monkeypatch): monkeypatch.setattr(doctor, "_honcho_is_configured_for_doctor", lambda: True) From 98c98821ff1e3195dec55fde081fd59efdc5726b Mon Sep 17 00:00:00 2001 From: teknium1 <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 01:39:51 -0700 Subject: [PATCH 05/61] chore(release): map CoreyNoDream email for AUTHOR_MAP Follow-up for PR #18721 salvage. 
--- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index 8e0afe4de3..c055e5783c 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -370,6 +370,7 @@ AUTHOR_MAP = { "xowiekk@gmail.com": "Xowiek", "1243352777@qq.com": "zons-zhaozhy", "e.silacandmr@gmail.com": "Es1la", + "h3057183414@gmail.com": "CoreyNoDream", # ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply # crossref, and GH contributor list matching (April 2026 audit) ── "1115117931@qq.com": "aaronagent", From 699b3679bcaf000165902f246b6f5a6b99133efd Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 01:49:55 -0700 Subject: [PATCH 06/61] fix(constants): warn once when get_hermes_home() falls back under an active profile (#18746) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When HERMES_HOME is unset but ~/.hermes/active_profile names a non-default profile, any data this process writes lands in the default profile — not the one the operator expects. Before this change the fallback was silent, so cross-profile contamination (#18594) was invisible until a user noticed their memory/state ended up in the wrong place. Now we emit a one-shot warning to stderr the first time this happens in a process. No raise — there are 30+ module-level callers of get_hermes_home() and raising from any of them would brick import. Behavior is otherwise unchanged; subprocess spawners (systemd template, kanban dispatcher, docker entrypoint) already propagate HERMES_HOME correctly. Bypasses logging.getLogger() because this runs before logging is configured in a significant fraction of callers (module import time). Refs #18594. Credit to @liuhao1024 for surfacing the silent-fallback case in PR #18600; we kept the diagnostic signal without the import-time raise. 
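For reviewers, the one-shot behavior described above can be sketched in isolation. This is a minimal illustration, not the code in hermes_constants.py: the names `warn_once` and `_warned` and the optional `stream` parameter are hypothetical and exist only to make the sketch testable.

```python
import sys

# Hypothetical module-level flag; it plays the role of a per-process
# "already warned" marker.
_warned = False


def warn_once(message: str, stream=None) -> bool:
    """Write message to stream (stderr by default) on the first call only.

    Returns True if the message was written on this call, False otherwise.
    """
    global _warned
    if _warned:
        return False
    _warned = True
    out = stream if stream is not None else sys.stderr
    try:
        # Write directly instead of using logging, so the call is safe at
        # import time before any handlers are configured.
        out.write(message + "\n")
        out.flush()
    except Exception:
        # A broken stream must never turn a diagnostic into a crash.
        pass
    return True
```

Setting the flag before the write means a failing stream cannot cause repeated attempts on later calls, which is probably the safer default for a function invoked from many import sites.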
--- hermes_constants.py | 52 +++++++++- tests/test_hermes_home_profile_warning.py | 116 ++++++++++++++++++++++ 2 files changed, 167 insertions(+), 1 deletion(-) create mode 100644 tests/test_hermes_home_profile_warning.py diff --git a/hermes_constants.py b/hermes_constants.py index 35dbf86ab2..e63a4ec301 100644 --- a/hermes_constants.py +++ b/hermes_constants.py @@ -8,14 +8,64 @@ import os from pathlib import Path +_profile_fallback_warned: bool = False + + def get_hermes_home() -> Path: """Return the Hermes home directory (default: ~/.hermes). Reads HERMES_HOME env var, falls back to ~/.hermes. This is the single source of truth — all other copies should import this. + + When ``HERMES_HOME`` is unset but an ``active_profile`` file indicates + a non-default profile is active, writes a loud one-shot warning to + ``stderr`` so cross-profile data corruption is diagnosable instead + of silent. Behavior is unchanged otherwise — we still return + ``~/.hermes`` — because raising here would brick 30+ module-level + callers that import this at load time. Subprocess spawners are + expected to propagate ``HERMES_HOME`` explicitly (see the systemd + template in ``hermes_cli/gateway.py`` and the kanban dispatcher in + ``hermes_cli/kanban_db.py``). See https://github.com/NousResearch/hermes-agent/issues/18594. """ val = os.environ.get("HERMES_HOME", "").strip() - return Path(val) if val else Path.home() / ".hermes" + if val: + return Path(val) + + # Guard: if a non-default profile is sticky-active, warn once that + # the fallback to the default profile is almost certainly wrong. + global _profile_fallback_warned + if not _profile_fallback_warned: + try: + # Inline the default-root resolution from get_default_hermes_root() + # to stay import-safe (this function is called from module scope + # in 30+ files; we cannot afford to trigger logging setup here).
+ active_path = (Path.home() / ".hermes" / "active_profile") + active = active_path.read_text().strip() if active_path.exists() else "" + except (UnicodeDecodeError, OSError): + active = "" + if active and active != "default": + _profile_fallback_warned = True + # Write directly to stderr. We intentionally do NOT route this + # through ``logging`` because (a) this function is called at + # module-import time from 30+ sites, often before logging is + # configured, and (b) root-logger propagation would double-emit + # on consoles where a StreamHandler is already attached. + import sys + msg = ( + f"[HERMES_HOME fallback] HERMES_HOME is unset but active " + f"profile is {active!r}. Falling back to ~/.hermes, which " + f"is the DEFAULT profile — not {active!r}. Any data this " + f"process writes will land in the wrong profile. The " + f"subprocess spawner should pass HERMES_HOME explicitly " + f"(see issue #18594)." + ) + try: + sys.stderr.write(msg + "\n") + sys.stderr.flush() + except Exception: + pass + + return Path.home() / ".hermes" def get_default_hermes_root() -> Path: diff --git a/tests/test_hermes_home_profile_warning.py b/tests/test_hermes_home_profile_warning.py new file mode 100644 index 0000000000..ce51a01aa8 --- /dev/null +++ b/tests/test_hermes_home_profile_warning.py @@ -0,0 +1,116 @@ +"""Tests for get_hermes_home() profile-mode fallback warning. + +Regression test for https://github.com/NousResearch/hermes-agent/issues/18594. + +When HERMES_HOME is unset but an active_profile file indicates a non-default +profile is active, get_hermes_home() should: + 1. STILL return ~/.hermes (raising would brick 30+ module-level callers) + 2. Emit a loud one-shot warning to stderr so operators can diagnose + cross-profile data contamination after the fact. + +The warning goes to stderr directly (not through logging) because this +function is called at module-import time from 30+ sites, often before the +logging subsystem has been configured. 
+""" + +from pathlib import Path + +import pytest + + +@pytest.fixture +def fresh_constants(monkeypatch, tmp_path): + """Import hermes_constants fresh and reset the one-shot warn flag.""" + import importlib + import hermes_constants + importlib.reload(hermes_constants) + monkeypatch.setattr(Path, "home", lambda: tmp_path) + monkeypatch.delenv("HERMES_HOME", raising=False) + return hermes_constants + + +class TestGetHermesHomeProfileWarning: + def test_classic_mode_no_active_profile_no_warning( + self, fresh_constants, tmp_path, capsys + ): + """Classic mode: no active_profile file → silent, returns ~/.hermes.""" + result = fresh_constants.get_hermes_home() + assert result == tmp_path / ".hermes" + assert "HERMES_HOME fallback" not in capsys.readouterr().err + + def test_default_active_profile_no_warning( + self, fresh_constants, tmp_path, capsys + ): + """active_profile=default → still no warning, returns ~/.hermes.""" + hermes_dir = tmp_path / ".hermes" + hermes_dir.mkdir() + (hermes_dir / "active_profile").write_text("default\n") + result = fresh_constants.get_hermes_home() + assert result == tmp_path / ".hermes" + assert "HERMES_HOME fallback" not in capsys.readouterr().err + + def test_named_profile_unset_home_warns_once( + self, fresh_constants, tmp_path, capsys + ): + """active_profile=coder + HERMES_HOME unset → warn loudly, still return fallback.""" + hermes_dir = tmp_path / ".hermes" + hermes_dir.mkdir() + (hermes_dir / "active_profile").write_text("coder\n") + + result = fresh_constants.get_hermes_home() + + # 1. Still returns the fallback — no import-time crash + assert result == tmp_path / ".hermes" + # 2. Stderr got the warning exactly once + err = capsys.readouterr().err + assert err.count("HERMES_HOME fallback") == 1 + assert "'coder'" in err + assert "#18594" in err + + # 3. 
One-shot: second and third calls don't re-warn + fresh_constants.get_hermes_home() + fresh_constants.get_hermes_home() + err2 = capsys.readouterr().err + assert "HERMES_HOME fallback" not in err2 + + def test_hermes_home_set_suppresses_warning( + self, fresh_constants, tmp_path, capsys, monkeypatch + ): + """Even if active_profile is 'coder', setting HERMES_HOME suppresses warning.""" + profile_dir = tmp_path / ".hermes" / "profiles" / "coder" + profile_dir.mkdir(parents=True) + (tmp_path / ".hermes" / "active_profile").write_text("coder\n") + monkeypatch.setenv("HERMES_HOME", str(profile_dir)) + + result = fresh_constants.get_hermes_home() + + assert result == profile_dir + assert "HERMES_HOME fallback" not in capsys.readouterr().err + + def test_unreadable_active_profile_no_crash( + self, fresh_constants, tmp_path, capsys + ): + """active_profile that can't be decoded → fall through silently.""" + hermes_dir = tmp_path / ".hermes" + hermes_dir.mkdir() + # Write bytes that aren't valid utf-8 + (hermes_dir / "active_profile").write_bytes(b"\xff\xfe\x00\x00") + + result = fresh_constants.get_hermes_home() + + assert result == tmp_path / ".hermes" + # Shouldn't crash; shouldn't warn either (can't tell what profile was intended) + assert "HERMES_HOME fallback" not in capsys.readouterr().err + + def test_empty_active_profile_no_warning( + self, fresh_constants, tmp_path, capsys + ): + """Empty active_profile file → treated as default, no warning.""" + hermes_dir = tmp_path / ".hermes" + hermes_dir.mkdir() + (hermes_dir / "active_profile").write_text("") + + result = fresh_constants.get_hermes_home() + + assert result == tmp_path / ".hermes" + assert "HERMES_HOME fallback" not in capsys.readouterr().err From 9bf260472bca9f8097bf442f5c5e6dd1984dd4c3 Mon Sep 17 00:00:00 2001 From: liuhao1024 Date: Sat, 2 May 2026 03:33:13 +0800 Subject: [PATCH 07/61] fix(tools): deduplicate tool names at API boundary for Vertex/Azure/Bedrock MIME-Version: 1.0 Content-Type: text/plain; 
charset=UTF-8 Content-Transfer-Encoding: 8bit Providers like Google Vertex, Azure, and Amazon Bedrock reject API requests with duplicate tool names (HTTP 400: 'Tool names must be unique'). The upstream injection paths in run_agent.py already dedup after PR #17335, but two API-boundary functions pass tools through without checking: - agent/auxiliary_client.py: _build_call_kwargs() (all non-Anthropic providers in chat_completions mode) - agent/anthropic_adapter.py: convert_tools_to_anthropic() (Anthropic Messages API path) Add defensive dedup guards at both sites. Duplicates are dropped with a warning log, converting a hard 400 failure into a recoverable condition. This is intentionally conservative — the root-cause dedup in run_agent.py is the primary defense; these guards add resilience against future injection-path regressions. Includes 8 new tests covering unique passthrough, duplicate removal, empty/None edge cases. Closes #18478 --- agent/anthropic_adapter.py | 16 ++++++- agent/auxiliary_client.py | 21 ++++++++- tests/agent/test_anthropic_adapter.py | 52 +++++++++++++++++++++ tests/agent/test_auxiliary_client.py | 66 +++++++++++++++++++++++++++ 4 files changed, 153 insertions(+), 2 deletions(-) diff --git a/agent/anthropic_adapter.py b/agent/anthropic_adapter.py index efee8f6bf1..8d8334acd1 100644 --- a/agent/anthropic_adapter.py +++ b/agent/anthropic_adapter.py @@ -1241,10 +1241,24 @@ def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]: if not tools: return [] result = [] + seen_names: set = set() for t in tools: fn = t.get("function", {}) + name = fn.get("name", "") + # Defensive dedup: Anthropic rejects requests with duplicate tool + # names. Upstream injection paths already dedup, but this guard + # converts a hard API failure into a warning. 
See: #18478 + if name and name in seen_names: + logger.warning( + "convert_tools_to_anthropic: duplicate tool name '%s' " + "— dropping second occurrence", + name, + ) + continue + if name: + seen_names.add(name) result.append({ - "name": fn.get("name", ""), + "name": name, "description": fn.get("description", ""), "input_schema": _normalize_tool_input_schema( fn.get("parameters", {"type": "object", "properties": {}}) diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py index df3fdeccc6..27d4c7ed34 100644 --- a/agent/auxiliary_client.py +++ b/agent/auxiliary_client.py @@ -3237,7 +3237,26 @@ def _build_call_kwargs( kwargs["max_tokens"] = max_tokens if tools: - kwargs["tools"] = tools + # Defensive dedup: providers like Google Vertex, Azure, and Bedrock + # reject requests with duplicate tool names (HTTP 400). The upstream + # injection paths (run_agent.py) already dedup, but this guard + # converts a hard API failure into a warning if an upstream regression + # reintroduces duplicates. 
See: #18478 + _seen: set = set() + _deduped: list = [] + for _t in tools: + _tname = (_t.get("function") or {}).get("name", "") + if _tname and _tname in _seen: + logger.warning( + "_build_call_kwargs: duplicate tool name '%s' removed " + "(provider=%s model=%s)", + _tname, provider, model, + ) + continue + if _tname: + _seen.add(_tname) + _deduped.append(_t) + kwargs["tools"] = _deduped # Provider-specific extra_body merged_extra = dict(extra_body or {}) diff --git a/tests/agent/test_anthropic_adapter.py b/tests/agent/test_anthropic_adapter.py index 8105363b2e..2e676aef62 100644 --- a/tests/agent/test_anthropic_adapter.py +++ b/tests/agent/test_anthropic_adapter.py @@ -1836,3 +1836,55 @@ class TestResolveMessagesMaxTokens: result = _resolve_anthropic_messages_max_tokens(0.5, "claude-opus-4-6") assert result > 0 assert result != 0 + + +# --------------------------------------------------------------------------- +# convert_tools_to_anthropic — tool dedup at API boundary +# --------------------------------------------------------------------------- + +class TestConvertToolsToAnthropicDedup: + """convert_tools_to_anthropic must deduplicate tool names. + + Anthropic rejects requests with duplicate tool names. This guard converts + a hard failure into a warning log. 
See: + https://github.com/NousResearch/hermes-agent/issues/18478 + """ + + def _make_openai_tool(self, name: str) -> dict: + return { + "type": "function", + "function": { + "name": name, + "description": f"Tool {name}", + "parameters": {"type": "object", "properties": {}}, + }, + } + + def test_unique_tools_pass_through(self): + tools = [self._make_openai_tool("alpha"), self._make_openai_tool("beta")] + result = convert_tools_to_anthropic(tools) + assert len(result) == 2 + names = [t["name"] for t in result] + assert names == ["alpha", "beta"] + + def test_duplicate_tool_names_are_deduplicated(self): + """RED test — must fail until dedup guard is added.""" + tools = [ + self._make_openai_tool("lcm_grep"), + self._make_openai_tool("lcm_describe"), + self._make_openai_tool("lcm_grep"), # duplicate + self._make_openai_tool("lcm_expand"), + self._make_openai_tool("lcm_describe"), # duplicate + ] + result = convert_tools_to_anthropic(tools) + names = [t["name"] for t in result] + assert len(names) == len(set(names)), ( + f"Duplicate tool names found: {names}" + ) + assert len(result) == 3 # lcm_grep, lcm_describe, lcm_expand + + def test_empty_tools_returns_empty(self): + assert convert_tools_to_anthropic([]) == [] + + def test_none_tools_returns_empty(self): + assert convert_tools_to_anthropic(None) == [] diff --git a/tests/agent/test_auxiliary_client.py b/tests/agent/test_auxiliary_client.py index 32290b0612..bc74fc7306 100644 --- a/tests/agent/test_auxiliary_client.py +++ b/tests/agent/test_auxiliary_client.py @@ -16,6 +16,7 @@ from agent.auxiliary_client import ( auxiliary_max_tokens_param, call_llm, async_call_llm, + _build_call_kwargs, _read_codex_access_token, _get_provider_chain, _is_payment_error, @@ -1752,3 +1753,68 @@ class TestVisionAutoSkipsKimiCoding: "kimi-coding", "kimi-coding-cn", }) + + +# --------------------------------------------------------------------------- +# _build_call_kwargs — tool dedup at API boundary +# 
--------------------------------------------------------------------------- + +class TestBuildCallKwargsToolDedup: + """_build_call_kwargs must deduplicate tool names before passing to API. + + Providers like Google Vertex, Azure, and Bedrock reject requests with + duplicate tool names (HTTP 400). This guard converts a hard failure into + a warning log so agent turns succeed even if an upstream injection path + regresses. See: https://github.com/NousResearch/hermes-agent/issues/18478 + """ + + def _make_tool(self, name: str) -> dict: + return { + "type": "function", + "function": { + "name": name, + "description": f"Tool {name}", + "parameters": {"type": "object", "properties": {}}, + }, + } + + def test_unique_tools_pass_through_unchanged(self): + tools = [self._make_tool("alpha"), self._make_tool("beta")] + kwargs = _build_call_kwargs( + provider="openai", model="gpt-4o", messages=[], tools=tools, + ) + assert len(kwargs["tools"]) == 2 + names = [t["function"]["name"] for t in kwargs["tools"]] + assert names == ["alpha", "beta"] + + def test_duplicate_tool_names_are_deduplicated(self): + """RED test — must fail until dedup guard is added.""" + tools = [ + self._make_tool("lcm_grep"), + self._make_tool("lcm_describe"), + self._make_tool("lcm_grep"), # duplicate + self._make_tool("lcm_expand"), + self._make_tool("lcm_describe"), # duplicate + ] + kwargs = _build_call_kwargs( + provider="google", model="gemini-2.5-pro", messages=[], tools=tools, + ) + result_tools = kwargs["tools"] + names = [t["function"]["name"] for t in result_tools] + # Must be deduplicated — no repeated names + assert len(names) == len(set(names)), ( + f"Duplicate tool names found: {names}" + ) + assert len(result_tools) == 3 # lcm_grep, lcm_describe, lcm_expand + + def test_empty_tools_unchanged(self): + kwargs = _build_call_kwargs( + provider="openai", model="gpt-4o", messages=[], tools=[], + ) + assert kwargs.get("tools") == [] or "tools" not in kwargs + + def 
test_none_tools_unchanged(self): + kwargs = _build_call_kwargs( + provider="openai", model="gpt-4o", messages=[], tools=None, + ) + assert "tools" not in kwargs From 2470434d60991a46e0fd4733e4a69722acb97ebe Mon Sep 17 00:00:00 2001 From: Jacob Lizarraga Date: Thu, 30 Apr 2026 14:12:54 -0700 Subject: [PATCH 08/61] fix(telegram): probe polling liveness after reconnect to detect wedged Updater MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit After a transient Telegram 502, _handle_polling_network_error's stop()+start_polling() cycle can leave PTB's Updater with `running=True` but a wedged consumer task that never makes progress. No error_callback fires in that state, so the reconnect ladder never advances past attempt 1, the MAX_NETWORK_RETRIES fatal-error path is never reached, and the gateway sits silent indefinitely. Schedule a heartbeat probe (60s after a successful reconnect) that verifies Updater.running is still True and bot.get_me() responds within a tight asyncio.wait_for timeout. Either failure feeds back into the reconnect ladder so the existing escalation path fires. No PTB-internal coupling, no Application rebuild — minimal additive defense inside the existing reconnect abstraction. Tests cover healthy / Updater non-running / probe timeout / probe network error / already-fatal cases, plus an integration check that the probe is actually scheduled after a successful start_polling(). Closes the silent-wedge case observed in the wild after a transient Telegram 502; existing reconnect tests updated to mock bot.get_me() now that the success path schedules a heartbeat probe. 
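The probe logic described above can be sketched without any PTB dependency. This is an illustrative reduction under stated assumptions, not the adapter code: `probe_after_reconnect`, `is_running`, `ping`, and `on_failure` are hypothetical names that stand in for the Updater.running check, bot.get_me(), and the re-entry into the reconnect ladder.

```python
import asyncio
from typing import Awaitable, Callable


async def probe_after_reconnect(
    is_running: Callable[[], bool],
    ping: Callable[[], Awaitable[object]],
    on_failure: Callable[[Exception], Awaitable[None]],
    delay: float = 60.0,
    timeout: float = 10.0,
) -> bool:
    """Return True if polling looks healthy, else report the failure.

    Both failure modes (poller not running, ping error or timeout) are
    routed to on_failure so one escalation path handles them.
    """
    # Give a healthy long-poll time to complete at least one cycle.
    await asyncio.sleep(delay)

    if not is_running():
        await on_failure(RuntimeError("poller not running after reconnect"))
        return False

    try:
        # A wedged connection pool hangs here; wait_for bounds the wait.
        await asyncio.wait_for(ping(), timeout)
    except Exception as err:  # includes the timeout raised by wait_for
        await on_failure(err)
        return False
    return True
```

Passing the checks in as callables keeps the sketch testable with plain coroutines; the real adapter reads the same signals from its own application object.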
--- gateway/platforms/telegram.py | 55 +++++ .../test_telegram_network_reconnect.py | 189 ++++++++++++++++++ 2 files changed, 244 insertions(+) diff --git a/gateway/platforms/telegram.py b/gateway/platforms/telegram.py index cbee25393e..188038a1ad 100644 --- a/gateway/platforms/telegram.py +++ b/gateway/platforms/telegram.py @@ -512,6 +512,17 @@ class TelegramAdapter(BasePlatformAdapter): self.name, attempt, ) self._polling_network_error_count = 0 + # start_polling() returning is necessary but not sufficient: + # PTB's Updater can be left in a state where `running` is True + # but the underlying long-poll task is wedged on a stale httpx + # connection and never makes progress. No error_callback fires + # in that state, so the reconnect ladder won't advance on its + # own. Schedule a deferred probe to detect the wedge and + # re-enter the ladder if needed. + if not self.has_fatal_error: + probe = asyncio.ensure_future(self._verify_polling_after_reconnect()) + self._background_tasks.add(probe) + probe.add_done_callback(self._background_tasks.discard) except Exception as retry_err: logger.warning("[%s] Telegram polling reconnect failed: %s", self.name, retry_err) # start_polling failed — polling is dead and no further error @@ -523,6 +534,50 @@ class TelegramAdapter(BasePlatformAdapter): self._background_tasks.add(task) task.add_done_callback(self._background_tasks.discard) + async def _verify_polling_after_reconnect(self) -> None: + """Heartbeat probe scheduled after a successful reconnect. + + PTB's Updater can survive a botched stop()+start_polling() cycle + with `running=True` but a wedged consumer task. No error callback + fires, so the reconnect ladder doesn't advance on its own. This + probe detects the wedge by: + + 1. Sleeping HEARTBEAT_PROBE_DELAY so a healthy long-poll has time + to complete at least one cycle. + 2. Verifying `Updater.running` is still True. + 3. Probing the bot endpoint with a tight asyncio timeout. 
A + wedged httpx pool fails this probe; a healthy one returns + well under the timeout. + + On any failure, re-enter the reconnect ladder so the existing + MAX_NETWORK_RETRIES path can ultimately escalate to fatal-error. + """ + HEARTBEAT_PROBE_DELAY = 60 + PROBE_TIMEOUT = 10 + + await asyncio.sleep(HEARTBEAT_PROBE_DELAY) + + if self.has_fatal_error: + return + if not (self._app and self._app.updater and self._app.updater.running): + logger.warning( + "[%s] Updater not running %ds after reconnect — treating as wedged", + self.name, HEARTBEAT_PROBE_DELAY, + ) + await self._handle_polling_network_error( + RuntimeError("Updater not running after reconnect heartbeat") + ) + return + + try: + await asyncio.wait_for(self._app.bot.get_me(), PROBE_TIMEOUT) + except Exception as probe_err: + logger.warning( + "[%s] Polling heartbeat probe failed %ds after reconnect: %s", + self.name, HEARTBEAT_PROBE_DELAY, probe_err, + ) + await self._handle_polling_network_error(probe_err) + async def _handle_polling_conflict(self, error: Exception) -> None: if self.has_fatal_error and self.fatal_error_code == "telegram_polling_conflict": return diff --git a/tests/gateway/test_telegram_network_reconnect.py b/tests/gateway/test_telegram_network_reconnect.py index 532639b2db..81b7bed12e 100644 --- a/tests/gateway/test_telegram_network_reconnect.py +++ b/tests/gateway/test_telegram_network_reconnect.py @@ -132,6 +132,7 @@ async def test_reconnect_success_resets_error_count(): mock_app = MagicMock() mock_app.updater = mock_updater + mock_app.bot.get_me = AsyncMock(return_value=MagicMock()) # heartbeat probe path adapter._app = mock_app with patch("asyncio.sleep", new_callable=AsyncMock): @@ -139,6 +140,15 @@ async def test_reconnect_success_resets_error_count(): assert adapter._polling_network_error_count == 0 + # Clean up the heartbeat-probe task scheduled after a successful reconnect. 
+ pending = [t for t in adapter._background_tasks if not t.done()] + for t in pending: + t.cancel() + try: + await t + except (asyncio.CancelledError, Exception): + pass + @pytest.mark.asyncio async def test_reconnect_triggers_fatal_after_max_retries(): @@ -284,3 +294,182 @@ async def test_drain_helper_noop_without_app(): adapter._app = None # Should not raise await adapter._drain_polling_connections() + + +# ── Heartbeat probe ────────────────────────────────────────────────────── + + +@pytest.mark.asyncio +async def test_heartbeat_probe_no_op_when_polling_healthy(): + """ + Probe scheduled after a successful reconnect: Updater.running=True and + bot.get_me() returns quickly → recovery confirmed, no further action. + """ + adapter = _make_adapter() + + mock_updater = MagicMock() + mock_updater.running = True + + mock_app = MagicMock() + mock_app.updater = mock_updater + mock_app.bot.get_me = AsyncMock(return_value=MagicMock()) + adapter._app = mock_app + + adapter._handle_polling_network_error = AsyncMock() + + with patch("asyncio.sleep", new_callable=AsyncMock): + await adapter._verify_polling_after_reconnect() + + mock_app.bot.get_me.assert_awaited_once() + adapter._handle_polling_network_error.assert_not_awaited() + + +@pytest.mark.asyncio +async def test_heartbeat_probe_reenters_ladder_when_updater_not_running(): + """ + If Updater.running has flipped to False by the heartbeat delay, treat + as wedged: re-enter the reconnect ladder. 
+ """ + adapter = _make_adapter() + + mock_updater = MagicMock() + mock_updater.running = False + + mock_app = MagicMock() + mock_app.updater = mock_updater + mock_app.bot.get_me = AsyncMock() + adapter._app = mock_app + + adapter._handle_polling_network_error = AsyncMock() + + with patch("asyncio.sleep", new_callable=AsyncMock): + await adapter._verify_polling_after_reconnect() + + mock_app.bot.get_me.assert_not_called() + adapter._handle_polling_network_error.assert_awaited_once() + err = adapter._handle_polling_network_error.await_args.args[0] + assert isinstance(err, RuntimeError) + assert "not running" in str(err).lower() + + +@pytest.mark.asyncio +async def test_heartbeat_probe_reenters_ladder_when_get_me_times_out(): + """ + If bot.get_me() hangs longer than PROBE_TIMEOUT, treat as wedged. + Simulates the connection-pool wedge that motivated this fix. + """ + adapter = _make_adapter() + + mock_updater = MagicMock() + mock_updater.running = True + + async def hang_forever(*args, **kwargs): + await asyncio.sleep(3600) + + mock_app = MagicMock() + mock_app.updater = mock_updater + mock_app.bot.get_me = AsyncMock(side_effect=hang_forever) + adapter._app = mock_app + + adapter._handle_polling_network_error = AsyncMock() + + async def fast_wait_for(coro, timeout): + if asyncio.iscoroutine(coro): + coro.close() + raise asyncio.TimeoutError() + + with patch("asyncio.sleep", new_callable=AsyncMock): + with patch("gateway.platforms.telegram.asyncio.wait_for", new=fast_wait_for): + await adapter._verify_polling_after_reconnect() + + adapter._handle_polling_network_error.assert_awaited_once() + + +@pytest.mark.asyncio +async def test_heartbeat_probe_reenters_ladder_on_get_me_network_error(): + """ + Any exception raised by bot.get_me() (NetworkError, ConnectionError, etc.) + should re-enter the reconnect ladder with the original exception. 
+ """ + adapter = _make_adapter() + + mock_updater = MagicMock() + mock_updater.running = True + + mock_app = MagicMock() + mock_app.updater = mock_updater + mock_app.bot.get_me = AsyncMock(side_effect=ConnectionError("pool wedged")) + adapter._app = mock_app + + adapter._handle_polling_network_error = AsyncMock() + + with patch("asyncio.sleep", new_callable=AsyncMock): + await adapter._verify_polling_after_reconnect() + + adapter._handle_polling_network_error.assert_awaited_once() + assert isinstance( + adapter._handle_polling_network_error.await_args.args[0], ConnectionError + ) + + +@pytest.mark.asyncio +async def test_heartbeat_probe_skips_when_already_fatal(): + """ + If the adapter is already in fatal-error state by the time the probe + delay elapses, the probe should bail without further action. + """ + adapter = _make_adapter() + adapter._set_fatal_error("telegram_polling_conflict", "already fatal", retryable=False) + + mock_app = MagicMock() + mock_app.bot.get_me = AsyncMock() + adapter._app = mock_app + + adapter._handle_polling_network_error = AsyncMock() + + with patch("asyncio.sleep", new_callable=AsyncMock): + await adapter._verify_polling_after_reconnect() + + mock_app.bot.get_me.assert_not_called() + adapter._handle_polling_network_error.assert_not_awaited() + + +@pytest.mark.asyncio +async def test_reconnect_schedules_heartbeat_probe_on_success(): + """ + After a successful start_polling() in the reconnect path, a probe task + must be added to _background_tasks. Without it, a wedged Updater would + sit silent indefinitely with no further error_callback to advance the + reconnect ladder. 
+ """ + adapter = _make_adapter() + adapter._polling_network_error_count = 1 + + mock_updater = MagicMock() + mock_updater.running = True + mock_updater.stop = AsyncMock() + mock_updater.start_polling = AsyncMock() # succeeds + + mock_app = MagicMock() + mock_app.updater = mock_updater + mock_app.bot.get_me = AsyncMock(return_value=MagicMock()) + adapter._app = mock_app + + initial_count = len(adapter._background_tasks) + + with patch("asyncio.sleep", new_callable=AsyncMock): + await adapter._handle_polling_network_error(Exception("Bad Gateway")) + + assert len(adapter._background_tasks) > initial_count, ( + "Expected a heartbeat probe task to be scheduled after a successful " + "reconnect's start_polling()" + ) + + # Clean up. + pending = [t for t in adapter._background_tasks if not t.done()] + for t in pending: + t.cancel() + try: + await t + except (asyncio.CancelledError, Exception): + pass From 8825e9044c2657b726f589f4287cf827f77ff44e Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:00:06 -0700 Subject: [PATCH 09/61] fix(discord): complete #18741 for /skill autocomplete and drop legacy 25x25 caps (#18745) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ``discord_skill_commands_by_category`` was lagging the flat ``discord_skill_commands`` collector on two counts. Both were actively dropping skills from Discord's ``/skill`` autocomplete dropdown. 1. External-dir skills were filtered out. #18741 widened the flat collector to accept ``SKILLS_DIR + skills.external_dirs`` but left this sibling collector — the one ``_register_skill_group`` actually uses on Discord — still matching ``SKILLS_DIR`` only. External skills were visible in ``hermes skills list`` and the agent's ``/skill-name`` dispatch but silently absent from Discord's ``/skill`` picker. 
Widen the accepted roots to match, and derive categories from whichever root the skill lives under so ``/mlops/foo/SKILL.md`` still lands in the ``mlops`` group. 2. 25-group × 25-subcommand caps were still applied. PR #11580 refactored ``/skill`` to a flat autocomplete (whose options Discord fetches dynamically — no per-command payload concern) and its docstring promises "no hidden skills." The collector kept the old nested-layout caps anyway, silently dropping anything past the 25th alphabetical category. On installs with 29 category dirs today (real example: tail categories ``social-media``, ``software-development``, ``yuanbao`` going missing) this was biting immediately. Remove the caps; ``hidden`` now reports only 32-char name-clamp collisions against reserved names. Tests: guard both behaviors. ``test_no_legacy_25x25_cap`` builds 30 categories × 30 skills each and asserts all 900 are returned. ``test_external_dirs_skills_included`` monkeypatches ``get_external_skills_dirs`` and asserts an external-dir skill makes it into the result grouped under its own top-level directory. --- hermes_cli/commands.py | 101 +++++++++++++++----------- tests/hermes_cli/test_commands.py | 113 ++++++++++++++++++++++++++++++ 2 files changed, 174 insertions(+), 40 deletions(-) diff --git a/hermes_cli/commands.py b/hermes_cli/commands.py index 2175661ba9..ba05602241 100644 --- a/hermes_cli/commands.py +++ b/hermes_cli/commands.py @@ -734,24 +734,40 @@ def discord_skill_commands( def discord_skill_commands_by_category( reserved_names: set[str], ) -> tuple[dict[str, list[tuple[str, str, str]]], list[tuple[str, str, str]], int]: - """Return skill entries organized by category for Discord ``/skill`` subcommand groups. + """Return skill entries organized by category for Discord ``/skill`` autocomplete. - Skills whose directory is nested at least 2 levels under ``SKILLS_DIR`` + Skills whose directory is nested at least 2 levels under a scan root (e.g. 
``creative/ascii-art/SKILL.md``) are grouped by their top-level category. Root-level skills (e.g. ``dogfood/SKILL.md``) are returned as - *uncategorized* — the caller should register them as direct subcommands - of the ``/skill`` group. + *uncategorized*. - The same filtering as :func:`discord_skill_commands` is applied: hub - skills excluded, per-platform disabled excluded, names clamped. + Scan roots include the local ``SKILLS_DIR`` **and** any configured + ``skills.external_dirs`` — matching the widened filter applied to the + flat ``discord_skill_commands()`` collector in #18741. Without this + parity, external-dir skills are visible via ``hermes skills list`` and + the agent's ``/skill-name`` dispatch but silently absent from Discord's + ``/skill`` autocomplete. + + Filtering mirrors :func:`discord_skill_commands`: hub skills excluded, + per-platform disabled excluded, names clamped to 32 chars, descriptions + clamped to 100 chars. + + The legacy 25-group × 25-subcommand caps (from the old nested + ``/skill `` layout) are **not** applied — the live caller + (``_register_skill_group`` in ``gateway/platforms/discord.py``, refactored + in PR #11580) flattens these results and feeds them into a single + autocomplete callback, which scales to thousands of entries without any + per-command payload concerns. ``hidden_count`` is retained in the return + tuple for backward compatibility and still reports skills dropped for + other reasons (32-char clamp collision vs a reserved name). Returns: ``(categories, uncategorized, hidden_count)`` - *categories*: ``{category_name: [(name, description, cmd_key), ...]}`` - *uncategorized*: ``[(name, description, cmd_key), ...]`` - - *hidden_count*: skills dropped due to Discord group limits - (25 subcommand groups, 25 subcommands per group) + - *hidden_count*: skills dropped due to name clamp collisions + against already-registered command names. 
""" from pathlib import Path as _P @@ -770,9 +786,23 @@ def discord_skill_commands_by_category( try: from agent.skill_commands import get_skill_commands + from agent.skill_utils import get_external_skills_dirs from tools.skills_tool import SKILLS_DIR + _skills_dir = SKILLS_DIR.resolve() _hub_dir = (SKILLS_DIR / ".hub").resolve() + # Build list of (resolved_root, is_local) tuples. Each external dir + # becomes its own scan root for category derivation — a skill at + # ``/mlops/foo/SKILL.md`` is still categorized as "mlops". + _scan_roots: list[_P] = [_skills_dir] + try: + for ext in get_external_skills_dirs(): + try: + _scan_roots.append(_P(ext).resolve()) + except Exception: + continue + except Exception: + pass skill_cmds = get_skill_commands() for cmd_key in sorted(skill_cmds): @@ -781,20 +811,35 @@ def discord_skill_commands_by_category( if not skill_path: continue sp = _P(skill_path).resolve() - # Skip skills outside SKILLS_DIR or from the hub - if not str(sp).startswith(str(_skills_dir)): - continue + # Hub skills are loaded via the skill hub, not surfaced as + # slash commands. if str(sp).startswith(str(_hub_dir)): continue + # Accept skill if it lives under any scan root; record the + # matching root so we can derive the category correctly. + matched_root: _P | None = None + for root in _scan_roots: + try: + sp.relative_to(root) + except ValueError: + continue + matched_root = root + break + if matched_root is None: + continue skill_name = info.get("name", "") if skill_name in _platform_disabled: continue raw_name = cmd_key.lstrip("/") - # Clamp to 32 chars (Discord limit) + # Clamp to 32 chars (Discord per-command name limit) discord_name = raw_name[:32] if discord_name in _names_used: + # Collision with a previously-registered name — drop and + # count. Almost always caused by a reserved built-in name, + # not by another skill (frontmatter names are unique). 
+ hidden += 1 continue _names_used.add(discord_name) @@ -802,12 +847,9 @@ def discord_skill_commands_by_category( if len(desc) > 100: desc = desc[:97] + "..." - # Determine category from the relative path within SKILLS_DIR. - # e.g. creative/ascii-art/SKILL.md → parts = ("creative", "ascii-art") - try: - rel = sp.parent.relative_to(_skills_dir) - except ValueError: - continue + # Determine category from the relative path within the matched + # scan root. e.g. creative/ascii-art/SKILL.md → ("creative", ...) + rel = sp.parent.relative_to(matched_root) parts = rel.parts if len(parts) >= 2: cat = parts[0] @@ -817,28 +859,7 @@ def discord_skill_commands_by_category( except Exception: pass - # Enforce Discord limits: 25 subcommand groups, 25 subcommands each ------ - _MAX_GROUPS = 25 - _MAX_PER_GROUP = 25 - - trimmed_categories: dict[str, list[tuple[str, str, str]]] = {} - group_count = 0 - for cat in sorted(categories): - if group_count >= _MAX_GROUPS: - hidden += len(categories[cat]) - continue - entries = categories[cat][:_MAX_PER_GROUP] - hidden += max(0, len(categories[cat]) - _MAX_PER_GROUP) - trimmed_categories[cat] = entries - group_count += 1 - - # Uncategorized skills also count against the 25 top-level limit - remaining_slots = _MAX_GROUPS - group_count - if len(uncategorized) > remaining_slots: - hidden += len(uncategorized) - remaining_slots - uncategorized = uncategorized[:remaining_slots] - - return trimmed_categories, uncategorized, hidden + return categories, uncategorized, hidden # --------------------------------------------------------------------------- diff --git a/tests/hermes_cli/test_commands.py b/tests/hermes_cli/test_commands.py index 296143a032..7c19730d9e 100644 --- a/tests/hermes_cli/test_commands.py +++ b/tests/hermes_cli/test_commands.py @@ -1420,6 +1420,119 @@ class TestDiscordSkillCommandsByCategory: assert "vllm" in names assert len(uncategorized) == 0 + def test_no_legacy_25x25_cap(self, tmp_path, monkeypatch): + """The old 
nested-layout caps (25 groups × 25 skills/group) are gone. + + The live caller flattens categories into a single autocomplete list, + which Discord fetches dynamically — the per-command 8KB payload + concern from the old nested layout (#11321, #10259) no longer applies. + Guards against accidentally re-introducing the caps, which would + silently drop skills in the 26th+ alphabetical category (the exact + failure mode users were hitting with 29 category dirs on real + installs). + """ + from unittest.mock import patch + + fake_skills_dir = str(tmp_path / "skills") + + # Build 30 categories (> old _MAX_GROUPS=25) each with 30 skills + # (> old _MAX_PER_GROUP=25). + fake_cmds = {} + for c in range(30): + cat = f"cat{c:02d}" # cat00, cat01, ..., cat29 — 30 categories + for s in range(30): + name = f"skill-{c:02d}-{s:02d}" + skill_subdir = tmp_path / "skills" / cat / name + skill_subdir.mkdir(parents=True, exist_ok=True) + (skill_subdir / "SKILL.md").write_text("---\nname: x\n---\n") + fake_cmds[f"/{name}"] = { + "name": name, + "description": f"Category {cat} skill {s}", + "skill_md_path": f"{fake_skills_dir}/{cat}/{name}/SKILL.md", + } + + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + with ( + patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds), + patch("tools.skills_tool.SKILLS_DIR", tmp_path / "skills"), + ): + categories, uncategorized, hidden = discord_skill_commands_by_category( + reserved_names=set(), + ) + + # Every category should be present — no 25-group cap + assert len(categories) == 30, ( + f"expected all 30 categories, got {len(categories)} " + f"(cap from old nested layout must be removed)" + ) + # Every skill in every category must be present — no 25-per-group cap + for cat_name, entries in categories.items(): + assert len(entries) == 30, ( + f"category {cat_name}: expected 30 skills, got {len(entries)} " + f"(cap from old nested layout must be removed)" + ) + # Nothing should be reported hidden for the cap reason (the only + 
# legitimate hidden reason now is name clamp collisions, which + # don't happen here since all names are unique). + assert hidden == 0 + + def test_external_dirs_skills_included(self, tmp_path, monkeypatch): + """Skills in ``skills.external_dirs`` must appear in /skill autocomplete. + + #18741 fixed this for the flat ``discord_skill_commands`` collector + but left ``discord_skill_commands_by_category`` (the live caller for + Discord's ``/skill`` command) still filtering by + ``SKILLS_DIR`` prefix only. Regression guard that both collectors + now accept external-dir skills. + """ + from unittest.mock import patch + + local_skills_dir = tmp_path / "local-skills" + external_dir = tmp_path / "external-skills" + + (local_skills_dir / "creative" / "local-skill").mkdir(parents=True) + (local_skills_dir / "creative" / "local-skill" / "SKILL.md").write_text("") + + (external_dir / "mlops" / "external-skill").mkdir(parents=True) + (external_dir / "mlops" / "external-skill" / "SKILL.md").write_text("") + + fake_cmds = { + "/local-skill": { + "name": "local-skill", + "description": "Local", + "skill_md_path": str(local_skills_dir / "creative" / "local-skill" / "SKILL.md"), + }, + "/external-skill": { + "name": "external-skill", + "description": "External", + "skill_md_path": str(external_dir / "mlops" / "external-skill" / "SKILL.md"), + }, + } + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + with ( + patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds), + patch("tools.skills_tool.SKILLS_DIR", local_skills_dir), + patch( + "agent.skill_utils.get_external_skills_dirs", + return_value=[external_dir], + ), + ): + categories, uncategorized, hidden = discord_skill_commands_by_category( + reserved_names=set(), + ) + + # Local skill → grouped under "creative" + assert "creative" in categories + assert any(n == "local-skill" for n, _d, _k in categories["creative"]) + # External skill → grouped under its own top-level dir "mlops" + assert "mlops" in categories, ( 
+ "external-dir skills must be included — the old SKILLS_DIR-only " + "prefix check was broken for by_category (completes #18741)" + ) + assert any(n == "external-skill" for n, _d, _k in categories["mlops"]) + assert uncategorized == [] + assert hidden == 0 + # --------------------------------------------------------------------------- # Plugin slash command integration From 6ec74aec0705df82475d402e761afc6a50c29ad1 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:00:09 -0700 Subject: [PATCH 10/61] fix(gateway): match disabled/optional skills by frontmatter slug, not dir name (#18753) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit _check_unavailable_skill is meant to turn a typed "/foo" command that doesn't resolve into a specific hint — "disabled, enable with hermes skills config" or "available but not installed, install with hermes skills install …" — instead of the generic "unknown command" reply. It was doing the match with `skill_md.parent.name.lower().replace("_", "-")`, comparing that to the typed command. For every skill whose directory name drifted from its declared frontmatter `name:`, that comparison failed and the user got the unhelpful generic path. On a standard install today 19 skills have this drift, e.g.: dir: mlops/stable-diffusion frontmatter: name: Stable Diffusion Image Generation registered slug (what the user types): /stable-diffusion-image-generation dir: mlops/qdrant frontmatter: name: Qdrant Vector Search registered slug: /qdrant-vector-search dir: mlops/flash-attention frontmatter: name: Optimizing Attention Flash registered slug: /optimizing-attention-flash In every case, _check_unavailable_skill would fall through because "stable-diffusion" != "stable-diffusion-image-generation", even with the skill sitting right there on disk. 
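The mismatch in the examples above can be reproduced with a minimal sketch. It assumes the registered slug is produced by lowercasing the frontmatter name, turning spaces and underscores into hyphens, dropping characters outside [a-z0-9-], collapsing hyphen runs, and trimming edge hyphens. `slug_from_name` and `dir_name_key` are hypothetical stand-ins for illustration, not functions in the codebase.

```python
import re


def slug_from_name(declared_name: str) -> str:
    """Hypothetical stand-in: normalize a frontmatter name into a /command slug."""
    slug = declared_name.lower().replace(" ", "-").replace("_", "-")
    slug = re.sub(r"[^a-z0-9-]", "", slug)          # drop anything outside [a-z0-9-]
    return re.sub(r"-{2,}", "-", slug).strip("-")   # collapse hyphen runs, trim edges


def dir_name_key(dir_name: str) -> str:
    """Hypothetical stand-in: the old comparison key, built from the directory name."""
    return dir_name.lower().replace("_", "-")


typed = "stable-diffusion-image-generation"

# Old behavior: directory name vs. typed command, which never matches on drift.
old_match = dir_name_key("stable-diffusion") == typed

# Slug derived from the frontmatter name matches what the user types.
new_match = slug_from_name("Stable Diffusion Image Generation") == typed
```

With these inputs the directory-based key fails to match while the frontmatter-derived slug matches, which is the fall-through described above.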
Fix: extract a small `_skill_slug_from_frontmatter` helper that reads the SKILL.md frontmatter and normalizes exactly like scan_skill_commands (lower, spaces/underscores → hyphens, strip non-[a-z0-9-], collapse runs of hyphens, strip edges). Use it in both the disabled-skills branch and the optional-skills branch. The disabled-set membership check now uses the declared frontmatter name (which is what `hermes skills config` writes into skills.disabled / platform_disabled), not the slug. Tests: five cases in tests/gateway/test_unavailable_skill_hint.py — the drift case for the disabled branch, unknown-command negative, matched-but-not-disabled negative, non-alnum stripping, and the drift case for the optional-skills branch. All five fail against main and pass with the fix. --- gateway/run.py | 72 +++++++- tests/gateway/test_unavailable_skill_hint.py | 185 +++++++++++++++++++ 2 files changed, 253 insertions(+), 4 deletions(-) create mode 100644 tests/gateway/test_unavailable_skill_hint.py diff --git a/gateway/run.py b/gateway/run.py index 88196d6927..daf6b62a19 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -673,11 +673,69 @@ def _is_control_interrupt_message(message: Optional[str]) -> bool: return normalized in _CONTROL_INTERRUPT_MESSAGES +def _skill_slug_from_frontmatter(skill_md: Path) -> tuple[str | None, str | None]: + """Derive the /command slug and declared frontmatter name from a SKILL.md. + + Matches the exact normalization used by + :func:`agent.skill_commands.scan_skill_commands` so the slug here is the + same string a user types after the leading ``/`` (e.g. a skill with + frontmatter ``name: Stable Diffusion Image Generation`` resolves to + ``stable-diffusion-image-generation`` — NOT the parent directory name, + which is commonly shorter/different, e.g. ``stable-diffusion``). 
+ + Using the directory name silently broke :func:`_check_unavailable_skill` + for every skill whose directory name drifted from its frontmatter name + (19 such skills on a standard install as of 2026-05), causing a generic + "unknown command" response where a "disabled — enable with …" or + "not installed — install with …" hint was expected. + + Returns ``(slug, declared_name)`` or ``(None, None)`` when the file + can't be read or lacks a ``name:`` in its frontmatter. + """ + try: + content = skill_md.read_text(encoding="utf-8", errors="replace") + except Exception: + return None, None + if not content.startswith("---"): + return None, None + end = content.find("\n---", 3) + if end < 0: + return None, None + declared_name: str | None = None + for line in content[3:end].splitlines(): + line = line.strip() + if line.startswith("name:"): + raw = line.split(":", 1)[1].strip() + # Strip YAML quote wrappers if present + if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in ('"', "'"): + raw = raw[1:-1] + declared_name = raw.strip() + break + if not declared_name: + return None, None + slug = declared_name.lower().replace(" ", "-").replace("_", "-") + # Mirror _SKILL_INVALID_CHARS and _SKILL_MULTI_HYPHEN from skill_commands + import re as _re + slug = _re.sub(r"[^a-z0-9-]", "", slug) + slug = _re.sub(r"-{2,}", "-", slug).strip("-") + if not slug: + return None, declared_name + return slug, declared_name + + def _check_unavailable_skill(command_name: str) -> str | None: """Check if a command matches a known-but-inactive skill. Returns a helpful message if the skill exists but is disabled or only available as an optional install. Returns None if no match found. + + The slug for each on-disk skill is derived from its frontmatter ``name:`` + (via :func:`_skill_slug_from_frontmatter`), NOT from its containing + directory name — because the two can differ (e.g. 
directory + ``stable-diffusion`` + frontmatter ``Stable Diffusion Image Generation`` + yields slug ``stable-diffusion-image-generation``). Matching on + directory name would miss that slug entirely and fall through to the + generic "unknown command" path. """ # Normalize: command uses hyphens, skill names may use hyphens or underscores normalized = command_name.lower().replace("_", "-") @@ -693,8 +751,12 @@ def _check_unavailable_skill(command_name: str) -> str | None: for skill_md in skills_dir.rglob("SKILL.md"): if any(part in ('.git', '.github', '.hub', '.archive') for part in skill_md.parts): continue - name = skill_md.parent.name.lower().replace("_", "-") - if name == normalized and name in disabled: + slug, declared_name = _skill_slug_from_frontmatter(skill_md) + if not slug or not declared_name: + continue + # disabled is keyed by the declared frontmatter name (what + # skills.disabled / skills.platform_disabled store). + if slug == normalized and declared_name in disabled: return ( f"The **{command_name}** skill is installed but disabled.\n" f"Enable it with: `hermes skills config`" @@ -706,8 +768,10 @@ def _check_unavailable_skill(command_name: str) -> str | None: optional_dir = get_optional_skills_dir(repo_root / "optional-skills") if optional_dir.exists(): for skill_md in optional_dir.rglob("SKILL.md"): - name = skill_md.parent.name.lower().replace("_", "-") - if name == normalized: + slug, _declared = _skill_slug_from_frontmatter(skill_md) + if not slug: + continue + if slug == normalized: # Build install path: official// rel = skill_md.parent.relative_to(optional_dir) parts = list(rel.parts) diff --git a/tests/gateway/test_unavailable_skill_hint.py b/tests/gateway/test_unavailable_skill_hint.py new file mode 100644 index 0000000000..8b28d13a62 --- /dev/null +++ b/tests/gateway/test_unavailable_skill_hint.py @@ -0,0 +1,185 @@ +"""Tests for gateway.run._check_unavailable_skill. + +Regression coverage for the dir-name-vs-frontmatter-name drift bug. 
+The hint function used to compare the skill's parent-directory name +against the typed command and the disabled list. That silently missed +every skill whose directory name differs from its declared frontmatter +name (~19 skills on a standard install), so users typing a real slug +like ``/stable-diffusion-image-generation`` got a generic "unknown +command" response instead of the intended "disabled — enable with …" +or "not installed — install with …" hint. + +These tests pin the fixed behavior: + +* Slug is derived from the frontmatter ``name:`` (exactly matching + :func:`agent.skill_commands.scan_skill_commands`), so the slug differs + from the directory name when the declared name is multi-word. +* ``disabled`` membership is checked by the declared name, because that + is what :func:`hermes_cli.skills_config.save_disabled_skills` stores. +""" +from __future__ import annotations + +from pathlib import Path +from unittest.mock import patch + +import pytest + + +@pytest.fixture +def tmp_skills(tmp_path: Path, monkeypatch: pytest.MonkeyPatch) -> Path: + """Isolated skills dir + HERMES_HOME so the real user config is untouched.""" + home = tmp_path / ".hermes" + home.mkdir() + (home / "skills").mkdir() + monkeypatch.setenv("HERMES_HOME", str(home)) + monkeypatch.setattr(Path, "home", lambda: tmp_path) + return home / "skills" + + +def _write_skill(skills_dir: Path, rel: str, frontmatter_name: str) -> Path: + """Create a SKILL.md at ``//SKILL.md``.""" + skill_dir = skills_dir / rel + skill_dir.mkdir(parents=True, exist_ok=True) + skill_md = skill_dir / "SKILL.md" + skill_md.write_text( + f"---\nname: {frontmatter_name}\ndescription: test skill\n---\nBody.\n", + encoding="utf-8", + ) + return skill_md + + +def test_frontmatter_slug_matched_even_when_dir_name_differs( + tmp_skills: Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """Directory ``stable-diffusion`` + frontmatter ``Stable Diffusion Image Generation``. 
+ + Command typed: ``stable-diffusion-image-generation`` (the slug the + agent actually registers). The old dir-name-based check would have + compared ``stable-diffusion`` to the typed command and missed. + """ + from gateway import run as gateway_run + + _write_skill(tmp_skills, "mlops/stable-diffusion", "Stable Diffusion Image Generation") + + # Config disables by declared name (matches what `hermes skills config` writes). + monkeypatch.setattr( + "gateway.run._get_disabled_skill_names", + lambda: {"Stable Diffusion Image Generation"}, + raising=False, + ) + with patch( + "tools.skills_tool._get_disabled_skill_names", + return_value={"Stable Diffusion Image Generation"}, + ), patch( + "agent.skill_utils.get_all_skills_dirs", + return_value=[tmp_skills], + ): + msg = gateway_run._check_unavailable_skill("stable-diffusion-image-generation") + + assert msg is not None, ( + "expected a 'disabled' hint for the frontmatter-derived slug; " + "the old code compared the dir name 'stable-diffusion' and returned None" + ) + assert "disabled" in msg.lower() + assert "hermes skills config" in msg + + +def test_unknown_command_still_returns_none( + tmp_skills: Path, +) -> None: + """A command that matches no on-disk skill still returns None.""" + from gateway import run as gateway_run + + _write_skill(tmp_skills, "creative/ascii-art", "ascii-art") + + with patch( + "tools.skills_tool._get_disabled_skill_names", return_value=set() + ), patch( + "agent.skill_utils.get_all_skills_dirs", return_value=[tmp_skills] + ): + assert gateway_run._check_unavailable_skill("no-such-skill") is None + + +def test_matched_but_not_disabled_returns_none( + tmp_skills: Path, +) -> None: + """A skill that exists and isn't disabled shouldn't produce a hint.""" + from gateway import run as gateway_run + + _write_skill(tmp_skills, "creative/ascii-art", "ascii-art") + + with patch( + "tools.skills_tool._get_disabled_skill_names", return_value=set() + ), patch( + 
"agent.skill_utils.get_all_skills_dirs", return_value=[tmp_skills] + ): + assert gateway_run._check_unavailable_skill("ascii-art") is None + + +def test_slug_normalization_strips_non_alnum( + tmp_skills: Path, +) -> None: + """Frontmatter ``C++ Code Review`` → slug ``c-code-review`` (``+`` stripped).""" + from gateway import run as gateway_run + + _write_skill(tmp_skills, "software-development/cpp-review", "C++ Code Review") + + with patch( + "tools.skills_tool._get_disabled_skill_names", + return_value={"C++ Code Review"}, + ), patch( + "agent.skill_utils.get_all_skills_dirs", return_value=[tmp_skills] + ): + msg = gateway_run._check_unavailable_skill("c-code-review") + + assert msg is not None + assert "disabled" in msg.lower() + + +def test_optional_skill_uses_frontmatter_slug( + tmp_path: Path, monkeypatch: pytest.MonkeyPatch +) -> None: + """Same drift bug applies to the optional-skills branch. + + Before: directory name was matched against the typed command, so an + optional skill at ``optional-skills/mlops/stable-diffusion/SKILL.md`` + with frontmatter ``Stable Diffusion Image Generation`` returned None + when the user typed the real slug. + """ + from gateway import run as gateway_run + + # Build an isolated optional-skills dir + optional = tmp_path / "optional-skills" + skill_dir = optional / "mlops" / "stable-diffusion" + skill_dir.mkdir(parents=True) + (skill_dir / "SKILL.md").write_text( + "---\nname: Stable Diffusion Image Generation\ndescription: test\n---\n", + encoding="utf-8", + ) + + # Point the optional lookup at our tmp dir. The source reads from + # ``get_optional_skills_dir(repo_root / "optional-skills")`` — we + # can't easily retarget ``repo_root``, so patch the resolver. + monkeypatch.setattr( + "hermes_constants.get_optional_skills_dir", + lambda _default: optional, + raising=False, + ) + + # Ensure the "disabled" branch doesn't match anything so we fall + # through to the optional-skills branch. 
+ empty_skills = tmp_path / "empty-skills" + empty_skills.mkdir() + with patch( + "tools.skills_tool._get_disabled_skill_names", return_value=set() + ), patch( + "agent.skill_utils.get_all_skills_dirs", return_value=[empty_skills] + ): + msg = gateway_run._check_unavailable_skill("stable-diffusion-image-generation") + + assert msg is not None, ( + "optional-skills branch should recognize the frontmatter-derived slug; " + "the old dir-name-based check returned None here too" + ) + assert "not installed" in msg.lower() + assert "official/mlops/stable-diffusion" in msg From 10297fa23c982a563844a1014f16bec77e1b6598 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:00:11 -0700 Subject: [PATCH 11/61] fix(discord): `/reload-skills` now refreshes the `/skill` autocomplete live (#18754) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `_register_skill_group` captured the skill catalog in closure variables (`entries` and `skill_lookup`) so the single `tree.add_command` call at startup owned the only live copy. The closure is never re-entered after startup, so `/reload-skills` — which rescans the on-disk skills dir and refreshes the in-process `_skill_commands` registry — had no way to propagate results into the `/skill` autocomplete on Discord. New skills stayed invisible in the dropdown, and deleted skills returned "Unknown skill" when the stale autocomplete entry was clicked. The fix is purely a dataflow change: promote `entries` and `skill_lookup` to instance attributes (`_skill_entries`, `_skill_lookup`), split the collector-driven rebuild into a helper (`_refresh_skill_catalog_state`), and add a public `refresh_skill_group()` method that re-runs the helper and is safe to call at any point after the initial registration. 
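The dataflow change can be illustrated in isolation with a rough sketch. `MiniAdapter`, its `collect` callable, and `register` are hypothetical simplifications, not the real `DiscordAdapter`; only the attribute and method names `_skill_entries`, `_skill_lookup`, `_refresh_skill_catalog_state`, and `refresh_skill_group` are taken from this change, and the real `refresh_skill_group()` return value is omitted here.

```python
from typing import Callable

Entry = tuple[str, str, str]  # (name, description, cmd_key)


class MiniAdapter:
    """Hypothetical stand-in for the adapter, reduced to the catalog dataflow."""

    def __init__(self, collect: Callable[[], list[Entry]]) -> None:
        self._collect = collect  # assumed collector; stands in for the disk scan
        self._skill_entries: list[Entry] = []
        self._skill_lookup: dict[str, tuple[str, str]] = {}

    def _refresh_skill_catalog_state(self) -> None:
        # Rebuild both views from the collector and rebind the instance attrs.
        entries = sorted(self._collect(), key=lambda t: t[0])
        self._skill_entries = entries
        self._skill_lookup = {n: (d, k) for n, d, k in entries}

    def register(self) -> Callable[[str], list[str]]:
        self._refresh_skill_catalog_state()

        def autocomplete(current: str) -> list[str]:
            # Reads self._skill_entries on every call instead of a captured
            # local list, so a later refresh is visible immediately.
            q = current.strip().lower()
            return [n for n, _d, _k in self._skill_entries if q in n.lower()]

        return autocomplete

    def refresh_skill_group(self) -> None:
        self._refresh_skill_catalog_state()


catalog: list[Entry] = [("pdf-tools", "Work with PDFs", "/pdf-tools")]
adapter = MiniAdapter(lambda: list(catalog))
autocomplete = adapter.register()

before = autocomplete("")
catalog.append(("ascii-art", "Draw ASCII art", "/ascii-art"))
stale = autocomplete("")       # catalog changed, but no refresh has run yet
adapter.refresh_skill_group()
after = autocomplete("")       # same callback object, fresh state
```

The callback is registered once and never re-created; only the state it reads is replaced, which is why no re-registration step appears in the sketch.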
The gateway's `_handle_reload_skills_command` then iterates `self.adapters` and calls `refresh_skill_group()` on any adapter that exposes it (currently only Discord). Both sync and async implementations are supported; adapters that don't override the method (Telegram's BotCommand menu, Slack subcommand map, etc.) are silently skipped — the in-process `reload_skills()` call covers them. No `tree.sync()` is required because Discord fetches autocomplete options dynamically on every keystroke — mutating the instance state the callbacks already read from is sufficient. That sidesteps the per-app command-bucket rate limit (~5 writes / 20 s) that made the previous bulk-sync-on-reload approach unusable (#16713 context). Tests: tests/gateway/test_reload_skills_discord_resync.py — five cases covering (1) refresh replaces entries, (2) entries stay sorted after refresh, (3) collector exception leaves cached state intact, (4) `_refresh_skill_catalog_state` populates the instance attrs, (5) orchestrator calls `refresh_skill_group()` on sync + async adapters and skips adapters that don't expose it. --- gateway/platforms/discord.py | 109 ++++++-- gateway/run.py | 23 ++ .../test_reload_skills_discord_resync.py | 244 ++++++++++++++++++ 3 files changed, 348 insertions(+), 28 deletions(-) create mode 100644 tests/gateway/test_reload_skills_discord_resync.py diff --git a/gateway/platforms/discord.py b/gateway/platforms/discord.py index 60cfb55ef6..1dd608d6f3 100644 --- a/gateway/platforms/discord.py +++ b/gateway/platforms/discord.py @@ -2584,40 +2584,32 @@ class DiscordAdapter(BasePlatformAdapter): hidden skills. The slash picker also becomes more discoverable — Discord live-filters by the user's typed prefix against both the skill name and its description. 
+ + The entries list and lookup dict are stored on ``self`` rather + than captured in closure variables so :meth:`refresh_skill_group` + can repopulate them when the user runs ``/reload-skills`` without + needing to touch the Discord slash-command tree or trigger a + ``tree.sync()`` call. """ try: - from hermes_cli.commands import discord_skill_commands_by_category - existing_names = set() try: existing_names = {cmd.name for cmd in tree.get_commands()} except Exception: pass - # Reuse the existing collector for consistent filtering - # (per-platform disabled, hub-excluded, name clamping), then - # flatten — the category grouping was only useful for the - # nested layout. - categories, uncategorized, hidden = discord_skill_commands_by_category( - reserved_names=existing_names, - ) - entries: list[tuple[str, str, str]] = list(uncategorized) - for cat_skills in categories.values(): - entries.extend(cat_skills) + # Populate the instance-level entries/lookup so the + # autocomplete + handler callbacks below always read the + # freshest state. refresh_skill_group() re-runs the same + # collector and mutates these two attributes in place. + self._skill_entries: list[tuple[str, str, str]] = [] + self._skill_lookup: dict[str, tuple[str, str]] = {} + self._skill_group_reserved_names: set[str] = set(existing_names) + self._refresh_skill_catalog_state() - if not entries: + if not self._skill_entries: return - # Stable alphabetical order so the autocomplete suggestion - # list is predictable across restarts. - entries.sort(key=lambda t: t[0]) - - # name -> (description, cmd_key) — used by both the autocomplete - # callback and the handler for O(1) dispatch. 
- skill_lookup: dict[str, tuple[str, str]] = { - n: (d, k) for n, d, k in entries - } - async def _autocomplete_name( interaction: "discord.Interaction", current: str, ) -> list: @@ -2627,10 +2619,13 @@ class DiscordAdapter(BasePlatformAdapter): "/skill pdf" surfaces skills whose description mentions PDFs even if the name doesn't. Discord caps this list at 25 entries per query. + + Reads ``self._skill_entries`` so a ``/reload-skills`` run + since process start shows up on the very next keystroke. """ q = (current or "").strip().lower() choices: list = [] - for name, desc, _key in entries: + for name, desc, _key in self._skill_entries: if not q or q in name.lower() or (desc and q in desc.lower()): if desc: label = f"{name} — {desc}" @@ -2654,7 +2649,7 @@ class DiscordAdapter(BasePlatformAdapter): async def _skill_handler( interaction: "discord.Interaction", name: str, args: str = "", ): - entry = skill_lookup.get(name) + entry = self._skill_lookup.get(name) if not entry: await interaction.response.send_message( f"Unknown skill: `{name}`. Start typing for " @@ -2676,16 +2671,74 @@ class DiscordAdapter(BasePlatformAdapter): logger.info( "[%s] Registered /skill command with %d skill(s) via autocomplete", - self.name, len(entries), + self.name, len(self._skill_entries), ) - if hidden: + if self._skill_group_hidden_count: logger.info( "[%s] %d skill(s) filtered out of /skill (name clamp / reserved)", - self.name, hidden, + self.name, self._skill_group_hidden_count, ) except Exception as exc: logger.warning("[%s] Failed to register /skill command: %s", self.name, exc) + def _refresh_skill_catalog_state(self) -> None: + """Re-scan disk for skills and repopulate ``self._skill_entries``. + + Called once from :meth:`_register_skill_group` at startup and + again from :meth:`refresh_skill_group` whenever the user runs + ``/reload-skills``. 
No Discord API calls are made — autocomplete + and the handler both read from these instance attributes + directly, so an in-place mutation is sufficient. + """ + from hermes_cli.commands import discord_skill_commands_by_category + + reserved = getattr(self, "_skill_group_reserved_names", set()) + categories, uncategorized, hidden = discord_skill_commands_by_category( + reserved_names=set(reserved), + ) + entries: list[tuple[str, str, str]] = list(uncategorized) + for cat_skills in categories.values(): + entries.extend(cat_skills) + # Stable alphabetical order so the autocomplete suggestion + # list is predictable across restarts. + entries.sort(key=lambda t: t[0]) + + self._skill_entries = entries + self._skill_lookup = {n: (d, k) for n, d, k in entries} + self._skill_group_hidden_count = hidden + + def refresh_skill_group(self) -> tuple[int, int]: + """Rescan skills and update the live ``/skill`` autocomplete state. + + Invoked by :meth:`gateway.run.GatewayOrchestrator._handle_reload_skills_command` + after :func:`agent.skill_commands.reload_skills` has refreshed + the in-process skill-command registry. Without this call, the + ``/skill`` autocomplete dropdown keeps showing the list captured + at process start — new skills stay invisible and deleted skills + return an "Unknown skill" error when clicked. + + Because autocomplete options are fetched dynamically by Discord, + we only need to mutate the entries/lookup attributes read by the + callbacks — no ``tree.sync()`` is required. + + Returns ``(new_count, hidden_count)``. 
+ """ + try: + self._refresh_skill_catalog_state() + except Exception as exc: + logger.warning( + "[%s] Failed to refresh /skill autocomplete after reload: %s", + self.name, exc, + ) + return (len(getattr(self, "_skill_entries", [])), 0) + logger.info( + "[%s] Refreshed /skill autocomplete: %d skill(s) available (%d filtered)", + self.name, + len(self._skill_entries), + self._skill_group_hidden_count, + ) + return (len(self._skill_entries), self._skill_group_hidden_count) + def _build_slash_event(self, interaction: discord.Interaction, text: str) -> MessageEvent: """Build a MessageEvent from a Discord slash command interaction.""" is_dm = isinstance(interaction.channel, discord.DMChannel) diff --git a/gateway/run.py b/gateway/run.py index daf6b62a19..23c67eec09 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -15,6 +15,7 @@ Usage: import asyncio import dataclasses +import inspect import json import logging import os @@ -9687,6 +9688,28 @@ class GatewayRunner: removed = result.get("removed", []) # [{"name", "description"}, ...] total = result.get("total", 0) + # Let each connected adapter refresh any platform-side state + # that cached the skill list at startup. Today that's the + # Discord /skill autocomplete (registered once per connect); + # without this call, new skills stay invisible in the + # dropdown and deleted skills error out when clicked. Other + # adapters that don't override refresh_skill_group (Telegram's + # BotCommand menu, Slack subcommand map, etc.) are silently + # skipped — the in-process reload above is enough for them. 
+ for adapter in list(self.adapters.values()): + refresh = getattr(adapter, "refresh_skill_group", None) + if not callable(refresh): + continue + try: + maybe = refresh() + if inspect.isawaitable(maybe): + await maybe + except Exception as exc: + logger.warning( + "Adapter %s refresh_skill_group raised: %s", + getattr(adapter, "name", adapter), exc, + ) + lines = ["🔄 **Skills Reloaded**\n"] if not added and not removed: lines.append("No new skills detected.") diff --git a/tests/gateway/test_reload_skills_discord_resync.py b/tests/gateway/test_reload_skills_discord_resync.py new file mode 100644 index 0000000000..7b2e1d20ff --- /dev/null +++ b/tests/gateway/test_reload_skills_discord_resync.py @@ -0,0 +1,244 @@ +"""Tests for `/reload-skills` resyncing the Discord ``/skill`` autocomplete. + +Before this change, ``_register_skill_group`` captured the skill catalog +in closure variables (``entries`` and ``skill_lookup``) so that the one +``tree.add_command`` call at startup owned the only live copy of the +skill list. The closure is never re-entered after startup, so +``/reload-skills`` (which rescans the on-disk skill dir and refreshes +the in-process registry) had no way to propagate its results into the +autocomplete — new skills stayed invisible in the dropdown and deleted +skills returned an "Unknown skill" error when the stale autocomplete +entry was clicked. + +The fix promotes those two variables to instance attributes +(``_skill_entries`` / ``_skill_lookup``) and exposes a +``refresh_skill_group()`` method that rescans and mutates them in +place. The gateway ``_handle_reload_skills_command`` iterates its +connected adapters and calls the method on any that expose it. + +No ``tree.sync()`` is required because Discord fetches autocomplete +options dynamically on every keystroke — we only need to rebind the +data the live callbacks already read from. 
+""" +from __future__ import annotations + +from unittest.mock import MagicMock + + +def _make_adapter(): + """Construct a DiscordAdapter without going through __init__ / token checks.""" + from gateway.platforms.discord import DiscordAdapter + from gateway.platforms.base import Platform + adapter = object.__new__(DiscordAdapter) + adapter.config = MagicMock() + adapter.config.extra = {} + # ``platform`` is set by BasePlatformAdapter.__init__, which we skip + # above; the inherited ``.name`` property dereferences it for log + # formatting, so set it explicitly. + adapter.platform = Platform.DISCORD + return adapter + + +class TestRefreshSkillGroup: + def test_refresh_repopulates_entries_after_catalog_change( + self, monkeypatch + ) -> None: + """The initial catalog is replaced wholesale on refresh. + + Mirrors the observable /reload-skills case: a user adds a new + skill to ~/.hermes/skills/, runs /reload-skills, and expects + the autocomplete to surface it on the very next keystroke. + """ + adapter = _make_adapter() + + # Start-of-process state: /register built the catalog from the + # original collector output. + adapter._skill_entries = [ + ("old-skill", "Pre-existing skill", "/old-skill"), + ] + adapter._skill_lookup = {"old-skill": ("Pre-existing skill", "/old-skill")} + adapter._skill_group_reserved_names = set() + adapter._skill_group_hidden_count = 0 + + # User adds new-skill to disk and removes old-skill. + def fake_collector(*, reserved_names): + return ( + {"creative": [("new-skill", "Fresh skill", "/new-skill")]}, # categories + [], # uncategorized + 0, # hidden + ) + + monkeypatch.setattr( + "hermes_cli.commands.discord_skill_commands_by_category", + fake_collector, + ) + + new_count, hidden = adapter.refresh_skill_group() + + assert new_count == 1 + assert hidden == 0 + # Old skill is gone, new skill is present. 
+        names = [n for n, _d, _k in adapter._skill_entries]
+        assert names == ["new-skill"]
+        assert "old-skill" not in adapter._skill_lookup
+        assert adapter._skill_lookup["new-skill"] == ("Fresh skill", "/new-skill")
+
+    def test_refresh_sorts_entries_alphabetically(self, monkeypatch) -> None:
+        """Autocomplete order must be stable and predictable across refreshes."""
+        adapter = _make_adapter()
+        adapter._skill_entries = []
+        adapter._skill_lookup = {}
+        adapter._skill_group_reserved_names = set()
+        adapter._skill_group_hidden_count = 0
+
+        def fake_collector(*, reserved_names):
+            # Intentionally unsorted — the fix must resort.
+            return (
+                {"zzz": [("zebra", "", "/zebra")]},
+                [("alpha", "", "/alpha")],
+                0,
+            )
+
+        monkeypatch.setattr(
+            "hermes_cli.commands.discord_skill_commands_by_category",
+            fake_collector,
+        )
+
+        adapter.refresh_skill_group()
+
+        names = [n for n, _d, _k in adapter._skill_entries]
+        assert names == sorted(names) == ["alpha", "zebra"]
+
+    def test_refresh_handles_collector_exception_gracefully(
+        self, monkeypatch
+    ) -> None:
+        """A broken collector must not take down /reload-skills."""
+        adapter = _make_adapter()
+        adapter._skill_entries = [("keep", "kept", "/keep")]
+        adapter._skill_lookup = {"keep": ("kept", "/keep")}
+        adapter._skill_group_reserved_names = set()
+        adapter._skill_group_hidden_count = 0
+
+        def boom(*, reserved_names):
+            raise RuntimeError("simulated collector failure")
+
+        monkeypatch.setattr(
+            "hermes_cli.commands.discord_skill_commands_by_category",
+            boom,
+        )
+
+        new_count, hidden = adapter.refresh_skill_group()
+        # Returns previously-cached count, no crash, existing entries
+        # preserved so the live autocomplete keeps working.
+        assert new_count == 1
+        assert hidden == 0
+        assert adapter._skill_entries == [("keep", "kept", "/keep")]
+
+
+class TestRegisterSkillGroupUsesInstanceState:
+    """The closure-based ``entries`` / ``skill_lookup`` must be gone.
+ + If the callbacks in ``_register_skill_group`` still close over + local variables instead of reading from ``self``, the refresh + method is useless — autocomplete will keep serving the stale list. + + The full slash-command registration path pulls in ``discord.app_commands`` + decorators (``@describe`` / ``@autocomplete`` / ``Command``), which + are unstubbed in the hermetic test env. We assert the data-shaped + side-effects instead: after ``_register_skill_group`` returns + (successfully or not), ``_skill_entries`` and ``_skill_lookup`` must + be populated from the collector output, because + ``_refresh_skill_catalog_state`` runs before any decorator evaluation. + """ + + def test_refresh_catalog_state_populates_instance_attrs( + self, monkeypatch + ) -> None: + adapter = _make_adapter() + adapter._skill_group_reserved_names = set() + + def fake_collector(*, reserved_names): + return ( + {"creative": [("ascii-art", "Make ASCII", "/ascii-art")]}, + [], + 0, + ) + monkeypatch.setattr( + "hermes_cli.commands.discord_skill_commands_by_category", + fake_collector, + ) + + adapter._refresh_skill_catalog_state() + + # Instance-level state populated — the autocomplete + handler + # callbacks both read from these, so `refresh_skill_group` + # mutating them in place is enough to pick up new skills. + assert adapter._skill_entries == [ + ("ascii-art", "Make ASCII", "/ascii-art"), + ] + assert adapter._skill_lookup == { + "ascii-art": ("Make ASCII", "/ascii-art"), + } + assert adapter._skill_group_hidden_count == 0 + + +class TestHandleReloadSkillsCallsRefreshSkillGroup: + """Gateway-side integration: /reload-skills must call refresh on adapters.""" + + def test_orchestrator_calls_refresh_skill_group_on_every_adapter(self): + """Sync + async refresh_skill_group implementations both get awaited/called. + + The orchestrator iterates ``self.adapters`` and calls + ``refresh_skill_group`` if it exists. 
Adapters that don't + implement it (today: everything except Discord) are silently + skipped without raising. + """ + import asyncio + from unittest.mock import patch, MagicMock + + # Import without constructing a real runner — test the method + # directly against an ``object.__new__`` instance. + from gateway.run import GatewayRunner + runner = object.__new__(GatewayRunner) + + sync_refresh = MagicMock(return_value=(5, 0)) + async_called = {"flag": False} + + class AsyncAdapter: + name = "async-platform" + async def refresh_skill_group(self): + async_called["flag"] = True + return (3, 0) + + class SyncAdapter: + name = "sync-platform" + refresh_skill_group = sync_refresh + + class NoOpAdapter: + name = "other" + # No refresh_skill_group — must not crash. + + runner.adapters = { + "discord": AsyncAdapter(), + "slack": SyncAdapter(), + "telegram": NoOpAdapter(), + } + + # Mock reload_skills itself so no disk scan runs. + fake_result = {"added": [], "removed": [], "total": 7} + with patch( + "agent.skill_commands.reload_skills", return_value=fake_result + ): + event = MagicMock() + event.source = MagicMock() + # _session_key_for_source may be called — make it safe. 
+            runner._session_key_for_source = lambda src: None
+            runner._pending_skills_reload_notes = {}
+
+            result = asyncio.get_event_loop().run_until_complete(
+                runner._handle_reload_skills_command(event)
+            )
+
+        assert "Skills Reloaded" in result
+        assert sync_refresh.called, "sync adapter refresh must be invoked"
+        assert async_called["flag"], "async adapter refresh must be awaited"

From 2ef1ad280beee581e0f023901d0d040efec380ac Mon Sep 17 00:00:00 2001
From: Frank Song
Date: Fri, 1 May 2026 13:42:50 +0800
Subject: [PATCH 12/61] fix: prefer ~/.hermes/.env over os.environ when seeding credential pool
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When _seed_from_env() reads API keys to populate the credential pool,
it should treat ~/.hermes/.env as the authoritative source — not
os.environ. Stale env vars inherited from parent shell processes
(Codex CLI, test scripts, etc.) can shadow deliberate changes to the
.env file, causing auth.json to cache an outdated key that leads to
silent 401 errors.

This is especially visible with OpenRouter: if a parent process
exported OPENROUTER_API_KEY=test-key-fresh and the user later updates
.env with a valid key, restarting Hermes still picks up the stale
os.environ value, writes it back to auth.json, and all API calls fail
with 401.
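
The precedence described above can be sketched in isolation. This is
only an illustration under stated assumptions: `pick_env_value` and its
two mapping parameters are hypothetical stand-ins (for the parsed
~/.hermes/.env contents and for os.environ), not Hermes APIs.

```python
from typing import Mapping


def pick_env_value(
    key: str,
    dotenv: Mapping[str, str],
    process_env: Mapping[str, str],
) -> str:
    """Return the value for `key`, preferring the parsed .env mapping.

    Hypothetical helper for illustration only. `dotenv` stands in for the
    parsed ~/.hermes/.env file and `process_env` for os.environ.
    """
    # A missing or empty .env entry falls through to the process environment.
    value = dotenv.get(key) or process_env.get(key) or ""
    return value.strip()


# Stale key exported by a parent shell vs. the key the user just wrote to .env.
stale_shell = {"OPENROUTER_API_KEY": "test-key-fresh"}
fresh_file = {"OPENROUTER_API_KEY": " sk-or-valid "}

print(pick_env_value("OPENROUTER_API_KEY", fresh_file, stale_shell))
print(pick_env_value("OPENROUTER_API_KEY", {}, stale_shell))
```

The second call shows why the process environment is kept as a
fallback: deployments that inject keys only at runtime have no entry in
the .env file.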
Fixes #18254 --- agent/credential_pool.py | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/agent/credential_pool.py b/agent/credential_pool.py index 004b574988..27a16bd435 100644 --- a/agent/credential_pool.py +++ b/agent/credential_pool.py @@ -3,6 +3,7 @@ from __future__ import annotations import logging +import os import random import threading import time @@ -13,7 +14,7 @@ from datetime import datetime from typing import Any, Dict, List, Optional, Set, Tuple from hermes_constants import OPENROUTER_BASE_URL -from hermes_cli.config import get_env_value +from hermes_cli.config import get_env_value, load_env import hermes_cli.auth as auth_mod from hermes_cli.auth import ( CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS, @@ -1380,6 +1381,16 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool, Set[str]]: changed = False active_sources: Set[str] = set() + + # Prefer ~/.hermes/.env over os.environ — the user's config file is the + # authoritative source for Hermes credentials. Stale env vars from parent + # processes (Codex CLI, test scripts, etc.) should not override deliberate + # changes to the .env file. + def _get_env_prefer_dotenv(key: str) -> str: + env_file = load_env() + val = env_file.get(key) or os.environ.get(key) or "" + return val.strip() + # Honour user suppression — `hermes auth remove ` for an # env-seeded credential marks the env: source as suppressed so it # won't be re-seeded from the user's shell environment or ~/.hermes/.env. 
@@ -1391,8 +1402,8 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool def _is_source_suppressed(_p, _s): # type: ignore[misc] return False if provider == "openrouter": - # Check both os.environ and ~/.hermes/.env file - token = (get_env_value("OPENROUTER_API_KEY") or "").strip() + # Prefer ~/.hermes/.env over os.environ + token = _get_env_prefer_dotenv("OPENROUTER_API_KEY") if token: source = "env:OPENROUTER_API_KEY" if _is_source_suppressed(provider, source): @@ -1418,7 +1429,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool env_url = "" if pconfig.base_url_env_var: - env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/") + env_url = _get_env_prefer_dotenv(pconfig.base_url_env_var).rstrip("/") env_vars = list(pconfig.api_key_env_vars) if provider == "anthropic": @@ -1429,8 +1440,8 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool ] for env_var in env_vars: - # Check both os.environ and ~/.hermes/.env file - token = (get_env_value(env_var) or "").strip() + # Prefer ~/.hermes/.env over os.environ + token = _get_env_prefer_dotenv(env_var) if not token: continue source = f"env:{env_var}" From 9c626ef8ea8bc190f9a339991c8de26ce4528bb5 Mon Sep 17 00:00:00 2001 From: teknium1 <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 01:58:06 -0700 Subject: [PATCH 13/61] chore(release): map franksong2702 email for AUTHOR_MAP Follow-up for PR #18256 salvage. 
--- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index c055e5783c..adb16607f0 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -371,6 +371,7 @@ AUTHOR_MAP = { "1243352777@qq.com": "zons-zhaozhy", "e.silacandmr@gmail.com": "Es1la", "h3057183414@gmail.com": "CoreyNoDream", + "franksong2702@gmail.com": "franksong2702", # ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply # crossref, and GH contributor list matching (April 2026 audit) ── "1115117931@qq.com": "aaronagent", From 0a6865b328ee6057eb59ee4a150c4886aa72d48c Mon Sep 17 00:00:00 2001 From: teknium1 <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 01:58:44 -0700 Subject: [PATCH 14/61] test(credential_pool): regression coverage for .env vs os.environ precedence MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Covers PR #18256 fix for issue #18254 — when OPENROUTER_API_KEY is set in BOTH os.environ (stale from parent shell) and ~/.hermes/.env (fresh), _seed_from_env must prefer the .env value. Also guards the fallback case where .env omits the key entirely (Docker/K8s/systemd deployments that only inject via runtime env). --- tests/agent/test_credential_pool.py | 58 +++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) diff --git a/tests/agent/test_credential_pool.py b/tests/agent/test_credential_pool.py index 70e59f17a5..abc93eca02 100644 --- a/tests/agent/test_credential_pool.py +++ b/tests/agent/test_credential_pool.py @@ -348,6 +348,64 @@ def test_load_pool_seeds_env_api_key(tmp_path, monkeypatch): assert entry.access_token == "sk-or-seeded" + +def test_load_pool_prefers_dotenv_over_stale_os_environ(tmp_path, monkeypatch): + """Regression for #18254: stale OPENROUTER_API_KEY in os.environ (inherited + from a parent shell) must NOT shadow the fresh key in ~/.hermes/.env when + seeding the credential pool. 
Before the fix, `get_env_value()` preferred + os.environ and silently wrote the stale value into auth.json, causing + persistent 401 errors after key rotation. + """ + hermes_home = tmp_path / "hermes" + hermes_home.mkdir() + monkeypatch.setenv("HERMES_HOME", str(hermes_home)) + + # Simulate the bug: parent shell exported a stale test key + monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-STALE-from-shell") + + # User edited ~/.hermes/.env with the fresh key + (hermes_home / ".env").write_text( + "OPENROUTER_API_KEY=sk-or-FRESH-from-dotenv\n" + ) + + _write_auth_store(tmp_path, {"version": 1, "providers": {}}) + + from agent.credential_pool import load_pool + pool = load_pool("openrouter") + entry = pool.select() + + assert entry is not None + assert entry.source == "env:OPENROUTER_API_KEY" + # The fresh key from .env must win over the stale shell export + assert entry.access_token == "sk-or-FRESH-from-dotenv", ( + f"Expected .env to win, got {entry.access_token!r}" + ) + + +def test_load_pool_falls_back_to_os_environ_when_dotenv_empty(tmp_path, monkeypatch): + """When ~/.hermes/.env does not define OPENROUTER_API_KEY (typical Docker / + K8s / systemd deployment), seeding must still pick up the key from + os.environ. Guards against regressions that would break production + deployments relying on runtime-injected env vars. 
+ """ + hermes_home = tmp_path / "hermes" + hermes_home.mkdir() + monkeypatch.setenv("HERMES_HOME", str(hermes_home)) + monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-from-runtime-env") + + # .env exists but does not define OPENROUTER_API_KEY + (hermes_home / ".env").write_text("SOME_OTHER_VAR=unrelated\n") + + _write_auth_store(tmp_path, {"version": 1, "providers": {}}) + + from agent.credential_pool import load_pool + pool = load_pool("openrouter") + entry = pool.select() + + assert entry is not None + assert entry.access_token == "sk-or-from-runtime-env" + + def test_load_pool_removes_stale_seeded_env_entry(tmp_path, monkeypatch): monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes")) monkeypatch.delenv("OPENROUTER_API_KEY", raising=False) From 292d2fb42fe304e4d6e6184f39e1f60e5aa771f8 Mon Sep 17 00:00:00 2001 From: luyao618 <364939526@qq.com> Date: Fri, 1 May 2026 11:40:23 +0800 Subject: [PATCH 15/61] fix(discord): close old client before reconnect to prevent zombie websockets (#18187) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When DiscordAdapter.connect() is called during reconnect, it creates a new commands.Bot client without closing the previous one. The old client's websocket remains connected to Discord's gateway, causing both to fire on_message for every incoming event — resulting in double responses. Fix: before creating a new Bot instance, check if a previous client exists and close it. This ensures only one websocket connection is active at any time. 
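
The close-before-reconnect guard described above can be sketched with
plain asyncio. `FakeClient` and `Adapter` are hypothetical stand-ins
made up for this illustration; they are not discord.py classes and not
the adapter's real code.

```python
import asyncio
from typing import Optional, Tuple


class FakeClient:
    """Stand-in for a websocket-backed client (hypothetical)."""

    def __init__(self) -> None:
        self.closed = False

    def is_closed(self) -> bool:
        return self.closed

    async def close(self) -> None:
        self.closed = True


class Adapter:
    """Hypothetical adapter that should own at most one live client."""

    def __init__(self) -> None:
        self.client: Optional[FakeClient] = None

    async def connect(self) -> FakeClient:
        old = self.client
        if old is not None:
            try:
                if not old.is_closed():
                    await old.close()
            except Exception:
                # Best effort: a failed close must not block the reconnect.
                pass
            finally:
                self.client = None
        self.client = FakeClient()
        return self.client


async def demo() -> Tuple[bool, bool]:
    adapter = Adapter()
    first = await adapter.connect()
    second = await adapter.connect()  # reconnect without disconnect()
    return first.closed, second.closed


first_closed, second_closed = asyncio.run(demo())
print(first_closed, second_closed)
```

Clearing the reference in `finally` means a close() that raises still
leaves the adapter holding only the new client.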
Closes #18187 --- gateway/platforms/discord.py | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/gateway/platforms/discord.py b/gateway/platforms/discord.py index 1dd608d6f3..369a607a90 100644 --- a/gateway/platforms/discord.py +++ b/gateway/platforms/discord.py @@ -613,6 +613,21 @@ class DiscordAdapter(BasePlatformAdapter): # so LLM output or echoed user content can't ping the whole # server; override per DISCORD_ALLOW_MENTION_* env vars or the # discord.allow_mentions.* block in config.yaml. + + # Close any existing client to prevent zombie websocket connections + # on reconnect (see #18187). Without this, the old client remains + # connected to Discord gateway and both fire on_message, causing + # double responses. + if self._client is not None: + try: + if not self._client.is_closed(): + await self._client.close() + except Exception: + logger.debug("[%s] Failed to close previous Discord client", self.name) + finally: + self._client = None + self._ready_event.clear() + self._client = commands.Bot( command_prefix="!", # Not really used, we handle raw messages intents=intents, From e363ced3c3959392268fd1ea8b85334b889aa298 Mon Sep 17 00:00:00 2001 From: teknium1 <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:03:40 -0700 Subject: [PATCH 16/61] test(discord): regression coverage for zombie-websocket guard in connect() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Covers PR #18224 fix for issue #18187 — when DiscordAdapter.connect() is called a second time without an intervening disconnect(), the previous commands.Bot must be closed before a new one is created. Otherwise both websockets stay connected to Discord's gateway and both fire on_message, producing double responses with different wording. 
--- tests/gateway/test_discord_connect.py | 63 +++++++++++++++++++++++++++ 1 file changed, 63 insertions(+) diff --git a/tests/gateway/test_discord_connect.py b/tests/gateway/test_discord_connect.py index d769d3f445..dd49e78e18 100644 --- a/tests/gateway/test_discord_connect.py +++ b/tests/gateway/test_discord_connect.py @@ -172,6 +172,69 @@ async def test_connect_only_requests_members_intent_when_needed(monkeypatch, all await adapter.disconnect() +@pytest.mark.asyncio +async def test_reconnect_closes_previous_client_to_prevent_zombie_websocket(monkeypatch): + """Regression for #18187: calling connect() twice without disconnect() in + between (e.g. during an in-process reconnect attempt) must close the old + commands.Bot before creating a new one. Without this guard, two websockets + stay alive and both fire on_message, producing double responses with + different wording. + """ + adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token")) + + monkeypatch.setattr("gateway.status.acquire_scoped_lock", lambda scope, identity, metadata=None: (True, None)) + monkeypatch.setattr("gateway.status.release_scoped_lock", lambda scope, identity: None) + + intents = SimpleNamespace( + message_content=False, dm_messages=False, guild_messages=False, + members=False, voice_states=False, + ) + monkeypatch.setattr(discord_platform.Intents, "default", lambda: intents) + + class TrackedBot(FakeBot): + """FakeBot that records close() calls and reports open/closed state.""" + _closed = False + + def is_closed(self): + return self._closed + + async def close(self): + self._closed = True + + created: list[TrackedBot] = [] + + def fake_bot_factory(*, command_prefix, intents, proxy=None, allowed_mentions=None, **_): + bot = TrackedBot(intents=intents, allowed_mentions=allowed_mentions) + created.append(bot) + return bot + + monkeypatch.setattr(discord_platform.commands, "Bot", fake_bot_factory) + monkeypatch.setattr(adapter, "_resolve_allowed_usernames", AsyncMock()) + + # 
First connect — fresh adapter, no prior client. + assert await adapter.connect() is True + assert len(created) == 1 + first_bot = created[0] + assert first_bot._closed is False, "first bot should still be open after connect()" + + # Second connect WITHOUT disconnect — simulates an in-process reconnect. + # Without the fix, first_bot would remain open (zombie), and both would + # receive every Discord event, causing double responses. + assert await adapter.connect() is True + assert len(created) == 2 + second_bot = created[1] + + # The first bot must be closed before the second is assigned. + assert first_bot._closed is True, ( + "First Discord client must be closed on re-entry of connect() to prevent " + "zombie websocket (#18187)" + ) + assert second_bot._closed is False, "second bot should still be open" + assert adapter._client is second_bot + + await adapter.disconnect() + + @pytest.mark.asyncio async def test_connect_releases_token_lock_on_timeout(monkeypatch): adapter = DiscordAdapter(PlatformConfig(enabled=True, token="test-token")) From 5eac6084bc781377cc1432165ebb489ccf5d6fbf Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:05:01 -0700 Subject: [PATCH 17/61] fix(discord): warn on 32-char clamp collisions in the /skill collector (#18759) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Discord's per-command name limit is 32 chars. When two skill slugs share the same first 32 chars (or a skill slug clamps onto a reserved gateway command name), only the first seen wins — the second is dropped from the /skill autocomplete. The old behavior incremented a ``hidden`` counter silently, so skill authors had no way to discover the drop short of noticing their skill was missing from the picker. Not an actively-biting bug today (no collisions on the default catalog as of 2026-05), but a landmine the moment someone ships a skill with a long name. 
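
The silent drop described above can be reproduced with a small sketch.
`find_clamp_collisions` is a hypothetical helper written for this
illustration (it is not the collector in hermes_cli/commands.py); it
assumes first-seen wins over an alphabetically sorted slug list, as the
message states.

```python
from typing import Dict, List, Tuple

DISCORD_NAME_LIMIT = 32  # per-command name limit stated in the commit message


def find_clamp_collisions(
    slugs: List[str], limit: int = DISCORD_NAME_LIMIT
) -> List[Tuple[str, str, str]]:
    """Return (winner, loser, clamped_name) for every slug lost to the clamp.

    Hypothetical helper for illustration only; the first slug seen keeps
    the clamped name and later ones are reported as dropped.
    """
    owner_by_clamped: Dict[str, str] = {}
    collisions: List[Tuple[str, str, str]] = []
    for slug in slugs:
        clamped = slug[:limit]
        winner = owner_by_clamped.get(clamped)
        if winner is not None:
            collisions.append((winner, slug, clamped))
            continue
        owner_by_clamped[clamped] = slug
    return collisions


slugs = sorted([
    "skill-collision-prefix-identical-bravo",
    "skill-collision-prefix-identical-alpha",
    "short-skill",
])
collisions = find_clamp_collisions(slugs)
print(collisions)
```

Keying the map by the clamped name and storing the owning slug is what
lets a report name both the winner and the loser instead of only
counting the drop.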
The earlier series in #18745 / #18753 / #18754 dropped the other silent data-loss paths in the Discord /skill collector; this one lights up the last remaining one. Fix: promote ``_names_used`` from a set to a dict keyed by the clamped name, mapping to the source cmd_key (or a ``""`` sentinel for names inherited via ``reserved_names``). On collision, log a WARNING naming both sides — the winner, the loser, the clamped name, and what to rename. Two phrasings: * skill-vs-skill — "both clamp to X on Discord's 32-char command-name limit; only the winner appears in /skill. Rename one skill's frontmatter ``name:`` to differ in its first 32 chars." * skill-vs-reserved — "collides with a reserved gateway command name; the skill will not appear in /skill. Rename the skill's frontmatter ``name:``." Tests: three cases in ``tests/hermes_cli/test_discord_skill_clamp_warning.py`` — skill-vs-skill collision (warning names both cmd_keys + clamped prefix), skill-vs-reserved collision (warning uses the distinct phrasing), and a no-collision negative (zero warnings emitted). --- hermes_cli/commands.py | 45 ++++- .../test_discord_skill_clamp_warning.py | 171 ++++++++++++++++++ 2 files changed, 211 insertions(+), 5 deletions(-) create mode 100644 tests/hermes_cli/test_discord_skill_clamp_warning.py diff --git a/hermes_cli/commands.py b/hermes_cli/commands.py index ba05602241..1b4b85bd67 100644 --- a/hermes_cli/commands.py +++ b/hermes_cli/commands.py @@ -10,6 +10,7 @@ To add an alias: set ``aliases=("short",)`` on the existing ``CommandDef``. from __future__ import annotations +import logging import os import re import shutil @@ -21,6 +22,8 @@ from typing import Any from utils import is_truthy_value +logger = logging.getLogger(__name__) + # prompt_toolkit is an optional CLI dependency — only needed for # SlashCommandCompleter and SlashCommandAutoSuggest. 
Gateway and test # environments that lack it must still be able to import this module @@ -781,7 +784,12 @@ def discord_skill_commands_by_category( # Collect raw skill data -------------------------------------------------- categories: dict[str, list[tuple[str, str, str]]] = {} uncategorized: list[tuple[str, str, str]] = [] - _names_used: set[str] = set(reserved_names) + # Map clamped-32-char-name → what it came from, so we can emit an + # actionable warning on collision. Reserved (gateway-builtin) command + # names are marked with a sentinel so the warning distinguishes + # "skill collided with a reserved command" from "two skills collided + # on the 32-char clamp" — the latter is the rename-worthy case. + _names_used: dict[str, str] = {n: "" for n in reserved_names} hidden = 0 try: @@ -836,12 +844,39 @@ def discord_skill_commands_by_category( # Clamp to 32 chars (Discord per-command name limit) discord_name = raw_name[:32] if discord_name in _names_used: - # Collision with a previously-registered name — drop and - # count. Almost always caused by a reserved built-in name, - # not by another skill (frontmatter names are unique). + # Two skills whose first 32 chars are identical. One wins + # (the first one seen, which is alphabetical because the + # caller iterates ``sorted(skill_cmds)``); the other is + # dropped from Discord's /skill autocomplete. + # + # Silently counting this as ``hidden`` (the old behavior) + # meant skill authors had no way to discover the drop — + # their skill just didn't appear in the picker. Emit a + # WARNING naming both sides so the author can rename the + # losing skill's frontmatter name to something with a + # distinct 32-char prefix. + prior = _names_used[discord_name] + if prior == "": + logger.warning( + "Discord /skill: %r (from %r) collides on its 32-char " + "clamp with a reserved gateway command name %r — the " + "skill will not appear in the /skill autocomplete. 
" + "Rename the skill's frontmatter ``name:`` to differ " + "in its first 32 chars.", + discord_name, cmd_key, discord_name, + ) + else: + logger.warning( + "Discord /skill: %r and %r both clamp to %r on " + "Discord's 32-char command-name limit — only %r " + "will appear in the /skill autocomplete. Rename " + "one skill's frontmatter ``name:`` to differ in " + "its first 32 chars.", + prior, cmd_key, discord_name, prior, + ) hidden += 1 continue - _names_used.add(discord_name) + _names_used[discord_name] = cmd_key desc = info.get("description", "") if len(desc) > 100: diff --git a/tests/hermes_cli/test_discord_skill_clamp_warning.py b/tests/hermes_cli/test_discord_skill_clamp_warning.py new file mode 100644 index 0000000000..541eeddc41 --- /dev/null +++ b/tests/hermes_cli/test_discord_skill_clamp_warning.py @@ -0,0 +1,171 @@ +"""Tests for Discord /skill 32-char clamp collision warnings. + +Discord's per-command name limit is 32 chars, so +``discord_skill_commands_by_category`` clamps skill slugs to that width +before deduping. When two skills share the same 32-char prefix, only +the first (alphabetical) wins; the second is dropped. Previously the +drop was silent — the ``hidden`` count incremented but nothing named +which skills collided, so authors had no way to discover the drop +short of noticing that their skill was missing from the autocomplete. + +This module pins the upgraded behavior: a WARNING log with both full +cmd_keys + the clamped name, so whoever named the skills sees the +collision and can rename one. +""" +from __future__ import annotations + +import logging +from pathlib import Path +from unittest.mock import patch + + +def test_clamp_collision_emits_warning_naming_both_skills( + tmp_path: Path, caplog +) -> None: + """Two skills with identical first 32 chars — warning names both.""" + from hermes_cli.commands import discord_skill_commands_by_category + + # Craft cmd_keys that share the first 32 chars. 
+ # 40-char prefix 'skill-collision-prefix-identical-first-32' + # -> clamped to 'skill-collision-prefix-identical' + prefix = "skill-collision-prefix-identical" # exactly 32 chars + name_a = prefix + "-alpha" # /skill-collision-prefix-identical-alpha + name_b = prefix + "-bravo" # /skill-collision-prefix-identical-bravo + assert name_a[:32] == name_b[:32] == prefix + + skills_dir = tmp_path / "skills" + for nm in (name_a, name_b): + d = skills_dir / "creative" / nm + d.mkdir(parents=True) + (d / "SKILL.md").write_text("---\nname: x\n---\n") + + fake_cmds = { + f"/{name_a}": { + "name": name_a, + "description": "Alpha", + "skill_md_path": str(skills_dir / "creative" / name_a / "SKILL.md"), + }, + f"/{name_b}": { + "name": name_b, + "description": "Bravo", + "skill_md_path": str(skills_dir / "creative" / name_b / "SKILL.md"), + }, + } + + with caplog.at_level(logging.WARNING, logger="hermes_cli.commands"), ( + patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds) + ), patch("tools.skills_tool.SKILLS_DIR", skills_dir): + categories, uncategorized, hidden = discord_skill_commands_by_category( + reserved_names=set(), + ) + + # One skill made it through, one was dropped (hidden counted). + assert hidden == 1 + kept_names = [n for n, _d, _k in categories.get("creative", [])] + assert len(kept_names) == 1 + # Alphabetical iteration means the -alpha variant wins the slot. + assert kept_names[0] == prefix # clamped + + # Exactly one warning, naming BOTH full cmd_keys and the clamped name. 
+ warnings = [ + r for r in caplog.records + if r.levelno == logging.WARNING and "clamp" in r.getMessage() + ] + assert len(warnings) == 1, ( + f"expected exactly one clamp-collision warning, got {len(warnings)}: " + f"{[r.getMessage() for r in warnings]}" + ) + msg = warnings[0].getMessage() + assert f"/{name_a}" in msg, f"winner not named in warning: {msg!r}" + assert f"/{name_b}" in msg, f"loser not named in warning: {msg!r}" + assert prefix in msg, f"clamped name not in warning: {msg!r}" + + +def test_clamp_collision_with_reserved_name_emits_distinct_warning( + tmp_path: Path, caplog +) -> None: + """A skill clashing with a reserved gateway command gets its own phrasing. + + The reserved-vs-skill case is operationally different — the fix is + still "rename the skill," but there's no second skill to also + rename. The warning should say so explicitly. + """ + from hermes_cli.commands import discord_skill_commands_by_category + + # Reserved name 'help' is 4 chars — make a skill whose slug + # clamps to 'help' (so, exactly 'help'). + reserved = "help" + skills_dir = tmp_path / "skills" + d = skills_dir / "creative" / reserved + d.mkdir(parents=True) + (d / "SKILL.md").write_text("---\nname: x\n---\n") + + fake_cmds = { + f"/{reserved}": { + "name": reserved, + "description": "desc", + "skill_md_path": str(d / "SKILL.md"), + }, + } + + with caplog.at_level(logging.WARNING, logger="hermes_cli.commands"), ( + patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds) + ), patch("tools.skills_tool.SKILLS_DIR", skills_dir): + categories, uncategorized, hidden = discord_skill_commands_by_category( + reserved_names={"help"}, + ) + + # Skill dropped in favor of the reserved command. 
+ assert hidden == 1 + assert categories == {} + assert uncategorized == [] + + warnings = [ + r for r in caplog.records + if r.levelno == logging.WARNING and "reserved" in r.getMessage() + ] + assert len(warnings) == 1, ( + f"expected one reserved-name collision warning, got " + f"{[r.getMessage() for r in warnings]}" + ) + msg = warnings[0].getMessage() + assert f"/{reserved}" in msg + assert "reserved" in msg.lower() + + +def test_no_collision_no_warning(tmp_path: Path, caplog) -> None: + """Sanity: two distinct-prefix skills produce zero warnings.""" + from hermes_cli.commands import discord_skill_commands_by_category + + skills_dir = tmp_path / "skills" + for nm in ("alpha", "bravo"): + d = skills_dir / "creative" / nm + d.mkdir(parents=True) + (d / "SKILL.md").write_text("---\nname: x\n---\n") + + fake_cmds = { + "/alpha": { + "name": "alpha", "description": "", + "skill_md_path": str(skills_dir / "creative" / "alpha" / "SKILL.md"), + }, + "/bravo": { + "name": "bravo", "description": "", + "skill_md_path": str(skills_dir / "creative" / "bravo" / "SKILL.md"), + }, + } + + with caplog.at_level(logging.WARNING, logger="hermes_cli.commands"), ( + patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds) + ), patch("tools.skills_tool.SKILLS_DIR", skills_dir): + categories, uncategorized, hidden = discord_skill_commands_by_category( + reserved_names=set(), + ) + + assert hidden == 0 + assert {n for n, _d, _k in categories["creative"]} == {"alpha", "bravo"} + clamp_warnings = [ + r for r in caplog.records + if r.levelno == logging.WARNING + and ("clamp" in r.getMessage() or "reserved" in r.getMessage()) + ] + assert clamp_warnings == [] From 7696ddc59eba81624014d7bfc063f8ad7fe61598 Mon Sep 17 00:00:00 2001 From: ambition0802 <673088860@qq.com> Date: Sat, 2 May 2026 02:05:57 -0700 Subject: [PATCH 18/61] fix(cli): robust paste file expansion and process_loop error handling (#17666) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 
Content-Transfer-Encoding: 8bit Two narrow fixes for long pasted messages silently disappearing: 1. _expand_paste_references: replace path.exists() + read_text() with try/except (OSError, IOError). Closes the TOCTOU window where a paste file deleted between check and read raised FileNotFoundError, bubbled up through process_loop's outer except, and silently dropped the user's input. Failures now return the placeholder text and log a warning. 2. process_loop outer except: logger.warning() instead of print(). prompt_toolkit's TUI swallows stdout, so 'Error: …' was invisible to the user. Logged errors are discoverable via hermes logs. Dropped the larger interrupt_queue→pending_input drain that was part of the original PR — that's a separate class of input-drop (in-progress interrupt handling) unrelated to the paste-file TOCTOU reported in the issue, and worth its own review. Salvage of #17939. --- cli.py | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/cli.py b/cli.py index f0ba6fc991..da917ae190 100644 --- a/cli.py +++ b/cli.py @@ -2928,7 +2928,14 @@ class HermesCLI: def _expand_ref(match): path = Path(match.group(1)) - return path.read_text(encoding="utf-8") if path.exists() else match.group(0) + # Use try/except instead of path.exists() to avoid TOCTOU race: + # the paste file may be deleted between check and read, causing + # the input to be silently dropped (#17666). 
+ try: + return path.read_text(encoding="utf-8") + except (OSError, IOError): + logger.warning("Paste file gone or unreadable, returning placeholder: %s", path) + return match.group(0) return paste_ref_re.sub(_expand_ref, text) @@ -11584,7 +11591,7 @@ class HermesCLI: pass # Non-fatal — don't break the main loop except Exception as e: - print(f"Error: {e}") + logger.warning("process_loop unhandled error (msg may be lost): %s", e) # Start processing thread process_thread = threading.Thread(target=process_loop, daemon=True) From 50f9f389ec1df2618ec1d61a24f7358a52fbe0f8 Mon Sep 17 00:00:00 2001 From: teknium1 <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:06:00 -0700 Subject: [PATCH 19/61] chore(release): map ambition0802 email for AUTHOR_MAP Follow-up for PR #17939 salvage. --- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index adb16607f0..939a485d6b 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -372,6 +372,7 @@ AUTHOR_MAP = { "e.silacandmr@gmail.com": "Es1la", "h3057183414@gmail.com": "CoreyNoDream", "franksong2702@gmail.com": "franksong2702", + "673088860@qq.com": "ambition0802", # ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply # crossref, and GH contributor list matching (April 2026 audit) ── "1115117931@qq.com": "aaronagent", From 1dce90893016a822480599d02505664c294f255c Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:08:06 -0700 Subject: [PATCH 20/61] fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix(gateway): config.yaml wins over .env for agent/display/timezone settings Regression from the silent config→env bridge. 
The bridge at module import time is correct for max_turns (unconditional overwrite), but every other agent.*, display.*, timezone, and security bridge key was guarded by 'if X not in os.environ' — so a stale .env entry from an old 'hermes setup' run would shadow the user's current config.yaml indefinitely. Symptom: agent.max_turns: 500 in config.yaml, HERMES_MAX_ITERATIONS=60 in .env from an old setup, and the gateway silently capped at 60 iterations per turn. Gateway logs confirmed api_calls never exceeded 60. Three changes: 1. gateway/run.py: drop the 'not in os.environ' guards for all agent.*, display.*, timezone, and security.* bridge keys. config.yaml is now authoritative for these settings — same semantics already in place for max_turns, terminal.*, and auxiliary.*. Also surface the bridge failure (previously 'except Exception: pass') to stderr so operators see bridge errors instead of silently falling back to .env. 2. gateway/run.py: INFO-log the resolved max_iterations at gateway start so operators can verify the config→env bridge did the right thing instead of chasing a phantom budget ceiling. 3. hermes_cli/setup.py: stop writing HERMES_MAX_ITERATIONS to .env in the setup wizard. config.yaml is the single source of truth. Also clean up any stale .env entry left behind by pre-fix setups. Regression tests in tests/gateway/test_config_env_bridge_authority.py guard each config→env key against the 'stale .env shadows config' bug. * fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) Three issues observed in production gateway.log during a rapid restart chain on 2026-05-02, all fixed here. 1. _send_restart_notification logged unconditional success adapter.send() catches provider errors (e.g. Telegram 'Chat not found') and returns SendResult(success=False); it never raises. 
The caller ignored the return value and always logged 'Sent restart notification to ' at INFO, producing a misleading success line directly below the 'Failed to send Telegram message' traceback on every boot. Now inspects result.success and logs WARNING with the error otherwise. 2. WhatsApp bridge SIGTERM on shutdown classified as fatal error _check_managed_bridge_exit() saw the bridge's returncode -15 (our own SIGTERM from disconnect()) and fired the full fatal-error path, producing 'ERROR ... WhatsApp bridge process exited unexpectedly' plus 'Fatal whatsapp adapter error (whatsapp_bridge_exited)' on every planned shutdown, immediately before the normal '✓ whatsapp disconnected'. Adds a _shutting_down flag that disconnect() sets before the terminate, and _check_managed_bridge_exit() returns None for returncode in {0, -2, -15} while shutting down. OOM-kill (137) and other non-signal exits still hit the fatal path. 3. restart_drain_timeout default 60s → 180s On 2026-05-02 01:43:27 a user /restart fired while three agents were mid-API-call (82s, 112s, 154s into their turns). The 60s drain budget expired and all three were force-interrupted. 180s covers realistic in-flight agent turns; users on very-long-reasoning models can still raise it further via agent.restart_drain_timeout in config.yaml. Existing explicit user values are preserved by deep-merge. Tests - tests/gateway/test_restart_notification.py: two new tests assert INFO is only logged on SendResult(success=True) and WARNING with the error string is logged on SendResult(success=False). - tests/gateway/test_whatsapp_connect.py: parametrized test for returncode in {0, -2, -15} proves shutdown-time exits are suppressed; separate test proves returncode 137 (SIGKILL/OOM) still surfaces as fatal even when _shutting_down is set. - _check_managed_bridge_exit() reads _shutting_down via getattr-with- default so existing _make_adapter() test helpers that bypass __init__ (pitfall #17 in AGENTS.md) keep working unmodified. 
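Illustrative sketch for item 2 (not part of the diff below): the shutdown-aware exit classification can be thought of as a pure function of the bridge's return code and the shutdown flag. The names classify_bridge_exit and PLANNED_EXIT_CODES are hypothetical and exist only for this example; the real logic lives in _check_managed_bridge_exit() and also handles fatal-error bookkeeping that is omitted here.

```python
import logging
from typing import Optional

logger = logging.getLogger("whatsapp_sketch")

# Exit codes treated as expected during a planned shutdown, per item 2:
# 0 (clean exit), -2 (SIGINT), -15 (SIGTERM).
PLANNED_EXIT_CODES = frozenset({0, -2, -15})


def classify_bridge_exit(returncode: Optional[int], shutting_down: bool) -> Optional[str]:
    """Hypothetical helper: return None when nothing is wrong, else a fatal message."""
    if returncode is None:
        # The bridge process is still running.
        return None
    if shutting_down and returncode in PLANNED_EXIT_CODES:
        # Our own terminate (or a clean exit) during disconnect: informational only.
        logger.info("Bridge exited during shutdown (code %d).", returncode)
        return None
    # Any other exit, including one during shutdown with an unexpected code,
    # still goes down the fatal path.
    return f"WhatsApp bridge process exited unexpectedly (code {returncode})."
```

Under these assumptions, a -15 exit is suppressed only when the shutdown flag is set, while a code such as 137 is reported as fatal regardless of the flag.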
--- gateway/platforms/whatsapp.py | 26 +++ gateway/run.py | 84 ++++++--- hermes_cli/config.py | 7 +- hermes_cli/setup.py | 19 +- .../test_config_env_bridge_authority.py | 166 ++++++++++++++++++ tests/gateway/test_restart_notification.py | 87 ++++++++- tests/gateway/test_whatsapp_connect.py | 60 +++++++ tests/hermes_cli/test_setup_agent_settings.py | 55 +++++- 8 files changed, 472 insertions(+), 32 deletions(-) create mode 100644 tests/gateway/test_config_env_bridge_authority.py diff --git a/gateway/platforms/whatsapp.py b/gateway/platforms/whatsapp.py index a82417a601..b3e655a51b 100644 --- a/gateway/platforms/whatsapp.py +++ b/gateway/platforms/whatsapp.py @@ -185,6 +185,13 @@ class WhatsAppAdapter(BasePlatformAdapter): self._bridge_log: Optional[Path] = None self._poll_task: Optional[asyncio.Task] = None self._http_session: Optional["aiohttp.ClientSession"] = None + # Set to True by disconnect() before we SIGTERM our child bridge so + # _check_managed_bridge_exit() can distinguish an intentional + # shutdown-time exit (returncode -15 / -2 / 0) from a real crash. + # Without this, every graceful gateway shutdown/restart would log + # "Fatal whatsapp adapter error" plus dispatch a fatal-error + # notification before the normal "✓ whatsapp disconnected" fires. + self._shutting_down: bool = False def _whatsapp_require_mention(self) -> bool: configured = self.config.extra.get("require_mention") @@ -555,6 +562,21 @@ class WhatsAppAdapter(BasePlatformAdapter): if returncode is None: return None + # Planned shutdown: disconnect() sets _shutting_down before it sends + # SIGTERM to the bridge, so a returncode of -15 (SIGTERM), -2 (SIGINT), + # or 0 (clean exit) at that point is expected, not a crash. Treat it + # as informational and skip the fatal-error path. + # getattr-with-default keeps tests that construct the adapter via + # ``WhatsAppAdapter.__new__`` (bypassing __init__) working without + # every _make_adapter() helper having to seed the attribute. 
+ if getattr(self, "_shutting_down", False) and returncode in (0, -2, -15): + logger.info( + "[%s] Bridge exited during shutdown (code %d).", + self.name, + returncode, + ) + return None + message = f"WhatsApp bridge process exited unexpectedly (code {returncode})." if not self.has_fatal_error: logger.error("[%s] %s", self.name, message) @@ -565,6 +587,10 @@ class WhatsAppAdapter(BasePlatformAdapter): async def disconnect(self) -> None: """Stop the WhatsApp bridge and clean up any orphaned processes.""" + # Flip the shutdown flag BEFORE signalling the child so the exit-check + # path (which runs from other tasks like send() and the poll loop) + # doesn't race us and report the intentional termination as fatal. + self._shutting_down = True if self._bridge_process: try: try: diff --git a/gateway/run.py b/gateway/run.py index 23c67eec09..db6fcc9756 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -407,37 +407,37 @@ if _config_path.exists(): os.environ[_env_map["base_url"]] = _base_url if _api_key: os.environ[_env_map["api_key"]] = _api_key + # config.yaml is the documented, authoritative source for these + # settings — it unconditionally wins over .env values. Previously + # the guards below read `if X not in os.environ` and let stale + # .env entries (e.g. HERMES_MAX_ITERATIONS=60 written by an old + # `hermes setup` run) silently shadow the user's current config. + # See PR #18413 / the 60-vs-500 max_turns incident. _agent_cfg = _cfg.get("agent", {}) if _agent_cfg and isinstance(_agent_cfg, dict): if "max_turns" in _agent_cfg: os.environ["HERMES_MAX_ITERATIONS"] = str(_agent_cfg["max_turns"]) - # Bridge agent.gateway_timeout → HERMES_AGENT_TIMEOUT env var. - # Env var from .env takes precedence (already in os.environ). 
- if "gateway_timeout" in _agent_cfg and "HERMES_AGENT_TIMEOUT" not in os.environ: + if "gateway_timeout" in _agent_cfg: os.environ["HERMES_AGENT_TIMEOUT"] = str(_agent_cfg["gateway_timeout"]) - if "gateway_timeout_warning" in _agent_cfg and "HERMES_AGENT_TIMEOUT_WARNING" not in os.environ: + if "gateway_timeout_warning" in _agent_cfg: os.environ["HERMES_AGENT_TIMEOUT_WARNING"] = str(_agent_cfg["gateway_timeout_warning"]) - if "gateway_notify_interval" in _agent_cfg and "HERMES_AGENT_NOTIFY_INTERVAL" not in os.environ: + if "gateway_notify_interval" in _agent_cfg: os.environ["HERMES_AGENT_NOTIFY_INTERVAL"] = str(_agent_cfg["gateway_notify_interval"]) - if "restart_drain_timeout" in _agent_cfg and "HERMES_RESTART_DRAIN_TIMEOUT" not in os.environ: + if "restart_drain_timeout" in _agent_cfg: os.environ["HERMES_RESTART_DRAIN_TIMEOUT"] = str(_agent_cfg["restart_drain_timeout"]) - if ( - "gateway_auto_continue_freshness" in _agent_cfg - and "HERMES_AUTO_CONTINUE_FRESHNESS" not in os.environ - ): + if "gateway_auto_continue_freshness" in _agent_cfg: os.environ["HERMES_AUTO_CONTINUE_FRESHNESS"] = str( _agent_cfg["gateway_auto_continue_freshness"] ) _display_cfg = _cfg.get("display", {}) if _display_cfg and isinstance(_display_cfg, dict): - if "busy_input_mode" in _display_cfg and "HERMES_GATEWAY_BUSY_INPUT_MODE" not in os.environ: + if "busy_input_mode" in _display_cfg: os.environ["HERMES_GATEWAY_BUSY_INPUT_MODE"] = str(_display_cfg["busy_input_mode"]) - if "busy_ack_enabled" in _display_cfg and "HERMES_GATEWAY_BUSY_ACK_ENABLED" not in os.environ: + if "busy_ack_enabled" in _display_cfg: os.environ["HERMES_GATEWAY_BUSY_ACK_ENABLED"] = str(_display_cfg["busy_ack_enabled"]) # Timezone: bridge config.yaml → HERMES_TIMEZONE env var. - # HERMES_TIMEZONE from .env takes precedence (already in os.environ). 
_tz_cfg = _cfg.get("timezone", "") - if _tz_cfg and isinstance(_tz_cfg, str) and "HERMES_TIMEZONE" not in os.environ: + if _tz_cfg and isinstance(_tz_cfg, str): os.environ["HERMES_TIMEZONE"] = _tz_cfg.strip() # Security settings _security_cfg = _cfg.get("security", {}) @@ -445,8 +445,24 @@ if _config_path.exists(): _redact = _security_cfg.get("redact_secrets") if _redact is not None: os.environ["HERMES_REDACT_SECRETS"] = str(_redact).lower() - except Exception: - pass # Non-fatal; gateway can still run with .env values + except Exception as _bridge_err: + # Previously this was silent (`except Exception: pass`), which + # hid partial bridge failures and let .env defaults shadow + # config.yaml values — users observed max_turns=500 in config + # but a 60-iteration cap in practice. Surface the failure to + # stderr so operators see it even though `logger` is not yet + # initialized at module-import time (logger is defined further + # down this module). + print( + f" Warning: config.yaml → env bridge failed: " + f"{type(_bridge_err).__name__}: {_bridge_err}", + file=sys.stderr, + ) + print( + " Gateway will fall back to .env values, which may not match " + "your current config.yaml. Run `hermes doctor` to investigate.", + file=sys.stderr, + ) # Apply IPv4 preference if configured (before any HTTP clients are created). try: @@ -2584,6 +2600,18 @@ class GatewayRunner: """ logger.info("Starting Hermes Gateway...") logger.info("Session storage: %s", self.config.sessions_dir) + # Log the resolved max_iterations budget so operators can verify the + # config.yaml → env bridge did the right thing at a glance (instead + # of silently running at a stale .env value for weeks). 
+ try: + _effective_max_iter = int(os.getenv("HERMES_MAX_ITERATIONS", "90")) + logger.info( + "Agent budget: max_iterations=%d (agent.max_turns from config.yaml, " + "or HERMES_MAX_ITERATIONS from .env, or default 90)", + _effective_max_iter, + ) + except Exception: + pass try: from hermes_cli.profiles import get_active_profile_name _profile = get_active_profile_name() @@ -10453,16 +10481,28 @@ class GatewayRunner: return metadata = {"thread_id": thread_id} if thread_id else None - await adapter.send( + result = await adapter.send( chat_id, "♻ Gateway restarted successfully. Your session continues.", metadata=metadata, ) - logger.info( - "Sent restart notification to %s:%s", - platform_str, - chat_id, - ) + # adapter.send() catches provider errors (e.g. "Chat not found") + # and returns SendResult(success=False) rather than raising, so + # we must inspect the result before claiming success — otherwise + # the log line is misleading and hides real delivery failures. + if getattr(result, "success", False): + logger.info( + "Sent restart notification to %s:%s", + platform_str, + chat_id, + ) + else: + logger.warning( + "Restart notification to %s:%s was not delivered: %s", + platform_str, + chat_id, + getattr(result, "error", "unknown error"), + ) except Exception as e: logger.warning("Restart notification failed: %s", e) finally: diff --git a/hermes_cli/config.py b/hermes_cli/config.py index fe989619bb..17e10c08d6 100644 --- a/hermes_cli/config.py +++ b/hermes_cli/config.py @@ -400,7 +400,12 @@ DEFAULT_CONFIG = { # The gateway stops accepting new work, waits for running agents # to finish, then interrupts any remaining runs after the timeout. # 0 = no drain, interrupt immediately. - "restart_drain_timeout": 60, + # + # 180s is calibrated for realistic in-flight agent turns: a typical + # coding conversation mid-reasoning runs 60–150s per call, so a 60s + # budget routinely interrupted legitimate work on /restart. 
Raise + # further in config.yaml if you run very-long-reasoning models. + "restart_drain_timeout": 180, # Max app-level retry attempts for API errors (connection drops, # provider timeouts, 5xx, etc.) before the agent surfaces the # failure. The OpenAI SDK already does its own low-level retries diff --git a/hermes_cli/setup.py b/hermes_cli/setup.py index 3933ad8494..8f32e2cbd8 100644 --- a/hermes_cli/setup.py +++ b/hermes_cli/setup.py @@ -1643,7 +1643,11 @@ def setup_terminal_backend(config: dict): def _apply_default_agent_settings(config: dict): """Apply recommended defaults for all agent settings without prompting.""" config.setdefault("agent", {})["max_turns"] = 90 - save_env_value("HERMES_MAX_ITERATIONS", "90") + # config.yaml is the authoritative source for max_turns; the gateway + # bridges it into HERMES_MAX_ITERATIONS at startup. We no longer write + # to .env to avoid the dual-source inconsistency that caused the + # 60-vs-500 bug (stale .env entry silently shadowing config.yaml). + remove_env_value("HERMES_MAX_ITERATIONS") config.setdefault("display", {})["tool_progress"] = "all" @@ -1673,9 +1677,10 @@ def setup_agent_settings(config: dict): print() # ── Max Iterations ── - current_max = get_env_value("HERMES_MAX_ITERATIONS") or str( - cfg_get(config, "agent", "max_turns", default=90) - ) + # config.yaml is authoritative; read from there. If a legacy .env + # entry is still around (from pre-PR#18413 setups), prefer the + # config value so we don't surface a stale number to the user. + current_max = str(cfg_get(config, "agent", "max_turns", default=90)) print_info("Maximum tool-calling iterations per conversation.") print_info("Higher = more complex tasks, but costs more tokens.") print_info( @@ -1686,9 +1691,13 @@ def setup_agent_settings(config: dict): try: max_iter = int(max_iter_str) if max_iter > 0: - save_env_value("HERMES_MAX_ITERATIONS", str(max_iter)) + # Write to config.yaml (authoritative) only. 
Also clean up any + # stale .env entry from earlier setup runs — the gateway's + # bridge in gateway/run.py now unconditionally derives + # HERMES_MAX_ITERATIONS from agent.max_turns at startup. config.setdefault("agent", {})["max_turns"] = max_iter config.pop("max_turns", None) + remove_env_value("HERMES_MAX_ITERATIONS") print_success(f"Max iterations set to {max_iter}") except ValueError: print_warning("Invalid number, keeping current value") diff --git a/tests/gateway/test_config_env_bridge_authority.py b/tests/gateway/test_config_env_bridge_authority.py new file mode 100644 index 0000000000..26c54f1c73 --- /dev/null +++ b/tests/gateway/test_config_env_bridge_authority.py @@ -0,0 +1,166 @@ +"""Regression tests for the config.yaml → env var bridge in gateway/run.py. + +Guards against the 60-vs-500 bug where a stale `.env HERMES_MAX_ITERATIONS=60` +entry silently shadowed `agent.max_turns: 500` in config.yaml because the +bridge used `if X not in os.environ` guards. After PR#18413 the bridge +treats config.yaml as authoritative and unconditionally overwrites .env +values for `agent.*`, `display.*`, `timezone`, and `security.*` keys. +""" + +from __future__ import annotations + +import os +import subprocess +import sys +import textwrap +from pathlib import Path + +import pytest + + +PROJECT_ROOT = Path(__file__).resolve().parents[2] + + +def _run_gateway_import(hermes_home: Path, initial_env: dict[str, str]) -> dict[str, str]: + """Import gateway.run in a clean subprocess and return the post-import env. + + The bridge runs at module-import time, so simply importing is enough + to exercise it. Running in a subprocess isolates the test from other + import side effects and makes the "what ends up in os.environ" check + deterministic. 
+ """ + script = textwrap.dedent( + f""" + import os, sys + sys.path.insert(0, {str(PROJECT_ROOT)!r}) + + try: + from gateway import run # noqa: F401 — module import triggers bridge + except Exception as exc: + print(f"IMPORT_ERROR:{{type(exc).__name__}}:{{exc}}", file=sys.stderr) + sys.exit(2) + + for k in ( + "HERMES_MAX_ITERATIONS", + "HERMES_AGENT_TIMEOUT", + "HERMES_AGENT_TIMEOUT_WARNING", + "HERMES_GATEWAY_BUSY_INPUT_MODE", + "HERMES_TIMEZONE", + ): + v = os.environ.get(k) + if v is not None: + print(f"{{k}}={{v}}") + """ + ) + env = dict(initial_env) + env["HERMES_HOME"] = str(hermes_home) + # Keep PATH / PYTHONPATH so venv imports resolve. + for k in ("PATH", "PYTHONPATH", "VIRTUAL_ENV", "HOME"): + if k in os.environ and k not in env: + env[k] = os.environ[k] + + result = subprocess.run( + [sys.executable, "-c", script], + env=env, + capture_output=True, + text=True, + timeout=60, + ) + if result.returncode != 0: + pytest.fail( + f"gateway.run import failed (rc={result.returncode})\n" + f"stderr:\n{result.stderr}\nstdout:\n{result.stdout}" + ) + out: dict[str, str] = {} + for line in result.stdout.splitlines(): + if "=" in line: + k, v = line.split("=", 1) + out[k] = v + return out + + +def _write_config(home: Path, agent_cfg: dict | None = None, display_cfg: dict | None = None, + timezone: str | None = None) -> None: + import yaml + cfg: dict = {} + if agent_cfg: + cfg["agent"] = agent_cfg + if display_cfg: + cfg["display"] = display_cfg + if timezone: + cfg["timezone"] = timezone + (home / "config.yaml").write_text(yaml.safe_dump(cfg)) + + +def _write_env(home: Path, entries: dict[str, str]) -> None: + lines = [f"{k}={v}\n" for k, v in entries.items()] + (home / ".env").write_text("".join(lines)) + + +@pytest.fixture +def hermes_home(tmp_path: Path) -> Path: + home = tmp_path / ".hermes" + home.mkdir() + return home + + +def test_config_max_turns_wins_over_stale_env(hermes_home: Path) -> None: + """Regression: config.yaml:agent.max_turns=500 must beat 
.env=60.""" + _write_config(hermes_home, agent_cfg={"max_turns": 500}) + _write_env(hermes_home, {"HERMES_MAX_ITERATIONS": "60"}) + + env = _run_gateway_import(hermes_home, initial_env={}) + + assert env.get("HERMES_MAX_ITERATIONS") == "500", ( + f"expected config.yaml max_turns=500 to win; got {env.get('HERMES_MAX_ITERATIONS')!r}. " + "Stale .env value is shadowing config — the bridge lost its override." + ) + + +def test_config_gateway_timeout_wins_over_stale_env(hermes_home: Path) -> None: + """Every agent.* bridge key must be config-authoritative, not .env-authoritative.""" + _write_config(hermes_home, agent_cfg={ + "gateway_timeout": 1800, + "gateway_timeout_warning": 900, + }) + _write_env(hermes_home, { + "HERMES_AGENT_TIMEOUT": "60", + "HERMES_AGENT_TIMEOUT_WARNING": "30", + }) + + env = _run_gateway_import(hermes_home, initial_env={}) + + assert env.get("HERMES_AGENT_TIMEOUT") == "1800" + assert env.get("HERMES_AGENT_TIMEOUT_WARNING") == "900" + + +def test_config_display_busy_input_mode_wins_over_stale_env(hermes_home: Path) -> None: + _write_config(hermes_home, display_cfg={"busy_input_mode": "interrupt"}) + _write_env(hermes_home, {"HERMES_GATEWAY_BUSY_INPUT_MODE": "queue"}) + + env = _run_gateway_import(hermes_home, initial_env={}) + + assert env.get("HERMES_GATEWAY_BUSY_INPUT_MODE") == "interrupt" + + +def test_config_timezone_wins_over_stale_env(hermes_home: Path) -> None: + _write_config(hermes_home, timezone="America/Los_Angeles") + _write_env(hermes_home, {"HERMES_TIMEZONE": "UTC"}) + + env = _run_gateway_import(hermes_home, initial_env={}) + + assert env.get("HERMES_TIMEZONE") == "America/Los_Angeles" + + +def test_env_value_survives_when_config_omits_key(hermes_home: Path) -> None: + """If config.yaml doesn't set max_turns, .env value must still pass through. + + The bridge only overwrites when the config key is present — an absent + config key should NOT clobber the .env value. 
+ """ + _write_config(hermes_home, agent_cfg={}) # no max_turns + _write_env(hermes_home, {"HERMES_MAX_ITERATIONS": "123"}) + + env = _run_gateway_import(hermes_home, initial_env={}) + + assert env.get("HERMES_MAX_ITERATIONS") == "123" diff --git a/tests/gateway/test_restart_notification.py b/tests/gateway/test_restart_notification.py index 8297dfc32f..254917897f 100644 --- a/tests/gateway/test_restart_notification.py +++ b/tests/gateway/test_restart_notification.py @@ -242,4 +242,89 @@ async def test_send_restart_notification_cleans_up_on_send_failure( await runner._send_restart_notification() - assert not notify_path.exists() # cleaned up despite error + # File cleaned up even though send raised. + assert not notify_path.exists() + + +@pytest.mark.asyncio +async def test_send_restart_notification_logs_warning_on_sendresult_failure( + tmp_path, monkeypatch, caplog +): + """Adapter that returns SendResult(success=False) must log a WARNING, not INFO. + + Regression guard: adapter.send() catches provider errors (e.g. Telegram + "Chat not found") and returns SendResult(success=False) rather than + raising. The caller previously ignored the return value and always + logged "Sent restart notification to ..." at INFO — masking real + delivery failures behind a fake success line. 
+ """ + from gateway.platforms.base import SendResult + + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + notify_path = tmp_path / ".restart_notify.json" + notify_path.write_text(json.dumps({ + "platform": "telegram", + "chat_id": "42", + })) + + runner, adapter = make_restart_runner() + adapter.send = AsyncMock( + return_value=SendResult(success=False, error="Chat not found"), + ) + + with caplog.at_level("DEBUG", logger="gateway.run"): + await runner._send_restart_notification() + + success_lines = [ + r for r in caplog.records + if r.levelname == "INFO" and "Sent restart notification" in r.getMessage() + ] + warning_lines = [ + r for r in caplog.records + if r.levelname == "WARNING" + and "was not delivered" in r.getMessage() + and "Chat not found" in r.getMessage() + ] + assert not success_lines, ( + "Expected no INFO 'Sent restart notification' line when send failed, " + f"got: {[r.getMessage() for r in success_lines]}" + ) + assert warning_lines, ( + "Expected a WARNING line mentioning the failure; " + f"got records: {[(r.levelname, r.getMessage()) for r in caplog.records]}" + ) + # Still cleans up. 
+ assert not notify_path.exists() + + +@pytest.mark.asyncio +async def test_send_restart_notification_logs_info_on_sendresult_success( + tmp_path, monkeypatch, caplog +): + """Adapter returning SendResult(success=True) keeps the INFO log line.""" + from gateway.platforms.base import SendResult + + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + notify_path = tmp_path / ".restart_notify.json" + notify_path.write_text(json.dumps({ + "platform": "telegram", + "chat_id": "42", + })) + + runner, adapter = make_restart_runner() + adapter.send = AsyncMock(return_value=SendResult(success=True, message_id="m-1")) + + with caplog.at_level("DEBUG", logger="gateway.run"): + await runner._send_restart_notification() + + success_lines = [ + r for r in caplog.records + if r.levelname == "INFO" and "Sent restart notification" in r.getMessage() + ] + assert success_lines, ( + "Expected INFO 'Sent restart notification' when send succeeded; " + f"got records: {[(r.levelname, r.getMessage()) for r in caplog.records]}" + ) + assert not notify_path.exists() diff --git a/tests/gateway/test_whatsapp_connect.py b/tests/gateway/test_whatsapp_connect.py index 29f7eee3af..0a359fb751 100644 --- a/tests/gateway/test_whatsapp_connect.py +++ b/tests/gateway/test_whatsapp_connect.py @@ -284,6 +284,66 @@ class TestBridgeRuntimeFailure: mock_fh.close.assert_called_once() assert adapter._bridge_log_fh is None + @pytest.mark.asyncio + @pytest.mark.parametrize("returncode", [0, -2, -15]) + async def test_shutdown_suppresses_fatal_on_planned_bridge_exit(self, returncode): + """During graceful disconnect(), SIGTERM/SIGINT/clean-exit are NOT fatal. 
+ + Regression guard for the bug where every gateway shutdown/restart + logged "Fatal whatsapp adapter error (whatsapp_bridge_exited)" and + dispatched a fatal-error notification just before the normal + "✓ whatsapp disconnected" — because _check_managed_bridge_exit() + saw the bridge's returncode of -15 (our own SIGTERM) and classified + it as an unexpected crash. + """ + adapter = _make_adapter() + fatal_handler = AsyncMock() + adapter.set_fatal_error_handler(fatal_handler) + adapter._running = True + adapter._http_session = MagicMock() + adapter._bridge_log_fh = MagicMock() + adapter._shutting_down = True # disconnect() sets this before SIGTERM + + mock_proc = MagicMock() + mock_proc.poll.return_value = returncode + adapter._bridge_process = mock_proc + + result = await adapter._check_managed_bridge_exit() + + assert result is None, ( + f"returncode={returncode} during shutdown should be suppressed, " + f"got fatal message: {result!r}" + ) + assert adapter.fatal_error_code is None + fatal_handler.assert_not_awaited() + + @pytest.mark.asyncio + async def test_shutdown_still_surfaces_nonzero_crash(self): + """Even during shutdown, a truly crashed bridge (e.g. returncode 9) is fatal. + + The suppression list is deliberately narrow (0, -2, -15) so that + OOM-kill (137), assertion failures, or custom error exits still + reach the fatal-error handler and user notification path. 
+ """ + adapter = _make_adapter() + fatal_handler = AsyncMock() + adapter.set_fatal_error_handler(fatal_handler) + adapter._running = True + adapter._http_session = MagicMock() + adapter._bridge_log_fh = MagicMock() + adapter._shutting_down = True + + mock_proc = MagicMock() + mock_proc.poll.return_value = 137 # SIGKILL / OOM-kill + adapter._bridge_process = mock_proc + + result = await adapter._check_managed_bridge_exit() + + assert result is not None + assert "exited unexpectedly" in result + assert adapter.fatal_error_code == "whatsapp_bridge_exited" + fatal_handler.assert_awaited_once() + @pytest.mark.asyncio async def test_closed_when_http_not_ready(self): """Health endpoint never returns 200 within 15 attempts.""" diff --git a/tests/hermes_cli/test_setup_agent_settings.py b/tests/hermes_cli/test_setup_agent_settings.py index 868be7508c..b0e1d906ab 100644 --- a/tests/hermes_cli/test_setup_agent_settings.py +++ b/tests/hermes_cli/test_setup_agent_settings.py @@ -4,11 +4,16 @@ from hermes_cli.setup import setup_agent_settings def test_setup_agent_settings_uses_displayed_max_iterations_value(tmp_path, monkeypatch, capsys): - """The helper text should match the value shown in the prompt.""" + """The helper text should match the value shown in the prompt. + + After PR#18413 max_turns is read exclusively from config.yaml — the + .env `HERMES_MAX_ITERATIONS` fallback was removed because it was + shadowing the user's current config (see the 60-vs-500 incident). 
+ """ monkeypatch.setenv("HERMES_HOME", str(tmp_path)) config = { - "agent": {"max_turns": 90}, + "agent": {"max_turns": 60}, "display": {"tool_progress": "all"}, "compression": {"threshold": 0.50}, "session_reset": {"mode": "both", "idle_minutes": 1440, "at_hour": 4}, @@ -16,10 +21,10 @@ def test_setup_agent_settings_uses_displayed_max_iterations_value(tmp_path, monk prompt_answers = iter(["60", "all", "0.5"]) - monkeypatch.setattr("hermes_cli.setup.get_env_value", lambda key: "60" if key == "HERMES_MAX_ITERATIONS" else "") monkeypatch.setattr("hermes_cli.setup.prompt", lambda *args, **kwargs: next(prompt_answers)) monkeypatch.setattr("hermes_cli.setup.prompt_choice", lambda *args, **kwargs: 4) monkeypatch.setattr("hermes_cli.setup.save_env_value", lambda *args, **kwargs: None) + monkeypatch.setattr("hermes_cli.setup.remove_env_value", lambda *args, **kwargs: None) monkeypatch.setattr("hermes_cli.setup.save_config", lambda *args, **kwargs: None) setup_agent_settings(config) @@ -27,3 +32,47 @@ def test_setup_agent_settings_uses_displayed_max_iterations_value(tmp_path, monk out = capsys.readouterr().out assert "Press Enter to keep 60." in out assert "Default is 90" not in out + + +def test_setup_agent_settings_prefers_config_over_stale_env(tmp_path, monkeypatch, capsys): + """Config.yaml wins even when a stale .env value disagrees. + + Regression guard for the bug where `.env HERMES_MAX_ITERATIONS=60` + from an old `hermes setup` run shadowed `agent.max_turns: 500` in + config.yaml. The wizard must now display the config value. + """ + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + + config = { + "agent": {"max_turns": 500}, # user bumped this in config.yaml + "display": {"tool_progress": "all"}, + "compression": {"threshold": 0.50}, + "session_reset": {"mode": "both", "idle_minutes": 1440, "at_hour": 4}, + } + + prompt_answers = iter(["500", "all", "0.5"]) + + # Simulate stale .env value — the wizard must ignore this. 
+ monkeypatch.setattr( + "hermes_cli.setup.get_env_value", + lambda key: "60" if key == "HERMES_MAX_ITERATIONS" else "", + ) + monkeypatch.setattr("hermes_cli.setup.prompt", lambda *args, **kwargs: next(prompt_answers)) + monkeypatch.setattr("hermes_cli.setup.prompt_choice", lambda *args, **kwargs: 4) + monkeypatch.setattr("hermes_cli.setup.save_env_value", lambda *args, **kwargs: None) + + removed_keys: list[str] = [] + monkeypatch.setattr( + "hermes_cli.setup.remove_env_value", + lambda key: (removed_keys.append(key), True)[1], + ) + monkeypatch.setattr("hermes_cli.setup.save_config", lambda *args, **kwargs: None) + + setup_agent_settings(config) + + out = capsys.readouterr().out + # Config value wins + assert "Press Enter to keep 500." in out + assert "Press Enter to keep 60." not in out + # And the stale .env entry gets cleaned up + assert "HERMES_MAX_ITERATIONS" in removed_keys From 13f344c5ce2fe57b55b18b767b5945dd596971c0 Mon Sep 17 00:00:00 2001 From: luyao618 <364939526@qq.com> Date: Thu, 30 Apr 2026 20:45:20 +0800 Subject: [PATCH 21/61] fix(agent): try fallback providers at init when primary credential pool is exhausted (#17929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When a provider's credential pool has a single entry in 429-cooldown, resolve_provider_client returns None and AIAgent.__init__ raises a misleading RuntimeError suggesting the API key is missing — even when valid fallback_providers are configured. This patch makes __init__ iterate the fallback chain before raising, mirroring the existing in-flight fallback logic in the request loop. If a fallback resolves, the agent initializes against it and sets _fallback_activated=True so _restore_primary_runtime can pick the primary back up after cooldown. 
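The init-time fallback walk described above can be sketched in isolation. This is a minimal illustration, not the code in `run_agent.py`: `normalize_fallback_entries`, `first_resolvable_fallback`, and the two-argument `resolve` callable are hypothetical names standing in for the real `resolve_provider_client` call and its extra keyword arguments.

```python
from __future__ import annotations

from typing import Any, Callable, Optional, Tuple


def normalize_fallback_entries(fallback_model: Any) -> list[dict]:
    """Accept one dict or a list of dicts; keep entries naming both provider and model."""
    if isinstance(fallback_model, dict):
        fallback_model = [fallback_model]
    if not isinstance(fallback_model, list):
        return []
    return [
        f for f in fallback_model
        if isinstance(f, dict) and f.get("provider") and f.get("model")
    ]


def first_resolvable_fallback(
    fallback_model: Any,
    resolve: Callable[[str, str], Tuple[Optional[object], Optional[str]]],
) -> Optional[Tuple[str, str, object]]:
    """Return (provider, model, client) for the first entry that resolves, else None.

    `resolve` is a hypothetical stand-in: it returns (client, model) and
    signals an exhausted or missing credential with client=None.
    """
    for entry in normalize_fallback_entries(fallback_model):
        client, model = resolve(entry["provider"], entry["model"])
        if client is not None:
            # Prefer the model name the resolver reports, if any.
            return entry["provider"], model or entry["model"], client
    return None
```

A caller would raise the "no API key was found" error only when this returns `None`, and otherwise mark the fallback as activated so the primary can be restored later.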
Closes #17929 --- run_agent.py | 52 +++++++++++--- .../test_init_fallback_on_exhausted_pool.py | 69 +++++++++++++++++++ 2 files changed, 111 insertions(+), 10 deletions(-) create mode 100644 tests/run_agent/test_init_fallback_on_exhausted_pool.py diff --git a/run_agent.py b/run_agent.py index 1d926050fc..aac067ed4e 100644 --- a/run_agent.py +++ b/run_agent.py @@ -1473,17 +1473,49 @@ class AIAgent: _env_hint = _pcfg.api_key_env_vars[0] except Exception: pass + # --- Init-time fallback (#17929) --- + _fb_entries = [] + if isinstance(fallback_model, list): + _fb_entries = [ + f for f in fallback_model + if isinstance(f, dict) and f.get("provider") and f.get("model") + ] + elif isinstance(fallback_model, dict) and fallback_model.get("provider") and fallback_model.get("model"): + _fb_entries = [fallback_model] + _fb_resolved = False + for _fb in _fb_entries: + _fb_client, _fb_model = resolve_provider_client( + _fb["provider"], model=_fb["model"], raw_codex=True, + explicit_base_url=_fb.get("base_url"), + explicit_api_key=_fb.get("api_key"), + ) + if _fb_client is not None: + self.provider = _fb["provider"] + self.model = _fb_model or _fb["model"] + self._fallback_activated = True + client_kwargs = { + "api_key": _fb_client.api_key, + "base_url": str(_fb_client.base_url), + } + if _provider_timeout is not None: + client_kwargs["timeout"] = _provider_timeout + if hasattr(_fb_client, "_default_headers") and _fb_client._default_headers: + client_kwargs["default_headers"] = dict(_fb_client._default_headers) + _fb_resolved = True + break + if not _fb_resolved: + raise RuntimeError( + f"Provider '{_explicit}' is set in config.yaml but no API key " + f"was found. Set the {_env_hint} environment " + f"variable, or switch to a different provider with `hermes model`." + ) + if not getattr(self, "_fallback_activated", False): + # No provider configured — reject with a clear message. raise RuntimeError( - f"Provider '{_explicit}' is set in config.yaml but no API key " - f"was found. 
Set the {_env_hint} environment " - f"variable, or switch to a different provider with `hermes model`." + "No LLM provider configured. Run `hermes model` to " + "select a provider, or run `hermes setup` for first-time " + "configuration." ) - # No provider configured — reject with a clear message. - raise RuntimeError( - "No LLM provider configured. Run `hermes model` to " - "select a provider, or run `hermes setup` for first-time " - "configuration." - ) self._client_kwargs = client_kwargs # stored for rebuilding after interrupt @@ -1536,7 +1568,7 @@ class AIAgent: else: self._fallback_chain = [] self._fallback_index = 0 - self._fallback_activated = False + self._fallback_activated = getattr(self, "_fallback_activated", False) # Legacy attribute kept for backward compat (tests, external callers) self._fallback_model = self._fallback_chain[0] if self._fallback_chain else None if self._fallback_chain and not self.quiet_mode: diff --git a/tests/run_agent/test_init_fallback_on_exhausted_pool.py b/tests/run_agent/test_init_fallback_on_exhausted_pool.py new file mode 100644 index 0000000000..8440fd3ab5 --- /dev/null +++ b/tests/run_agent/test_init_fallback_on_exhausted_pool.py @@ -0,0 +1,69 @@ +"""Regression test for #17929: AIAgent.__init__ should try fallback_model +when primary provider credentials are exhausted.""" +import pytest +from unittest.mock import patch, MagicMock +from run_agent import AIAgent + + +def _make_tool_defs(): + return [{"type": "function", "function": {"name": "web_search", + "description": "search", "parameters": {"type": "object", "properties": {}}}}] + + +def _mock_client(api_key="fb-key-1234567890", base_url="https://fb.example.com/v1"): + c = MagicMock() + c.api_key = api_key + c.base_url = base_url + c._default_headers = None + return c + + +def test_init_tries_fallback_when_primary_returns_none(): + """When resolve_provider_client returns None for primary but succeeds for + a fallback entry, __init__ should NOT raise RuntimeError.""" + 
fb = _mock_client() + + def fake_resolve(provider, model=None, raw_codex=False, + explicit_base_url=None, explicit_api_key=None): + if provider == "tencent-token-plan": + return fb, "kimi2.5" + return None, None # primary exhausted + + with patch("agent.auxiliary_client.resolve_provider_client", side_effect=fake_resolve), \ + patch("run_agent.get_tool_definitions", return_value=_make_tool_defs()), \ + patch("run_agent.check_toolset_requirements", return_value={}), \ + patch("run_agent.OpenAI", return_value=MagicMock()): + + agent = AIAgent( + provider="alibaba-coding-plan", + model="qwen3.6-plus", + api_key=None, + base_url=None, + quiet_mode=True, + skip_context_files=True, + skip_memory=True, + fallback_model=[{"provider": "tencent-token-plan", "model": "kimi2.5"}], + ) + assert agent.provider == "tencent-token-plan" + assert agent.model == "kimi2.5" + assert agent._fallback_activated is True + + +def test_init_raises_when_no_fallback_configured(): + """When primary returns None and no fallback is set, should raise.""" + with patch("agent.auxiliary_client.resolve_provider_client", return_value=(None, None)), \ + patch("run_agent.get_tool_definitions", return_value=_make_tool_defs()), \ + patch("run_agent.check_toolset_requirements", return_value={}), \ + patch("run_agent.OpenAI", return_value=MagicMock()): + + with pytest.raises(RuntimeError, match="no API key was found"): + AIAgent( + provider="alibaba-coding-plan", + model="qwen3.6-plus", + api_key=None, + base_url=None, + quiet_mode=True, + skip_context_files=True, + skip_memory=True, + fallback_model=None, + ) From e444d8f29cead99781cbd4306160b81887b3f4e5 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:14:35 -0700 Subject: [PATCH 22/61] fix(gateway): config.yaml wins over .env for agent/display/timezone settings (#18764) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Regression from the silent config→env 
bridge. The bridge at module import time is correct for max_turns (unconditional overwrite), but every other agent.*, display.*, timezone, and security bridge key was guarded by 'if X not in os.environ' — so a stale .env entry from an old 'hermes setup' run would shadow the user's current config.yaml indefinitely. Symptom: agent.max_turns: 500 in config.yaml, HERMES_MAX_ITERATIONS=60 in .env from an old setup, and the gateway silently capped at 60 iterations per turn. Gateway logs confirmed api_calls never exceeded 60. Three changes: 1. gateway/run.py: drop the 'not in os.environ' guards for all agent.*, display.*, timezone, and security.* bridge keys. config.yaml is now authoritative for these settings — same semantics already in place for max_turns, terminal.*, and auxiliary.*. Also surface the bridge failure (previously 'except Exception: pass') to stderr so operators see bridge errors instead of silently falling back to .env. 2. gateway/run.py: INFO-log the resolved max_iterations at gateway start so operators can verify the config→env bridge did the right thing instead of chasing a phantom budget ceiling. 3. hermes_cli/setup.py: stop writing HERMES_MAX_ITERATIONS to .env in the setup wizard. config.yaml is the single source of truth. Also clean up any stale .env entry left behind by pre-fix setups. Regression tests in tests/gateway/test_config_env_bridge_authority.py guard each config→env key against the 'stale .env shadows config' bug. From 38dd057e91dcc47e82478ebc31c66d67b2d96ace Mon Sep 17 00:00:00 2001 From: beibi9966 Date: Sat, 2 May 2026 02:22:37 -0700 Subject: [PATCH 23/61] fix(feishu): finalize remote document downloads inside httpx.AsyncClient context (#18502) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Snapshot Content-Type and body while the client context is still active so pooled connections fully release on exit. 
Previously the read happened after `async with httpx.AsyncClient(...)` returned — which works today only because httpx eagerly buffers non-streaming responses; a future refactor to `.stream()` would silently read- after-close. Part of the #18451 connection-hygiene audit. Salvage of #18502. --- gateway/platforms/feishu.py | 9 ++++-- tests/gateway/test_feishu.py | 63 ++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+), 2 deletions(-) diff --git a/gateway/platforms/feishu.py b/gateway/platforms/feishu.py index 8bc2ae816e..a6b522c4a2 100644 --- a/gateway/platforms/feishu.py +++ b/gateway/platforms/feishu.py @@ -2922,13 +2922,18 @@ class FeishuAdapter(BasePlatformAdapter): }, ) response.raise_for_status() + # Snapshot Content-Type and body while the client context is + # still active so pooled connections fully release on exit. + # See #18451. + content_type_hdr = str(response.headers.get("Content-Type", "")) + body = response.content filename = self._derive_remote_filename( file_url, - content_type=str(response.headers.get("Content-Type", "")), + content_type=content_type_hdr, default_name=preferred_name, default_ext=default_ext, ) - cached_path = cache_document_from_bytes(response.content, filename) + cached_path = cache_document_from_bytes(body, filename) return cached_path, filename @staticmethod diff --git a/tests/gateway/test_feishu.py b/tests/gateway/test_feishu.py index ea5a805729..8042d38e3f 100644 --- a/tests/gateway/test_feishu.py +++ b/tests/gateway/test_feishu.py @@ -1771,6 +1771,69 @@ class TestAdapterBehavior(unittest.TestCase): self.assertIn("GIF downgraded to file", caption) self.assertIn("look", caption) + def test_download_remote_document_reads_response_before_httpx_client_closes(self): + """#18451 — snapshot Content-Type + body while the httpx.AsyncClient + context is still active so pooled connections fully release on + exit. 
Otherwise the response is only readable because httpx + eagerly buffers it; a future refactor to .stream() would silently + read-after-close.""" + from gateway.config import PlatformConfig + from gateway.platforms.feishu import FeishuAdapter + + events: list[str] = [] + + class _FakeResponse: + headers = {"Content-Type": "application/octet-stream"} + + def raise_for_status(self) -> None: + events.append("raise_for_status") + + @property + def content(self) -> bytes: + events.append("content_read") + return b"doc-bytes" + + class _FakeAsyncClient: + def __init__(self, *_a: object, **_k: object) -> None: + pass + + async def __aenter__(self) -> "_FakeAsyncClient": + events.append("client_enter") + return self + + async def __aexit__(self, *exc: object) -> None: + events.append("client_exit") + + async def get(self, *_a: object, **_k: object) -> _FakeResponse: + events.append("get") + return _FakeResponse() + + with tempfile.TemporaryDirectory() as tmp: + with patch.dict(os.environ, {"HERMES_HOME": tmp}, clear=False): + adapter = FeishuAdapter(PlatformConfig()) + + async def _run() -> tuple[str, str]: + with patch("tools.url_safety.is_safe_url", return_value=True): + with patch("httpx.AsyncClient", _FakeAsyncClient): + with patch( + "gateway.platforms.feishu.cache_document_from_bytes", + return_value="/tmp/cached-doc.bin", + ): + return await adapter._download_remote_document( + "https://example.com/doc.bin", + default_ext=".bin", + preferred_name="doc", + ) + + path, filename = asyncio.run(_run()) + + self.assertEqual(path, "/tmp/cached-doc.bin") + self.assertTrue(filename) + # content_read MUST happen before client_exit — otherwise we're + # reading response body after the connection pool has been torn + # down, which only works by accident (httpx's eager buffering). 
+ self.assertLess(events.index("content_read"), events.index("client_exit")) + def test_dedup_state_persists_across_adapter_restart(self): from gateway.config import PlatformConfig from gateway.platforms.feishu import FeishuAdapter From 762eb79f1e1985a54758d40f3d3caa2f119bd4da Mon Sep 17 00:00:00 2001 From: teknium1 <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:23:00 -0700 Subject: [PATCH 24/61] fix(gateway): tighten httpx keepalive and close whatsapp typing-response leak (#18451) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two mitigations for the CLOSE_WAIT accumulation reported against QQ Bot + Feishu on macOS behind Cloudflare Warp. 1. Shared httpx.Limits helper (gateway/platforms/_http_client_limits.py). Every long-lived platform adapter now constructs httpx.AsyncClient with max_keepalive_connections=10 and keepalive_expiry=2.0, vs httpx's default of unbounded keepalive pool and 5.0s expiry. On macOS/Warp the default 5s window let idle keepalive sockets sit in CLOSE_WAIT long enough for seven persistent adapters (QQ Bot, WeCom, DingTalk, Signal, BlueBubbles, WeCom-callback, plus the transient Feishu helper) to compound to the 256-fd ulimit. Tunable via HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY and HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE env vars. 2. whatsapp.send_typing aiohttp leak. The call was 'await self._http_session.post(...)' with no 'async with' and no variable capture — the ClientResponse went out of scope unclosed, holding its TCP socket in CLOSE_WAIT until GC. Fixed by wrapping in 'async with'. This was the only bare-await aiohttp leak in the gateway/tools/plugins tree per audit; all other aiohttp sites use the context-manager pattern correctly. 
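The difference between the bare `await` and the `async with` form in item 2 can be shown with a stdlib-only stand-in. This is a sketch under stated assumptions: `FakeResponse` and `FakeRequest` are hypothetical classes that only imitate the shape of the object returned by an aiohttp session's `post()` (awaitable and usable as an async context manager); no real sockets or aiohttp calls are involved.

```python
import asyncio


class FakeResponse:
    """Stand-in for a client response that holds a connection until released."""

    def __init__(self) -> None:
        self.closed = False

    def release(self) -> None:
        self.closed = True


class FakeRequest:
    """Hypothetical request object: awaitable and an async context manager."""

    def __init__(self) -> None:
        self.response = FakeResponse()

    def __await__(self):
        async def _send() -> FakeResponse:
            return self.response

        return _send().__await__()

    async def __aenter__(self) -> FakeResponse:
        return self.response

    async def __aexit__(self, *exc: object) -> None:
        # Leaving the block releases the response, and with it the connection.
        self.response.release()


async def bare_await() -> bool:
    resp = await FakeRequest()  # nothing releases the response here
    return resp.closed


async def with_context() -> bool:
    async with FakeRequest() as resp:
        pass  # body intentionally empty, as in the typing-indicator call
    return resp.closed
```

Running `asyncio.run(bare_await())` yields `False` and `asyncio.run(with_context())` yields `True` in this model, which mirrors why the bare call left the response alive until garbage collection.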
The underlying reporter also saw Feishu SDK (lark-oapi) connections in CLOSE_WAIT — those are inside the SDK and out of our direct control, but tightening httpx keepalive across adapters reduces the aggregate pool pressure regardless of which individual adapter leaks. --- gateway/platforms/_http_client_limits.py | 84 +++++++++++++ gateway/platforms/bluebubbles.py | 4 +- gateway/platforms/dingtalk.py | 6 +- gateway/platforms/qqbot/adapter.py | 4 + gateway/platforms/signal.py | 4 +- gateway/platforms/wecom.py | 6 +- gateway/platforms/wecom_callback.py | 4 +- gateway/platforms/whatsapp.py | 8 +- .../test_platform_http_client_limits.py | 114 ++++++++++++++++++ 9 files changed, 227 insertions(+), 7 deletions(-) create mode 100644 gateway/platforms/_http_client_limits.py create mode 100644 tests/gateway/test_platform_http_client_limits.py diff --git a/gateway/platforms/_http_client_limits.py b/gateway/platforms/_http_client_limits.py new file mode 100644 index 0000000000..4d8a7c86e9 --- /dev/null +++ b/gateway/platforms/_http_client_limits.py @@ -0,0 +1,84 @@ +"""Shared HTTP client factory for long-lived platform adapters. + +Gateway messaging platforms (QQ Bot, Feishu, WeCom, DingTalk, Signal, +BlueBubbles, WeCom-callback) keep a persistent ``httpx.AsyncClient`` +alive for the adapter's lifetime. That amortises TLS/connection setup +across many API calls, but it also means the process's file-descriptor +pressure is sensitive to how aggressively the pool recycles idle keep- +alive connections. + +httpx's default ``keepalive_expiry`` is 5 seconds. On macOS behind +Cloudflare Warp (and other transparent proxies), peer-initiated FIN can +sit in ``CLOSE_WAIT`` longer than that before the local socket actually +drains — which, multiplied across 7 long-lived adapters plus the LLM +client and MCP clients, walks straight into the default 256 fd limit. +See #18451. 
+ +``platform_httpx_limits()`` returns a tighter ``httpx.Limits`` the +adapter factories use instead of the httpx default. The values chosen: + +* ``max_keepalive_connections=10`` — plenty for any single adapter; + platform APIs rarely parallelise beyond this. +* ``keepalive_expiry=2.0`` — close idle sockets aggressively so a + proxy's lingering CLOSE_WAIT window can't starve the process. + +Override via ``HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY`` / +``HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE`` env vars when tuning under load. +""" + +from __future__ import annotations + +import os + +try: + import httpx +except ImportError: # pragma: no cover — optional dep + httpx = None # type: ignore[assignment] + + +_DEFAULT_KEEPALIVE_EXPIRY_S = 2.0 +_DEFAULT_MAX_KEEPALIVE = 10 + + +def platform_httpx_limits() -> "httpx.Limits | None": + """Return ``httpx.Limits`` tuned for persistent platform-adapter clients. + + Returns ``None`` when httpx isn't importable, so callers can fall + back to httpx's built-in default without a hard dependency on this + helper being reachable. + """ + if httpx is None: + return None + + def _env_float(name: str, default: float) -> float: + raw = os.environ.get(name, "").strip() + if not raw: + return default + try: + val = float(raw) + except (TypeError, ValueError): + return default + return val if val > 0 else default + + def _env_int(name: str, default: int) -> int: + raw = os.environ.get(name, "").strip() + if not raw: + return default + try: + val = int(raw) + except (TypeError, ValueError): + return default + return val if val > 0 else default + + keepalive_expiry = _env_float( + "HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY", _DEFAULT_KEEPALIVE_EXPIRY_S + ) + max_keepalive = _env_int( + "HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE", _DEFAULT_MAX_KEEPALIVE + ) + + return httpx.Limits( + max_keepalive_connections=max_keepalive, + # Leave max_connections at httpx default (100) — plenty of headroom. 
+ keepalive_expiry=keepalive_expiry, + ) diff --git a/gateway/platforms/bluebubbles.py b/gateway/platforms/bluebubbles.py index afcbf1a7e4..31120785c0 100644 --- a/gateway/platforms/bluebubbles.py +++ b/gateway/platforms/bluebubbles.py @@ -162,7 +162,9 @@ class BlueBubblesAdapter(BasePlatformAdapter): return False from aiohttp import web - self.client = httpx.AsyncClient(timeout=30.0) + # Tighter keepalive so idle CLOSE_WAIT drains promptly (#18451). + from gateway.platforms._http_client_limits import platform_httpx_limits + self.client = httpx.AsyncClient(timeout=30.0, limits=platform_httpx_limits()) try: await self._api_get("/api/v1/ping") info = await self._api_get("/api/v1/server/info") diff --git a/gateway/platforms/dingtalk.py b/gateway/platforms/dingtalk.py index 3037e402b2..f1520e22c6 100644 --- a/gateway/platforms/dingtalk.py +++ b/gateway/platforms/dingtalk.py @@ -228,7 +228,11 @@ class DingTalkAdapter(BasePlatformAdapter): return False try: - self._http_client = httpx.AsyncClient(timeout=30.0) + # Tighter keepalive so idle CLOSE_WAIT drains promptly (#18451). + from gateway.platforms._http_client_limits import platform_httpx_limits + self._http_client = httpx.AsyncClient( + timeout=30.0, limits=platform_httpx_limits(), + ) credential = dingtalk_stream.Credential( self._client_id, self._client_secret diff --git a/gateway/platforms/qqbot/adapter.py b/gateway/platforms/qqbot/adapter.py index 10e1f62e72..c6e5d428c6 100644 --- a/gateway/platforms/qqbot/adapter.py +++ b/gateway/platforms/qqbot/adapter.py @@ -243,10 +243,14 @@ class QQAdapter(BasePlatformAdapter): return False try: + # Tighter keepalive pool so idle CLOSE_WAIT sockets drain + # faster behind proxies like Cloudflare Warp (#18451). + from gateway.platforms._http_client_limits import platform_httpx_limits self._http_client = httpx.AsyncClient( timeout=30.0, follow_redirects=True, event_hooks={"response": [_ssrf_redirect_guard]}, + limits=platform_httpx_limits(), ) # 1. 
Get access token diff --git a/gateway/platforms/signal.py b/gateway/platforms/signal.py index 225430600d..77d3c18cb6 100644 --- a/gateway/platforms/signal.py +++ b/gateway/platforms/signal.py @@ -248,7 +248,9 @@ class SignalAdapter(BasePlatformAdapter): except Exception as e: logger.warning("Signal: Could not acquire phone lock (non-fatal): %s", e) - self.client = httpx.AsyncClient(timeout=30.0) + # Tighter keepalive so idle CLOSE_WAIT drains promptly (#18451). + from gateway.platforms._http_client_limits import platform_httpx_limits + self.client = httpx.AsyncClient(timeout=30.0, limits=platform_httpx_limits()) try: # Health check — verify signal-cli daemon is reachable try: diff --git a/gateway/platforms/wecom.py b/gateway/platforms/wecom.py index 7ba0fa21b9..453b95a717 100644 --- a/gateway/platforms/wecom.py +++ b/gateway/platforms/wecom.py @@ -206,7 +206,11 @@ class WeComAdapter(BasePlatformAdapter): return False try: - self._http_client = httpx.AsyncClient(timeout=30.0, follow_redirects=True) + # Tighter keepalive so idle CLOSE_WAIT drains promptly (#18451). + from gateway.platforms._http_client_limits import platform_httpx_limits + self._http_client = httpx.AsyncClient( + timeout=30.0, follow_redirects=True, limits=platform_httpx_limits(), + ) await self._open_connection() self._mark_connected() self._listen_task = asyncio.create_task(self._listen_loop()) diff --git a/gateway/platforms/wecom_callback.py b/gateway/platforms/wecom_callback.py index 5440792dea..139c67fe7c 100644 --- a/gateway/platforms/wecom_callback.py +++ b/gateway/platforms/wecom_callback.py @@ -119,7 +119,9 @@ class WecomCallbackAdapter(BasePlatformAdapter): pass try: - self._http_client = httpx.AsyncClient(timeout=20.0) + # Tighter keepalive so idle CLOSE_WAIT drains promptly (#18451). 
+ from gateway.platforms._http_client_limits import platform_httpx_limits + self._http_client = httpx.AsyncClient(timeout=20.0, limits=platform_httpx_limits()) self._app = web.Application() self._app.router.add_get("/health", self._handle_health) self._app.router.add_get(self._path, self._handle_verify) diff --git a/gateway/platforms/whatsapp.py b/gateway/platforms/whatsapp.py index b3e655a51b..921dd70d72 100644 --- a/gateway/platforms/whatsapp.py +++ b/gateway/platforms/whatsapp.py @@ -902,11 +902,15 @@ class WhatsAppAdapter(BasePlatformAdapter): try: import aiohttp - await self._http_session.post( + # Must wrap in `async with` — a bare `await session.post(...)` + # leaves the response object alive until GC, holding its TCP + # socket in CLOSE_WAIT. See #18451. + async with self._http_session.post( f"http://127.0.0.1:{self._bridge_port}/typing", json={"chatId": chat_id}, timeout=aiohttp.ClientTimeout(total=5) - ) + ): + pass except Exception: pass # Ignore typing indicator failures diff --git a/tests/gateway/test_platform_http_client_limits.py b/tests/gateway/test_platform_http_client_limits.py new file mode 100644 index 0000000000..fe613fb1f0 --- /dev/null +++ b/tests/gateway/test_platform_http_client_limits.py @@ -0,0 +1,114 @@ +"""Tests for the shared httpx.Limits helper that all long-lived platform +adapters use to tighten their keep-alive pool. + +Context: #18451 — on macOS behind Cloudflare Warp, httpx's default +keepalive_expiry=5s let idle CLOSE_WAIT sockets accumulate across +multiple long-lived gateway adapters (QQ Bot, Feishu, WeCom, DingTalk, +Signal, BlueBubbles, WeCom-callback) until the process hit the default +256 fd limit. These tests just verify the helper returns sensibly +tuned limits and respects env-var overrides; the actual fd-pressure +behaviour is only observable at runtime under load. 
+""" + +from __future__ import annotations + +import os + +import pytest + + +@pytest.fixture(autouse=True) +def _clear_env(monkeypatch): + monkeypatch.delenv("HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY", raising=False) + monkeypatch.delenv("HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE", raising=False) + + +def test_returns_none_when_httpx_unavailable(monkeypatch): + """If httpx can't be imported, the helper returns None so callers + fall back to httpx's built-in Limits default without raising.""" + import gateway.platforms._http_client_limits as mod + monkeypatch.setattr(mod, "httpx", None) + assert mod.platform_httpx_limits() is None + + +def test_default_limits_tighten_keepalive_below_httpx_default(): + import httpx + from gateway.platforms._http_client_limits import platform_httpx_limits + limits = platform_httpx_limits() + assert isinstance(limits, httpx.Limits) + # httpx default keepalive_expiry is 5.0 — ours must be shorter so + # CLOSE_WAIT sockets drain promptly behind proxies like Warp. + assert limits.keepalive_expiry is not None + assert limits.keepalive_expiry < 5.0 + # max_keepalive_connections must be positive and reasonable for a + # single adapter (platform APIs rarely parallelise beyond ~10). 
+ assert limits.max_keepalive_connections is not None + assert 1 <= limits.max_keepalive_connections <= 50 + + +def test_env_override_keepalive_expiry(monkeypatch): + monkeypatch.setenv("HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY", "7.5") + from gateway.platforms._http_client_limits import platform_httpx_limits + limits = platform_httpx_limits() + assert limits.keepalive_expiry == 7.5 + + +def test_env_override_max_keepalive(monkeypatch): + monkeypatch.setenv("HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE", "25") + from gateway.platforms._http_client_limits import platform_httpx_limits + limits = platform_httpx_limits() + assert limits.max_keepalive_connections == 25 + + +def test_env_override_rejects_garbage(monkeypatch): + """Malformed env values fall back to defaults rather than raising.""" + monkeypatch.setenv("HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY", "not-a-number") + monkeypatch.setenv("HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE", "-3") + from gateway.platforms._http_client_limits import platform_httpx_limits + limits = platform_httpx_limits() + # Non-positive / non-numeric → fell back to defaults (not the override values) + assert limits.keepalive_expiry is not None and limits.keepalive_expiry > 0 + assert limits.max_keepalive_connections is not None + assert limits.max_keepalive_connections > 0 + + +def test_helper_is_importable_from_every_platform_that_uses_it(): + """Every persistent-httpx-client platform adapter imports this helper. + If any of those modules fails to import, this test surfaces it before + the regression shows up as a runtime adapter-startup crash.""" + # Just importing exercises the helper's import path for each adapter. 
+ import gateway.platforms.qqbot.adapter # noqa: F401 + import gateway.platforms.wecom # noqa: F401 + import gateway.platforms.dingtalk # noqa: F401 + import gateway.platforms.signal # noqa: F401 + import gateway.platforms.bluebubbles # noqa: F401 + import gateway.platforms.wecom_callback # noqa: F401 + + +class TestWhatsappTypingLeakFix: + """#18451 — whatsapp.send_typing previously used a bare + `await self._http_session.post(...)` which leaked the aiohttp + response object until GC, holding its TCP socket in CLOSE_WAIT. + Must now wrap the call in `async with` so the response is + released immediately when the call returns. + + We verify by inspecting the source text rather than exercising + the coroutine — the test suite would otherwise need a live + aiohttp server, and the contract we care about is structural. + """ + + def test_bare_await_removed(self): + import inspect + import gateway.platforms.whatsapp as mod + + src = inspect.getsource(mod.WhatsAppAdapter.send_typing) + # The fix must be structural: the post() call is inside an + # `async with`, not a bare `await`. + assert "async with self._http_session.post(" in src, ( + "send_typing must wrap self._http_session.post(...) in " + "`async with` to release the aiohttp response socket " + "(#18451). Otherwise the response sits in CLOSE_WAIT " + "until GC." + ) + # The old bare-await form must be gone. + assert "await self._http_session.post(" not in src From 73bcd83dba7ed3d621d982d87fc02923964121ac Mon Sep 17 00:00:00 2001 From: teknium1 <127238744+teknium1@users.noreply.github.com> Date: Sat, 2 May 2026 02:23:05 -0700 Subject: [PATCH 25/61] chore(release): map beibi9966 email for AUTHOR_MAP Follow-up for PR #18502 salvage. 
--- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index 939a485d6b..0c046ee46e 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -373,6 +373,7 @@ AUTHOR_MAP = { "h3057183414@gmail.com": "CoreyNoDream", "franksong2702@gmail.com": "franksong2702", "673088860@qq.com": "ambition0802", + "beibei1988@proton.me": "beibi9966", # ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply # crossref, and GH contributor list matching (April 2026 audit) ── "1115117931@qq.com": "aaronagent", From af981227937f54ccd621673f1e86ee196134a005 Mon Sep 17 00:00:00 2001 From: liuhao1024 Date: Fri, 1 May 2026 18:29:54 +0800 Subject: [PATCH 26/61] fix(auxiliary): propagate explicit_api_key to _try_openrouter() When resolve_provider_client() passes explicit_api_key for OpenRouter auxiliary tasks, _try_openrouter() now accepts and honors this parameter instead of silently ignoring it and falling back to OPENROUTER_API_KEY env var. Root cause: _try_openrouter() had no explicit_api_key parameter, so even when callers wanted to pass a runtime credential pool key, it could not be used. 
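As a rough illustration of the key precedence this change is after, here is a minimal sketch. `pick_openrouter_key` and its parameters are hypothetical names used only for illustration; the real logic lives inside `_try_openrouter()`, which looks up the pool entry and the environment itself. The sketch assumes, as in the diff, that a configured pool never falls through to the `OPENROUTER_API_KEY` env var.

```python
from typing import Mapping, Optional


def pick_openrouter_key(
    explicit_api_key: Optional[str],
    pool_present: bool,
    pool_key: Optional[str],
    env: Mapping[str, str],
) -> Optional[str]:
    """Hypothetical helper mirroring the key precedence in this patch."""
    if pool_present:
        # A credential pool is configured: the explicit key wins, otherwise
        # the pool entry's runtime key is used. The env var is not consulted.
        return explicit_api_key or pool_key or None
    # No pool: the explicit key wins, otherwise fall back to the env var.
    return explicit_api_key or env.get("OPENROUTER_API_KEY") or None
```

Passing the environment in as a mapping keeps the sketch easy to test without touching `os.environ`; the actual function reads the environment directly.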
Fix: - Add explicit_api_key: str = None parameter to _try_openrouter() - Prioritize explicit_api_key over pool key and env var - Update resolve_provider_client() call site to pass explicit_api_key Regression coverage: - Test that explicit_api_key is passed to OpenAI client when provided - Test that fallback to OPENROUTER_API_KEY still works when explicit_api_key is None Closes #18338 --- agent/auxiliary_client.py | 10 ++-- tests/agent/test_auxiliary_client.py | 75 ++++++++++++++++++++++++++++ 2 files changed, 80 insertions(+), 5 deletions(-) diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py index 27d4c7ed34..bed5c8d470 100644 --- a/agent/auxiliary_client.py +++ b/agent/auxiliary_client.py @@ -1149,10 +1149,10 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]: -def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]: +def _try_openrouter(explicit_api_key: str = None) -> Tuple[Optional[OpenAI], Optional[str]]: pool_present, entry = _select_pool_entry("openrouter") if pool_present: - or_key = _pool_runtime_api_key(entry) + or_key = explicit_api_key or _pool_runtime_api_key(entry) if not or_key: return None, None base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL @@ -1160,7 +1160,7 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]: return OpenAI(api_key=or_key, base_url=base_url, default_headers=_OR_HEADERS), _OPENROUTER_MODEL - or_key = os.getenv("OPENROUTER_API_KEY") + or_key = explicit_api_key or os.getenv("OPENROUTER_API_KEY") if not or_key: return None, None logger.debug("Auxiliary client: OpenRouter") @@ -2053,9 +2053,9 @@ def resolve_provider_client( return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode else (client, final_model)) - # ── OpenRouter ─────────────────────────────────────────────────── + # ── OpenRouter ─────────────────────────────────────────── if provider == "openrouter": - client, default = _try_openrouter() + 
client, default = _try_openrouter(explicit_api_key=explicit_api_key) if client is None: logger.warning( "resolve_provider_client: openrouter requested but %s", diff --git a/tests/agent/test_auxiliary_client.py b/tests/agent/test_auxiliary_client.py index bc74fc7306..c57a0b6372 100644 --- a/tests/agent/test_auxiliary_client.py +++ b/tests/agent/test_auxiliary_client.py @@ -1818,3 +1818,78 @@ class TestBuildCallKwargsToolDedup: provider="openai", model="gpt-4o", messages=[], tools=None, ) assert "tools" not in kwargs + + +@pytest.fixture(autouse=True) +def _clean_env(monkeypatch): + """Strip provider env vars so each test starts clean.""" + for key in ( + "OPENROUTER_API_KEY", "OPENAI_BASE_URL", "OPENAI_API_KEY", + ): + monkeypatch.delenv(key, raising=False) + + +class TestOpenRouterExplicitApiKey: + """Test that explicit_api_key is correctly propagated to _try_openrouter().""" + + def test_resolve_provider_client_passes_explicit_api_key_to_openrouter( + self, monkeypatch + ): + """ + When resolve_provider_client() is called with explicit_api_key for OpenRouter, + the explicit key should be passed to the OpenAI client instead of falling back + to OPENROUTER_API_KEY env var. 
+ """ + # Set up env var as fallback (should NOT be used when explicit_api_key is provided) + monkeypatch.setenv("OPENROUTER_API_KEY", "env-fallback-key") + + # Mock OpenAI to capture the api_key used + mock_openai = MagicMock() + mock_openai.return_value = MagicMock(name="openrouter-client") + + with patch("agent.auxiliary_client.OpenAI", mock_openai): + client, model = resolve_provider_client( + provider="openrouter", + explicit_api_key="explicit-pool-key", + ) + + # Verify a client was created + assert client is not None + # Verify the explicit key was used, not the env var fallback + mock_openai.assert_called_once() + call_kwargs = mock_openai.call_args[1] + assert call_kwargs["api_key"] == "explicit-pool-key", ( + f"Expected explicit_api_key to be passed, got: {call_kwargs['api_key']}" + ) + assert call_kwargs["api_key"] != "env-fallback-key", ( + "Should NOT fall back to OPENROUTER_API_KEY when explicit_api_key is provided" + ) + + def test_resolve_provider_client_without_explicit_api_key_falls_back_to_env( + self, monkeypatch + ): + """ + When resolve_provider_client() is called WITHOUT explicit_api_key for OpenRouter, + it should fall back to OPENROUTER_API_KEY env var. 
+ """ + # Set up env var as fallback (should be used when explicit_api_key is NOT provided) + monkeypatch.setenv("OPENROUTER_API_KEY", "env-fallback-key") + + # Mock OpenAI to capture the api_key used + mock_openai = MagicMock() + mock_openai.return_value = MagicMock(name="openrouter-client") + + with patch("agent.auxiliary_client.OpenAI", mock_openai): + client, model = resolve_provider_client( + provider="openrouter", + explicit_api_key=None, + ) + + # Verify a client was created + assert client is not None + # Verify the env var fallback was used + mock_openai.assert_called_once() + call_kwargs = mock_openai.call_args[1] + assert call_kwargs["api_key"] == "env-fallback-key", ( + f"Expected env fallback key to be used when explicit_api_key is None, got: {call_kwargs['api_key']}" + ) From 5d3be898a8671eb9fb99cf18f43165502f54e7f4 Mon Sep 17 00:00:00 2001 From: Siddharth Balyan <52913345+alt-glitch@users.noreply.github.com> Date: Sat, 2 May 2026 16:08:01 +0530 Subject: [PATCH 27/61] docs(tts): mention xAI custom voice support (#18776) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Point users to xAI's custom voices feature — clone your voice in the console, paste the voice_id into tts.xai.voice_id. No code changes needed; the existing TTS pipeline already handles arbitrary voice IDs. 
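A small sketch of where a custom voice ID ends up in the config, modelled on the `setdefault` chain this patch adds to setup.py. `store_xai_voice_id` is a hypothetical helper name, and the plain dict here only stands in for the loaded Hermes config.

```python
from typing import Any, Dict


def store_xai_voice_id(config: Dict[str, Any], raw_voice_id: str) -> bool:
    """Hypothetical helper: write a voice ID to config["tts"]["xai"]["voice_id"]."""
    voice_id = (raw_voice_id or "").strip()
    if not voice_id:
        # Blank input: leave the config untouched so the default voice applies.
        return False
    # Create the nested tts -> xai dicts only when they are missing.
    config.setdefault("tts", {}).setdefault("xai", {})["voice_id"] = voice_id
    return True
```

Existing keys under `tts.xai` (language, sample rate, and so on) are left as they are; only `voice_id` is written.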
- config.py: link to xAI custom voices docs in voice_id comment - setup.py: prompt accepts custom voice IDs during xAI TTS setup - tts.md: short section linking to xAI console and docs --- hermes_cli/config.py | 2 +- hermes_cli/setup.py | 7 +++++++ website/docs/user-guide/features/tts.md | 15 ++++++++++++++- 3 files changed, 22 insertions(+), 2 deletions(-) diff --git a/hermes_cli/config.py b/hermes_cli/config.py index 17e10c08d6..9e7ff8897c 100644 --- a/hermes_cli/config.py +++ b/hermes_cli/config.py @@ -830,7 +830,7 @@ DEFAULT_CONFIG = { # Voices: alloy, echo, fable, onyx, nova, shimmer }, "xai": { - "voice_id": "eve", + "voice_id": "eve", # or custom voice ID — see https://docs.x.ai/developers/model-capabilities/audio/custom-voices "language": "en", "sample_rate": 24000, "bit_rate": 128000, diff --git a/hermes_cli/setup.py b/hermes_cli/setup.py index 8f32e2cbd8..31cb846012 100644 --- a/hermes_cli/setup.py +++ b/hermes_cli/setup.py @@ -1190,6 +1190,13 @@ def _setup_tts_provider(config: dict): "Falling back to Edge TTS." ) selected = "edge" + if selected == "xai": + print() + voice_id = prompt("xAI voice_id (Enter for 'eve', or paste a custom voice ID)") + if voice_id and voice_id.strip(): + config.setdefault("tts", {}).setdefault("xai", {})["voice_id"] = voice_id.strip() + print_success(f"xAI voice_id set to: {voice_id.strip()}") + elif selected == "minimax": existing = get_env_value("MINIMAX_API_KEY") diff --git a/website/docs/user-guide/features/tts.md b/website/docs/user-guide/features/tts.md index fa632a83b4..14d44daa89 100644 --- a/website/docs/user-guide/features/tts.md +++ b/website/docs/user-guide/features/tts.md @@ -69,7 +69,7 @@ tts: model: "gemini-2.5-flash-preview-tts" # or gemini-2.5-pro-preview-tts voice: "Kore" # 30 prebuilt voices: Zephyr, Puck, Kore, Enceladus, Gacrux, etc. 
xai: - voice_id: "eve" # xAI TTS voice (see https://docs.x.ai/docs/api-reference#tts) + voice_id: "eve" # or a custom voice ID — see docs below language: "en" # ISO 639-1 code sample_rate: 24000 # 22050 / 24000 (default) / 44100 / 48000 bit_rate: 128000 # MP3 bitrate; only applies when codec=mp3 @@ -127,6 +127,19 @@ Without ffmpeg, Edge TTS, MiniMax TTS, NeuTTS, KittenTTS, and Piper audio are se If you want voice bubbles without installing ffmpeg, switch to the OpenAI, ElevenLabs, or Mistral provider. ::: +### xAI Custom Voices (voice cloning) + +xAI supports cloning your voice and using it with TTS. Create a custom voice in the [xAI Console](https://console.x.ai/team/default/voice/voice-library), then set the resulting `voice_id` in your config: + +```yaml +tts: + provider: xai + xai: + voice_id: "nlbqfwie" # your custom voice ID +``` + +See the [xAI Custom Voices docs](https://docs.x.ai/developers/model-capabilities/audio/custom-voices) for details on recording, supported formats, and limits. + ### Piper (local, 44 languages) Piper is a fast, local neural TTS engine from the Open Home Foundation (the Home Assistant maintainers). It runs entirely on CPU, supports **44 languages** with pre-trained voices, and needs no API key. 
From d409a4409c8f11ccf029eff33a2eb9860f92e761 Mon Sep 17 00:00:00 2001 From: helix4u <4317663+helix4u@users.noreply.github.com> Date: Sat, 2 May 2026 16:23:51 -0600 Subject: [PATCH 28/61] fix(model): avoid bedrock credential probe in provider picker --- hermes_cli/model_switch.py | 49 ++++++++++++++++--- tests/hermes_cli/test_bedrock_model_picker.py | 24 +++++++++ 2 files changed, 67 insertions(+), 6 deletions(-) diff --git a/hermes_cli/model_switch.py b/hermes_cli/model_switch.py index 07455eb6fa..4c323145da 100644 --- a/hermes_cli/model_switch.py +++ b/hermes_cli/model_switch.py @@ -1057,6 +1057,45 @@ def list_authenticated_providers( if normed: _builtin_endpoints.add(normed) + def _has_fast_aws_sdk_signal() -> bool: + """Return True when explicit AWS auth config is present. + + This intentionally avoids botocore's full credential chain. Provider + picker/model-switch discovery can run for non-Bedrock providers, and + botocore may otherwise probe EC2 IMDS (169.254.169.254) on local + machines before returning no credentials. 
+ """ + if os.environ.get("AWS_BEARER_TOKEN_BEDROCK", "").strip(): + return True + if ( + os.environ.get("AWS_ACCESS_KEY_ID", "").strip() + and os.environ.get("AWS_SECRET_ACCESS_KEY", "").strip() + ): + return True + return any( + os.environ.get(name, "").strip() + for name in ( + "AWS_PROFILE", + "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI", + "AWS_CONTAINER_CREDENTIALS_FULL_URI", + "AWS_WEB_IDENTITY_TOKEN_FILE", + ) + ) + + def _has_aws_sdk_creds_for_listing(slug: str) -> bool: + """Credential check for AWS SDK providers in non-runtime discovery.""" + slug_norm = str(slug or "").strip().lower() + current_norm = str(current_provider or "").strip().lower() + if _has_fast_aws_sdk_signal(): + return True + if slug_norm != current_norm: + return False + try: + from agent.bedrock_adapter import has_aws_credentials + return bool(has_aws_credentials()) + except Exception: + return False + data = fetch_models_dev() # Build curated model lists keyed by hermes provider ID @@ -1184,7 +1223,9 @@ def list_authenticated_providers( # Check if credentials exist has_creds = False - if overlay.extra_env_vars: + if overlay.auth_type == "aws_sdk": + has_creds = _has_aws_sdk_creds_for_listing(hermes_slug) + elif overlay.extra_env_vars: has_creds = any(os.environ.get(ev) for ev in overlay.extra_env_vars) # Also check api_key_env_vars from PROVIDER_REGISTRY for api_key auth_type if not has_creds and overlay.auth_type == "api_key": @@ -1324,11 +1365,7 @@ def list_authenticated_providers( # credentials come from the boto3 credential chain (env vars, # ~/.aws/credentials, instance roles, etc.) 
if not _cp_has_creds and _cp_config and getattr(_cp_config, "auth_type", "") == "aws_sdk": - try: - from agent.bedrock_adapter import has_aws_credentials - _cp_has_creds = has_aws_credentials() - except Exception: - pass + _cp_has_creds = _has_aws_sdk_creds_for_listing(_cp.slug) if not _cp_has_creds: continue diff --git a/tests/hermes_cli/test_bedrock_model_picker.py b/tests/hermes_cli/test_bedrock_model_picker.py index a93dde0443..3b2c4d5dc7 100644 --- a/tests/hermes_cli/test_bedrock_model_picker.py +++ b/tests/hermes_cli/test_bedrock_model_picker.py @@ -203,6 +203,30 @@ class TestListAuthenticatedProvidersBedrock: bedrock = next((p for p in providers if p["slug"] == "bedrock"), None) assert bedrock is None, "bedrock should NOT appear when AWS credentials are absent" + def test_non_bedrock_picker_does_not_probe_full_aws_chain(self, monkeypatch): + """Non-Bedrock provider discovery must not touch boto3's full credential chain.""" + from hermes_cli.model_switch import list_authenticated_providers + + monkeypatch.delenv("AWS_PROFILE", raising=False) + monkeypatch.delenv("AWS_ACCESS_KEY_ID", raising=False) + monkeypatch.delenv("AWS_SECRET_ACCESS_KEY", raising=False) + monkeypatch.delenv("AWS_BEARER_TOKEN_BEDROCK", raising=False) + monkeypatch.delenv("AWS_WEB_IDENTITY_TOKEN_FILE", raising=False) + monkeypatch.delenv("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI", raising=False) + monkeypatch.delenv("AWS_CONTAINER_CREDENTIALS_FULL_URI", raising=False) + + calls = {"has_aws_credentials": 0} + + def _has_aws_credentials(): + calls["has_aws_credentials"] += 1 + return False + + with patch("agent.bedrock_adapter.has_aws_credentials", side_effect=_has_aws_credentials): + providers = list_authenticated_providers(current_provider="openrouter", max_models=0) + + assert calls["has_aws_credentials"] == 0 + assert all(p["slug"] != "bedrock" for p in providers) + def test_bedrock_falls_back_to_curated_when_discovery_fails(self, monkeypatch): """When discover_bedrock_models() raises, fall 
back to curated list without crashing.""" from hermes_cli.model_switch import list_authenticated_providers From 4f37669170bb7886b94acbbf3630bf70650f7295 Mon Sep 17 00:00:00 2001 From: helix4u <4317663+helix4u@users.noreply.github.com> Date: Fri, 1 May 2026 13:50:54 -0600 Subject: [PATCH 29/61] fix(tools): reconfigure enabled unconfigured toolsets --- hermes_cli/tools_config.py | 24 ++++++++++++++++++++++- tests/hermes_cli/test_tools_config.py | 28 +++++++++++++++++++++++++++ 2 files changed, 51 insertions(+), 1 deletion(-) diff --git a/hermes_cli/tools_config.py b/hermes_cli/tools_config.py index 5edb227d95..b3df18d932 100644 --- a/hermes_cli/tools_config.py +++ b/hermes_cli/tools_config.py @@ -1822,7 +1822,7 @@ def _reconfigure_tool(config: dict): cat = TOOL_CATEGORIES.get(ts_key) reqs = TOOLSET_ENV_REQUIREMENTS.get(ts_key) if cat or reqs: - if _toolset_has_keys(ts_key, config): + if _toolset_has_keys(ts_key, config) or _toolset_enabled_for_reconfigure(ts_key, config): configurable.append((ts_key, ts_label)) if not configurable: @@ -1848,6 +1848,28 @@ def _reconfigure_tool(config: dict): save_config(config) +def _toolset_enabled_for_reconfigure(ts_key: str, config: dict) -> bool: + """Return True if a configurable toolset is enabled anywhere. + + Reconfigure must include enabled-but-unconfigured categories so users can + finish provider/API-key setup without disabling and re-enabling the toolset. 
+ """ + for platform in PLATFORMS: + if not _toolset_allowed_for_platform(ts_key, platform): + continue + try: + enabled = _get_platform_tools( + config, + platform, + include_default_mcp_servers=False, + ) + except Exception: + continue + if ts_key in enabled: + return True + return False + + def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict): """Reconfigure a tool category - provider selection + API key update.""" icon = cat.get("icon", "") diff --git a/tests/hermes_cli/test_tools_config.py b/tests/hermes_cli/test_tools_config.py index d5b8aec3b7..abe211f4fb 100644 --- a/tests/hermes_cli/test_tools_config.py +++ b/tests/hermes_cli/test_tools_config.py @@ -8,6 +8,7 @@ from hermes_cli.tools_config import ( _configure_provider, _get_platform_tools, _platform_toolset_summary, + _reconfigure_tool, _save_platform_tools, _toolset_has_keys, CONFIGURABLE_TOOLSETS, @@ -468,6 +469,33 @@ def test_local_browser_provider_is_saved_explicitly(monkeypatch): assert config["browser"]["cloud_provider"] == "local" +def test_reconfigure_lists_enabled_web_without_existing_provider_config(monkeypatch): + config = {"platform_toolsets": {"cli": ["web"]}} + seen = {} + configured = [] + + monkeypatch.setattr( + "hermes_cli.tools_config._toolset_has_keys", + lambda ts_key, config=None: False, + ) + + def fake_prompt_choice(question, choices, default=0): + seen["choices"] = choices + return 0 + + monkeypatch.setattr("hermes_cli.tools_config._prompt_choice", fake_prompt_choice) + monkeypatch.setattr( + "hermes_cli.tools_config._configure_tool_category_for_reconfig", + lambda ts_key, cat, config: configured.append(ts_key), + ) + monkeypatch.setattr("hermes_cli.tools_config.save_config", lambda config: None) + + _reconfigure_tool(config) + + assert any("Web Search" in choice for choice in seen["choices"]) + assert configured == ["web"] + + def test_first_install_nous_auto_configures_managed_defaults(monkeypatch): 
monkeypatch.setattr("hermes_cli.tools_config.managed_nous_tools_enabled", lambda: True) monkeypatch.setattr("hermes_cli.nous_subscription.managed_nous_tools_enabled", lambda: True) From e26f9b207041c03d1aa9a982d29dbfda66df3a82 Mon Sep 17 00:00:00 2001 From: Henkey Date: Sat, 2 May 2026 00:16:27 +0100 Subject: [PATCH 30/61] fix(acp): route Zed thoughts to reasoning callbacks --- acp_adapter/server.py | 25 +++++++++++++++++++------ tests/acp/test_server.py | 28 ++++++++++++++++++++++++++++ 2 files changed, 47 insertions(+), 6 deletions(-) diff --git a/acp_adapter/server.py b/acp_adapter/server.py index f8dade72af..7395f2557c 100644 --- a/acp_adapter/server.py +++ b/acp_adapter/server.py @@ -744,24 +744,37 @@ class HermesACPAgent(acp.Agent): tool_call_meta: dict[str, dict[str, Any]] = {} previous_approval_cb = None + streamed_message = False + if conn: tool_progress_cb = make_tool_progress_cb(conn, session_id, loop, tool_call_ids, tool_call_meta) - thinking_cb = make_thinking_cb(conn, session_id, loop) + reasoning_cb = make_thinking_cb(conn, session_id, loop) step_cb = make_step_cb(conn, session_id, loop, tool_call_ids, tool_call_meta) message_cb = make_message_cb(conn, session_id, loop) + + def stream_delta_cb(text: str) -> None: + nonlocal streamed_message + if text: + streamed_message = True + message_cb(text) + approval_cb = make_approval_callback(conn.request_permission, loop, session_id) else: tool_progress_cb = None - thinking_cb = None + reasoning_cb = None step_cb = None - message_cb = None + stream_delta_cb = None approval_cb = None agent = state.agent agent.tool_progress_callback = tool_progress_cb - agent.thinking_callback = thinking_cb + # ACP thought panes should not receive Hermes' local kawaii waiting/status + # updates. Route provider/model reasoning deltas instead; if the provider + # emits no reasoning, Zed should not get a fake "thinking" accordion. 
+ agent.thinking_callback = None + agent.reasoning_callback = reasoning_cb agent.step_callback = step_cb - agent.message_callback = message_cb + agent.stream_delta_callback = stream_delta_cb # Approval callback is per-thread (thread-local, GHSA-qg5c-hvr5-hjgr). # Set it INSIDE _run_agent so the TLS write happens in the executor @@ -867,7 +880,7 @@ class HermesACPAgent(acp.Agent): ) except Exception: logger.debug("Failed to auto-title ACP session %s", session_id, exc_info=True) - if final_response and conn: + if final_response and conn and not streamed_message: update = acp.update_agent_message_text(final_response) await conn.session_update(session_id, update) diff --git a/tests/acp/test_server.py b/tests/acp/test_server.py index 35aafc603e..d292ade3fe 100644 --- a/tests/acp/test_server.py +++ b/tests/acp/test_server.py @@ -200,6 +200,8 @@ class TestSessionOps: "context", "reset", "compact", + "steer", + "queue", "version", ] model_cmd = next( @@ -522,6 +524,11 @@ class TestPrompt: assert isinstance(resp, PromptResponse) assert resp.stop_reason == "end_turn" state.agent.run_conversation.assert_called_once() + assert state.agent.tool_progress_callback is not None + assert state.agent.step_callback is not None + assert state.agent.stream_delta_callback is not None + assert state.agent.reasoning_callback is not None + assert state.agent.thinking_callback is None @pytest.mark.asyncio async def test_prompt_updates_history(self, agent): @@ -572,6 +579,27 @@ class TestPrompt: update = last_call[1].get("update") or last_call[0][1] assert update.session_update == "agent_message_chunk" + @pytest.mark.asyncio + async def test_prompt_does_not_duplicate_streamed_final_message(self, agent): + """If ACP already streamed response chunks, final_response should not be sent again.""" + new_resp = await agent.new_session(cwd=".") + state = agent.session_manager.get_session(new_resp.session_id) + + def mock_run(*args, **kwargs): + state.agent.stream_delta_callback("streamed answer") + 
return {"final_response": "streamed answer", "messages": []} + + state.agent.run_conversation = mock_run + + mock_conn = MagicMock(spec=acp.Client) + mock_conn.session_update = AsyncMock() + agent._conn = mock_conn + + prompt = [TextContentBlock(type="text", text="hello")] + await agent.prompt(prompt=prompt, session_id=new_resp.session_id) + + assert mock_conn.session_update.call_count == 1 + @pytest.mark.asyncio async def test_prompt_auto_titles_session(self, agent): new_resp = await agent.new_session(cwd=".") From ef9a08a872d1ed87eb4c91cf8ad8e8f4ef5a6e2b Mon Sep 17 00:00:00 2001 From: Henkey Date: Sat, 2 May 2026 14:06:51 +0100 Subject: [PATCH 31/61] fix(acp): polish Zed context and tool rendering --- acp_adapter/server.py | 231 ++++++++++++++++++++-- acp_adapter/tools.py | 405 +++++++++++++++++++++++++++++++++++++- tests/acp/test_mcp_e2e.py | 7 +- tests/acp/test_server.py | 141 ++++++++++++- tests/acp/test_tools.py | 147 ++++++++++++++ 5 files changed, 892 insertions(+), 39 deletions(-) diff --git a/acp_adapter/server.py b/acp_adapter/server.py index 7395f2557c..498dae88bd 100644 --- a/acp_adapter/server.py +++ b/acp_adapter/server.py @@ -4,6 +4,7 @@ from __future__ import annotations import asyncio import contextvars +import json import logging import os from collections import defaultdict, deque @@ -47,6 +48,7 @@ from acp.schema import ( TextContentBlock, UnstructuredCommandInput, Usage, + UsageUpdate, UserMessageChunk, ) @@ -65,6 +67,7 @@ from acp_adapter.events import ( ) from acp_adapter.permissions import make_approval_callback from acp_adapter.session import SessionManager, SessionState, _expand_acp_enabled_toolsets +from acp_adapter.tools import build_tool_complete, build_tool_start logger = logging.getLogger(__name__) @@ -315,6 +318,66 @@ class HermesACPAgent(acp.Agent): return target_provider, new_model + @staticmethod + def _build_usage_update(state: SessionState) -> UsageUpdate | None: + """Build ACP native context-usage data for clients like Zed. 
+ + Zed's circular context indicator is driven by ACP ``usage_update`` + session updates: ``size`` is the model context window and ``used`` is + the current request pressure. Hermes estimates ``used`` from the same + buckets it sends to providers: system prompt, conversation history, and + tool schemas. + """ + agent = state.agent + compressor = getattr(agent, "context_compressor", None) + size = int(getattr(compressor, "context_length", 0) or 0) + if size <= 0: + return None + + try: + from agent.model_metadata import estimate_request_tokens_rough + + used = estimate_request_tokens_rough( + state.history, + system_prompt=getattr(agent, "_cached_system_prompt", "") or "", + tools=getattr(agent, "tools", None) or None, + ) + except Exception: + logger.debug("Could not estimate ACP native context usage", exc_info=True) + used = int(getattr(compressor, "last_prompt_tokens", 0) or 0) + + return UsageUpdate( + session_update="usage_update", + size=max(size, 0), + used=max(used, 0), + ) + + async def _send_usage_update(self, state: SessionState) -> None: + """Send ACP native context usage to the connected client.""" + if not self._conn: + return + update = self._build_usage_update(state) + if update is None: + return + try: + await self._conn.session_update( + session_id=state.session_id, + update=update, + ) + except Exception: + logger.warning( + "Failed to send ACP usage update for session %s", + state.session_id, + exc_info=True, + ) + + def _schedule_usage_update(self, state: SessionState) -> None: + """Schedule native context indicator refresh after ACP responses.""" + if not self._conn: + return + loop = asyncio.get_running_loop() + loop.call_soon(asyncio.create_task, self._send_usage_update(state)) + async def _register_session_mcp_servers( self, state: SessionState, @@ -485,37 +548,99 @@ class HermesACPAgent(acp.Agent): ) return None + @staticmethod + def _history_tool_call_name_args(tool_call: dict[str, Any]) -> tuple[str, dict[str, Any]]: + """Extract function 
name/arguments from an OpenAI-style tool_call.""" + function = tool_call.get("function") if isinstance(tool_call.get("function"), dict) else {} + name = str(function.get("name") or tool_call.get("name") or "unknown_tool") + raw_args = function.get("arguments") or tool_call.get("arguments") or tool_call.get("args") or {} + if isinstance(raw_args, str): + try: + parsed = json.loads(raw_args) + except Exception: + parsed = {"raw": raw_args} + raw_args = parsed + if not isinstance(raw_args, dict): + raw_args = {} + return name, raw_args + + @staticmethod + def _history_tool_call_id(tool_call: dict[str, Any]) -> str: + """Return the stable provider tool call id for ACP history replay.""" + return str( + tool_call.get("id") + or tool_call.get("call_id") + or tool_call.get("tool_call_id") + or "" + ).strip() + async def _replay_session_history(self, state: SessionState) -> None: """Send persisted user/assistant history to clients during session/load. Zed's ACP history UI calls ``session/load`` after the user picks an item from the Agents sidebar. The agent must then replay the full conversation - as ``user_message_chunk`` / ``agent_message_chunk`` notifications; merely - restoring server-side state makes Hermes remember context, but leaves the - editor looking like a clean thread. + as user/assistant chunks plus reconstructed tool-call start/completion + notifications; merely restoring server-side state makes Hermes remember + context, but leaves the editor looking like a clean thread. 
""" if not self._conn or not state.history: return - for message in state.history: - role = str(message.get("role") or "") - if role not in {"user", "assistant"}: - continue - text = self._history_message_text(message) - if not text: - continue - update = self._history_message_update(role=role, text=text) - if update is None: - continue + active_tool_calls: dict[str, tuple[str, dict[str, Any]]] = {} + + async def _send(update: Any) -> bool: try: await self._conn.session_update(session_id=state.session_id, update=update) + return True except Exception: logger.warning( "Failed to replay ACP history for session %s", state.session_id, exc_info=True, ) - return + return False + + for message in state.history: + role = str(message.get("role") or "") + + if role in {"user", "assistant"}: + text = self._history_message_text(message) + if text: + update = self._history_message_update(role=role, text=text) + if update is not None and not await _send(update): + return + + if role == "assistant" and isinstance(message.get("tool_calls"), list): + for tool_call in message["tool_calls"]: + if not isinstance(tool_call, dict): + continue + tool_call_id = self._history_tool_call_id(tool_call) + if not tool_call_id: + continue + tool_name, args = self._history_tool_call_name_args(tool_call) + active_tool_calls[tool_call_id] = (tool_name, args) + if not await _send(build_tool_start(tool_call_id, tool_name, args)): + return + continue + + if role == "tool": + tool_call_id = str(message.get("tool_call_id") or "").strip() + tool_name = str(message.get("tool_name") or "").strip() + function_args: dict[str, Any] | None = None + if tool_call_id in active_tool_calls: + tool_name, function_args = active_tool_calls.pop(tool_call_id) + if not tool_call_id or not tool_name: + continue + result = message.get("content") + if not await _send( + build_tool_complete( + tool_call_id, + tool_name, + result=result if isinstance(result, str) else None, + function_args=function_args, + ) + ): + return 
async def new_session( self, @@ -527,6 +652,7 @@ class HermesACPAgent(acp.Agent): await self._register_session_mcp_servers(state, mcp_servers) logger.info("New session %s (cwd=%s)", state.session_id, cwd) self._schedule_available_commands_update(state.session_id) + self._schedule_usage_update(state) return NewSessionResponse( session_id=state.session_id, models=self._build_model_state(state), @@ -547,6 +673,7 @@ class HermesACPAgent(acp.Agent): logger.info("Loaded session %s", session_id) await self._replay_session_history(state) self._schedule_available_commands_update(session_id) + self._schedule_usage_update(state) return LoadSessionResponse(models=self._build_model_state(state)) async def resume_session( @@ -564,6 +691,7 @@ class HermesACPAgent(acp.Agent): logger.info("Resumed session %s", state.session_id) await self._replay_session_history(state) self._schedule_available_commands_update(state.session_id) + self._schedule_usage_update(state) return ResumeSessionResponse(models=self._build_model_state(state)) async def cancel(self, session_id: str, **kwargs: Any) -> None: @@ -712,6 +840,7 @@ class HermesACPAgent(acp.Agent): if self._conn: update = acp.update_agent_message_text(response_text) await self._conn.session_update(session_id, update) + await self._send_usage_update(state) return PromptResponse(stop_reason="end_turn") # If Zed sends another regular prompt while the same ACP session is @@ -916,6 +1045,8 @@ class HermesACPAgent(acp.Agent): cached_read_tokens=result.get("cache_read_tokens"), ) + await self._send_usage_update(state) + stop_reason = "cancelled" if state.cancel_event and state.cancel_event.is_set() else "end_turn" return PromptResponse(stop_reason=stop_reason, usage=usage) @@ -1048,22 +1179,84 @@ class HermesACPAgent(acp.Agent): return f"Could not list tools: {e}" def _cmd_context(self, args: str, state: SessionState) -> str: + """Show ACP session context pressure and compression guidance.""" n_messages = len(state.history) - if n_messages == 
0: - return "Conversation is empty (no messages yet)." - # Count by role + + # Count by role. roles: dict[str, int] = {} for msg in state.history: role = msg.get("role", "unknown") roles[role] = roles.get(role, 0) + 1 + + agent = state.agent + model = state.model or getattr(agent, "model", "") + provider = getattr(agent, "provider", None) or "auto" + compressor = getattr(agent, "context_compressor", None) + context_length = int(getattr(compressor, "context_length", 0) or 0) + threshold_tokens = int(getattr(compressor, "threshold_tokens", 0) or 0) + + try: + from agent.model_metadata import estimate_request_tokens_rough + + system_prompt = getattr(agent, "_cached_system_prompt", "") or "" + tools = getattr(agent, "tools", None) or None + approx_tokens = estimate_request_tokens_rough( + state.history, + system_prompt=system_prompt, + tools=tools, + ) + except Exception: + logger.debug("Could not estimate ACP context usage", exc_info=True) + approx_tokens = 0 + + if threshold_tokens <= 0 and context_length > 0: + threshold_tokens = int(context_length * 0.80) + lines = [ - f"Conversation: {n_messages} messages", + f"Conversation: {n_messages} messages" + if n_messages + else "Conversation is empty (no messages yet).", f" user: {roles.get('user', 0)}, assistant: {roles.get('assistant', 0)}, " f"tool: {roles.get('tool', 0)}, system: {roles.get('system', 0)}", ] - model = state.model or getattr(state.agent, "model", "") if model: lines.append(f"Model: {model}") + lines.append(f"Provider: {provider}") + + if approx_tokens > 0: + if context_length > 0: + usage_pct = (approx_tokens / context_length) * 100 + lines.append( + f"Context usage: ~{approx_tokens:,} / {context_length:,} tokens ({usage_pct:.1f}%)" + ) + else: + lines.append(f"Context usage: ~{approx_tokens:,} tokens") + + if threshold_tokens > 0: + if approx_tokens > 0: + threshold_pct = (threshold_tokens / context_length) * 100 if context_length > 0 else 0 + remaining = max(threshold_tokens - approx_tokens, 0) + if 
approx_tokens >= threshold_tokens: + lines.append( + f"Compression: due now (threshold ~{threshold_tokens:,}" + + (f", {threshold_pct:.0f}%" if threshold_pct else "") + + "). Run /compact." + ) + else: + lines.append( + f"Compression: ~{remaining:,} tokens until threshold " + f"(~{threshold_tokens:,}" + + (f", {threshold_pct:.0f}%" if threshold_pct else "") + + ")." + ) + else: + lines.append(f"Compression threshold: ~{threshold_tokens:,} tokens") + + if getattr(agent, "compression_enabled", True) is False: + lines.append("Compression is disabled for this agent.") + else: + lines.append("Tip: run /compact to compress manually before the threshold.") + return "\n".join(lines) def _cmd_reset(self, args: str, state: SessionState) -> str: diff --git a/acp_adapter/tools.py b/acp_adapter/tools.py index 067652106e..3c0aa3727f 100644 --- a/acp_adapter/tools.py +++ b/acp_adapter/tools.py @@ -28,6 +28,11 @@ TOOL_KIND_MAP: Dict[str, ToolKind] = { "terminal": "execute", "process": "execute", "execute_code": "execute", + # Session/meta tools + "todo": "other", + "skill_view": "read", + "skills_list": "read", + "skill_manage": "edit", # Web / fetch "web_search": "fetch", "web_extract": "fetch", @@ -51,6 +56,20 @@ TOOL_KIND_MAP: Dict[str, ToolKind] = { } +_POLISHED_TOOLS = { + "todo", + "read_file", + "search_files", + "execute_code", + "skill_view", + "skills_list", + "skill_manage", + "terminal", + "web_search", + "web_extract", +} + + def get_tool_kind(tool_name: str) -> ToolKind: """Return the ACP ToolKind for a hermes tool, defaulting to 'other'.""" return TOOL_KIND_MAP.get(tool_name, "other") @@ -91,12 +110,295 @@ def build_tool_title(tool_name: str, args: Dict[str, Any]) -> str: goal = goal[:57] + "..." 
return f"delegate: {goal}" if goal else "delegate task" if tool_name == "execute_code": - return "execute code" + code = str(args.get("code") or "").strip() + first_line = next((line.strip() for line in code.splitlines() if line.strip()), "") + if first_line: + if len(first_line) > 70: + first_line = first_line[:67] + "..." + return f"python: {first_line}" + return "python code" + if tool_name == "todo": + items = args.get("todos") + if isinstance(items, list): + return f"todo ({len(items)} item{'s' if len(items) != 1 else ''})" + return "todo" + if tool_name == "skill_view": + name = str(args.get("name") or "?").strip() or "?" + file_path = str(args.get("file_path") or "").strip() + suffix = f"/{file_path}" if file_path else "" + return f"skill view ({name}{suffix})" + if tool_name == "skills_list": + category = str(args.get("category") or "").strip() + return f"skills list ({category})" if category else "skills list" + if tool_name == "skill_manage": + action = str(args.get("action") or "manage").strip() or "manage" + name = str(args.get("name") or "?").strip() or "?" + file_path = str(args.get("file_path") or "").strip() + target = f"{name}/{file_path}" if file_path else name + if len(target) > 64: + target = target[:61] + "..." + return f"skill {action}: {target}" if tool_name == "vision_analyze": return f"analyze image: {args.get('question', '?')[:50]}" return tool_name +def _text(content: str) -> Any: + return acp.tool_content(acp.text_block(content)) + + +def _json_loads_maybe(value: Optional[str]) -> Any: + if not isinstance(value, str): + return value + try: + return json.loads(value) + except Exception: + pass + + # Some Hermes tools append a human hint after a JSON payload, e.g. + # ``{...}\n\n[Hint: Results truncated...]``. Keep the structured rendering path + # by decoding the first JSON value instead of falling back to raw text. 
+    try:
+        decoded, _ = json.JSONDecoder().raw_decode(value.lstrip())
+        return decoded
+    except Exception:
+        return None
+
+
+def _truncate_text(text: str, limit: int = 5000) -> str:
+    if len(text) <= limit:
+        return text
+    return text[: max(0, limit - 100)] + f"\n... ({len(text)} chars total, truncated)"
+
+
+def _format_todo_result(result: Optional[str]) -> Optional[str]:
+    data = _json_loads_maybe(result)
+    if not isinstance(data, dict) or not isinstance(data.get("todos"), list):
+        return None
+    summary = data.get("summary") if isinstance(data.get("summary"), dict) else {}
+    icon = {
+        "completed": "✅",
+        "in_progress": "🔄",
+        "pending": "⏳",
+        "cancelled": "✗",
+    }
+    lines = ["**Todo list**", ""]
+    for item in data["todos"]:
+        if not isinstance(item, dict):
+            continue
+        status = str(item.get("status") or "pending")
+        content = str(item.get("content") or item.get("id") or "").strip()
+        if content:
+            lines.append(f"- {icon.get(status, '•')} {content}")
+    if summary:
+        cancelled = summary.get("cancelled", 0)
+        lines.extend([
+            "",
+            "**Progress:** "
+            f"{summary.get('completed', 0)} completed, "
+            f"{summary.get('in_progress', 0)} in progress, "
+            f"{summary.get('pending', 0)} pending"
+            + (f", {cancelled} cancelled" if cancelled else ""),
+        ])
+    return "\n".join(lines)
+
+
+def _format_read_file_result(result: Optional[str], args: Optional[Dict[str, Any]]) -> Optional[str]:
+    data = _json_loads_maybe(result)
+    if not isinstance(data, dict):
+        return None
+    if data.get("error") and not data.get("content"):
+        return f"Read failed: {data.get('error')}"
+    content = data.get("content")
+    if not isinstance(content, str):
+        return None
+    path = str((args or {}).get("path") or data.get("path") or "file").strip()
+    offset = (args or {}).get("offset")
+    limit = (args or {}).get("limit")
+    range_bits = []
+    if offset:
+        range_bits.append(f"from line {offset}")
+    if limit:
+        range_bits.append(f"limit {limit}")
+    suffix = f" ({', '.join(range_bits)})" if range_bits else ""
+    header = f"Read {path}{suffix}"
+    if data.get("total_lines") is not None:
+        header += f" — {data.get('total_lines')} total lines"
+    return _truncate_text(f"{header}\n\n{content}")
+
+
+def _format_search_files_result(result: Optional[str]) -> Optional[str]:
+    data = _json_loads_maybe(result)
+    if not isinstance(data, dict):
+        return None
+    matches = data.get("matches")
+    if not isinstance(matches, list):
+        return None
+
+    total = data.get("total_count", len(matches))
+    shown = min(len(matches), 12)
+    truncated = bool(data.get("truncated")) or len(matches) > shown
+    lines = [
+        "Search results",
+        f"Found {total} match{'es' if total != 1 else ''}; showing {shown}.",
+        "",
+    ]
+
+    for match in matches[:shown]:
+        if not isinstance(match, dict):
+            lines.append(f"- {match}")
+            continue
+
+        path = str(match.get("path") or match.get("file") or match.get("filename") or "?")
+        line = match.get("line") or match.get("line_number")
+        content = str(match.get("content") or match.get("text") or "").strip()
+        loc = f"{path}:{line}" if line else path
+        lines.append(f"- {loc}")
+        if content:
+            snippet = _truncate_text(" ".join(content.split()), 300)
+            lines.append(f" {snippet}")
+
+    if truncated:
+        lines.extend([
+            "",
+            "Results truncated. Narrow the search, add file_glob, or use offset to page.",
+        ])
+    return _truncate_text("\n".join(lines), limit=7000)
+
+
+def _format_execute_code_result(result: Optional[str]) -> Optional[str]:
+    data = _json_loads_maybe(result)
+    if not isinstance(data, dict):
+        return result if isinstance(result, str) and result.strip() else None
+    output = str(data.get("output") or "")
+    error = str(data.get("error") or "")
+    exit_code = data.get("exit_code")
+    parts = [f"Exit code: {exit_code}" if exit_code is not None else "Execution complete"]
+    if output:
+        parts.extend(["", "Output:", output])
+    if error:
+        parts.extend(["", "Error:", error])
+    return _truncate_text("\n".join(parts))
+
+
+def _extract_markdown_headings(content: str, limit: int = 8) -> list[str]:
+    headings: list[str] = []
+    for line in content.splitlines():
+        stripped = line.strip()
+        if stripped.startswith("#"):
+            heading = stripped.lstrip("#").strip()
+            if heading:
+                headings.append(heading)
+        if len(headings) >= limit:
+            break
+    return headings
+
+
+def _format_skill_view_result(result: Optional[str]) -> Optional[str]:
+    data = _json_loads_maybe(result)
+    if not isinstance(data, dict):
+        return None
+    if data.get("success") is False:
+        return f"Skill view failed: {data.get('error', 'unknown error')}"
+    name = str(data.get("name") or "skill")
+    file_path = str(data.get("file") or data.get("path") or "SKILL.md")
+    description = str(data.get("description") or "").strip()
+    content = str(data.get("content") or "")
+    linked = data.get("linked_files") if isinstance(data.get("linked_files"), dict) else None
+
+    lines = ["**Skill loaded**", "", f"- **Name:** `{name}`", f"- **File:** `{file_path}`"]
+    if description:
+        lines.append(f"- **Description:** {description}")
+    if content:
+        lines.append(f"- **Content:** {len(content):,} chars loaded into agent context")
+    if linked:
+        linked_count = sum(len(v) for v in linked.values() if isinstance(v, list))
+        lines.append(f"- **Linked files:** {linked_count}")
+
+    headings = _extract_markdown_headings(content)
+    if headings:
+        lines.extend(["", "**Sections**"])
+        lines.extend(f"- {heading}" for heading in headings)
+
+    lines.extend([
+        "",
+        "_Full skill content is available to the agent but hidden here to keep ACP readable._",
+    ])
+    return "\n".join(lines)
+
+
+def _format_skill_manage_result(result: Optional[str], args: Optional[Dict[str, Any]]) -> Optional[str]:
+    data = _json_loads_maybe(result)
+    if not isinstance(data, dict):
+        return None
+
+    action = str((args or {}).get("action") or "manage").strip() or "manage"
+    name = str((args or {}).get("name") or data.get("name") or "skill").strip() or "skill"
+    file_path = str((args or {}).get("file_path") or data.get("file_path") or "SKILL.md").strip() or "SKILL.md"
+    success = data.get("success")
+    status = "✅ Skill updated" if success is not False else "✗ Skill update failed"
+
+    lines = [f"**{status}**", "", f"- **Action:** `{action}`", f"- **Skill:** `{name}`"]
+    if action not in {"delete"}:
+        lines.append(f"- **File:** `{file_path}`")
+
+    message = str(data.get("message") or data.get("error") or "").strip()
+    if message:
+        lines.append(f"- **Result:** {message}")
+
+    replacements = data.get("replacements") or data.get("replacement_count")
+    if replacements is not None:
+        lines.append(f"- **Replacements:** {replacements}")
+
+    path = str(data.get("path") or "").strip()
+    if path:
+        lines.append(f"- **Path:** `{path}`")
+
+    return "\n".join(lines)
+
+
+def _format_web_search_result(result: Optional[str]) -> Optional[str]:
+    data = _json_loads_maybe(result)
+    if not isinstance(data, dict):
+        return None
+    web = data.get("data", {}).get("web") if isinstance(data.get("data"), dict) else data.get("web")
+    if not isinstance(web, list):
+        return None
+    lines = [f"Web results: {len(web)}"]
+    for item in web[:10]:
+        if not isinstance(item, dict):
+            continue
+        title = str(item.get("title") or item.get("url") or "result").strip()
+        url = str(item.get("url") or "").strip()
+        desc =
str(item.get("description") or "").strip() + lines.append(f"• {title}" + (f" — {url}" if url else "")) + if desc: + lines.append(f" {desc}") + return _truncate_text("\n".join(lines)) + + +def _build_polished_completion_content( + tool_name: str, + result: Optional[str], + function_args: Optional[Dict[str, Any]], +) -> Optional[List[Any]]: + formatter = { + "todo": lambda: _format_todo_result(result), + "read_file": lambda: _format_read_file_result(result, function_args), + "search_files": lambda: _format_search_files_result(result), + "execute_code": lambda: _format_execute_code_result(result), + "skill_view": lambda: _format_skill_view_result(result), + "skill_manage": lambda: _format_skill_manage_result(result, function_args), + "web_search": lambda: _format_web_search_result(result), + }.get(tool_name) + if formatter is None: + return None + text = formatter() + if not text: + return None + return [_text(text)] + + def _build_patch_mode_content(patch_text: str) -> List[Any]: """Parse V4A patch mode input into ACP diff blocks when possible.""" if not patch_text: @@ -258,7 +560,11 @@ def _build_tool_complete_content( except Exception: pass - return [acp.tool_content(acp.text_block(display_result))] + polished_content = _build_polished_completion_content(tool_name, result, function_args) + if polished_content: + return polished_content + + return [_text(display_result)] # --------------------------------------------------------------------------- @@ -302,27 +608,108 @@ def build_tool_start( if tool_name == "terminal": command = arguments.get("command", "") - content = [acp.tool_content(acp.text_block(f"$ {command}"))] + content = [_text(f"$ {command}")] return acp.start_tool_call( tool_call_id, title, kind=kind, content=content, locations=locations, - raw_input=arguments, ) if tool_name == "read_file": path = arguments.get("path", "") - content = [acp.tool_content(acp.text_block(f"Reading {path}"))] + offset = arguments.get("offset") + limit = 
arguments.get("limit") + bits = [] + if offset: + bits.append(f"from line {offset}") + if limit: + bits.append(f"limit {limit}") + suffix = f" ({', '.join(bits)})" if bits else "" + content = [_text(f"Reading {path}{suffix}")] return acp.start_tool_call( tool_call_id, title, kind=kind, content=content, locations=locations, - raw_input=arguments, ) if tool_name == "search_files": pattern = arguments.get("pattern", "") target = arguments.get("target", "content") - content = [acp.tool_content(acp.text_block(f"Searching for '{pattern}' ({target})"))] + search_path = arguments.get("path") + where = f" in {search_path}" if search_path else "" + content = [_text(f"Searching for '{pattern}' ({target}){where}")] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "todo": + items = arguments.get("todos") + if isinstance(items, list): + preview_lines = ["Updating todo list", ""] + for item in items[:8]: + if isinstance(item, dict): + preview_lines.append(f"- {item.get('status', 'pending')}: {item.get('content', item.get('id', ''))}") + if len(items) > 8: + preview_lines.append(f"... {len(items) - 8} more") + content = [_text("\n".join(preview_lines))] + else: + content = [_text("Reading todo list")] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "skill_view": + name = str(arguments.get("name") or "?").strip() or "?" + file_path = str(arguments.get("file_path") or "SKILL.md").strip() or "SKILL.md" + content = [_text(f"Loading skill '{name}' ({file_path})")] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "skill_manage": + action = str(arguments.get("action") or "manage").strip() or "manage" + name = str(arguments.get("name") or "?").strip() or "?" 
+ file_path = str(arguments.get("file_path") or "SKILL.md").strip() or "SKILL.md" + path = f"skills/{name}/{file_path}" if file_path else f"skills/{name}" + + if action == "patch": + old = str(arguments.get("old_string") or "") + new = str(arguments.get("new_string") or "") + content = [acp.tool_diff_content(path=path, old_text=old or None, new_text=new)] + elif action in {"edit", "create"}: + content = [ + acp.tool_diff_content( + path=path, + new_text=str(arguments.get("content") or ""), + ) + ] + elif action == "write_file": + target = str(arguments.get("file_path") or "file") + content = [ + acp.tool_diff_content( + path=f"skills/{name}/{target}", + new_text=str(arguments.get("file_content") or ""), + ) + ] + elif action in {"delete", "remove_file"}: + target = str(arguments.get("file_path") or file_path or name) + content = [_text(f"Removing {target} from skill '{name}'")] + else: + content = [_text(f"Running skill_manage action '{action}' on skill '{name}' ({file_path})")] + + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "execute_code": + code = str(arguments.get("code") or "").strip() + preview = code[:1200] + (f"\n... 
({len(code)} chars total, truncated)" if len(code) > 1200 else "") + content = [_text(f"Running Python helper script:\n\n```python\n{preview}\n```" if preview else "Running Python helper script")] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "web_search": + query = str(arguments.get("query") or "").strip() + content = [_text(f"Searching the web for: {query}" if query else "Searching the web")] return acp.start_tool_call( tool_call_id, title, kind=kind, content=content, locations=locations, - raw_input=arguments, ) # Generic fallback @@ -358,7 +745,7 @@ def build_tool_complete( kind=kind, status="completed", content=content, - raw_output=result, + raw_output=None if tool_name in _POLISHED_TOOLS else result, ) diff --git a/tests/acp/test_mcp_e2e.py b/tests/acp/test_mcp_e2e.py index 45aed78e4f..dab4607198 100644 --- a/tests/acp/test_mcp_e2e.py +++ b/tests/acp/test_mcp_e2e.py @@ -178,9 +178,10 @@ class TestMcpRegistrationE2E: complete_event = completions[0] assert isinstance(complete_event, ToolCallProgress) assert complete_event.status == "completed" - # rawOutput should contain the tool result string - assert complete_event.raw_output is not None - assert "hello" in str(complete_event.raw_output) + # Completion should contain human-readable output rather than forcing raw JSON panes. 
+ assert complete_event.content + assert "hello" in complete_event.content[0].content.text + assert complete_event.raw_output is None def test_patch_mode_tool_start_emits_diff_blocks_for_v4a_patch(self): update = build_tool_start( diff --git a/tests/acp/test_server.py b/tests/acp/test_server.py index d292ade3fe..282a4553c0 100644 --- a/tests/acp/test_server.py +++ b/tests/acp/test_server.py @@ -27,7 +27,10 @@ from acp.schema import ( SetSessionModeResponse, SessionInfo, TextContentBlock, + ToolCallProgress, + ToolCallStart, Usage, + UsageUpdate, UserMessageChunk, ) from acp_adapter.server import HermesACPAgent, HERMES_VERSION @@ -210,6 +213,46 @@ class TestSessionOps: assert model_cmd.input is not None assert model_cmd.input.root.hint == "model name to switch to" + def test_build_usage_update_for_zed_context_indicator(self, agent, mock_manager): + state = mock_manager.create_session(cwd="/tmp") + state.history = [{"role": "user", "content": "hello"}] + state.agent.context_compressor = MagicMock(context_length=100_000) + state.agent._cached_system_prompt = "system" + state.agent.tools = [{"type": "function", "function": {"name": "demo"}}] + + with patch( + "agent.model_metadata.estimate_request_tokens_rough", + return_value=25_000, + ): + update = agent._build_usage_update(state) + + assert isinstance(update, UsageUpdate) + assert update.session_update == "usage_update" + assert update.size == 100_000 + assert update.used == 25_000 + + @pytest.mark.asyncio + async def test_send_usage_update_to_client(self, agent, mock_manager): + state = mock_manager.create_session(cwd="/tmp") + state.agent.context_compressor = MagicMock(context_length=100_000) + mock_conn = MagicMock(spec=acp.Client) + mock_conn.session_update = AsyncMock() + agent._conn = mock_conn + + with patch( + "agent.model_metadata.estimate_request_tokens_rough", + return_value=25_000, + ): + await agent._send_usage_update(state) + + mock_conn.session_update.assert_awaited_once() + call = 
mock_conn.session_update.await_args + assert call.kwargs["session_id"] == state.session_id + update = call.kwargs["update"] + assert isinstance(update, UsageUpdate) + assert update.size == 100_000 + assert update.used == 25_000 + @pytest.mark.asyncio async def test_cancel_sets_event(self, agent): resp = await agent.new_session(cwd=".") @@ -240,7 +283,25 @@ class TestSessionOps: {"role": "system", "content": "hidden system"}, {"role": "user", "content": "what controls the / slash commands?"}, {"role": "assistant", "content": "HermesACPAgent._ADVERTISED_COMMANDS controls them."}, - {"role": "tool", "content": "tool output should not replay"}, + { + "role": "assistant", + "content": "", + "tool_calls": [ + { + "id": "call_search_1", + "type": "function", + "function": { + "name": "search_files", + "arguments": '{"pattern":"slash commands","path":"."}', + }, + } + ], + }, + { + "role": "tool", + "tool_call_id": "call_search_1", + "content": '{"total_count":1,"matches":[{"path":"cli.py","line":42,"content":"slash commands"}]}', + }, ] mock_conn.session_update.reset_mock() @@ -259,6 +320,21 @@ class TestSessionOps: assert isinstance(replay_calls[1].kwargs["update"], AgentMessageChunk) assert replay_calls[1].kwargs["update"].content.text.startswith("HermesACPAgent") + tool_updates = [ + call.kwargs["update"] + for call in calls + if getattr(call.kwargs.get("update"), "session_update", None) + in {"tool_call", "tool_call_update"} + ] + assert len(tool_updates) == 2 + assert isinstance(tool_updates[0], ToolCallStart) + assert tool_updates[0].tool_call_id == "call_search_1" + assert tool_updates[0].title == "search: slash commands" + assert isinstance(tool_updates[1], ToolCallProgress) + assert tool_updates[1].tool_call_id == "call_search_1" + assert "Search results" in tool_updates[1].content[0].content.text + assert "cli.py:42" in tool_updates[1].content[0].content.text + @pytest.mark.asyncio async def test_resume_session_replays_persisted_history_to_client(self, agent): 
mock_conn = MagicMock(spec=acp.Client) @@ -572,12 +648,13 @@ class TestPrompt: prompt = [TextContentBlock(type="text", text="help me")] await agent.prompt(prompt=prompt, session_id=new_resp.session_id) - # session_update should have been called with the final message + # session_update should include the final message (usage_update may follow it) mock_conn.session_update.assert_called() - # Get the last call's update argument - last_call = mock_conn.session_update.call_args_list[-1] - update = last_call[1].get("update") or last_call[0][1] - assert update.session_update == "agent_message_chunk" + updates = [ + call.kwargs.get("update") or call.args[1] + for call in mock_conn.session_update.call_args_list + ] + assert any(update.session_update == "agent_message_chunk" for update in updates) @pytest.mark.asyncio async def test_prompt_does_not_duplicate_streamed_final_message(self, agent): @@ -598,7 +675,13 @@ class TestPrompt: prompt = [TextContentBlock(type="text", text="hello")] await agent.prompt(prompt=prompt, session_id=new_resp.session_id) - assert mock_conn.session_update.call_count == 1 + updates = [ + call.kwargs.get("update") or call.args[1] + for call in mock_conn.session_update.call_args_list + ] + agent_chunks = [update for update in updates if update.session_update == "agent_message_chunk"] + assert len(agent_chunks) == 1 + assert agent_chunks[0].content.text == "streamed answer" @pytest.mark.asyncio async def test_prompt_auto_titles_session(self, agent): @@ -736,6 +819,43 @@ class TestSlashCommands: assert "2 messages" in result assert "user: 1" in result + def test_context_shows_usage_and_compression_threshold(self, agent, mock_manager): + state = self._make_state(mock_manager) + state.history = [{"role": "user", "content": "hello"}] + state.agent.context_compressor = MagicMock( + context_length=100_000, + threshold_tokens=80_000, + ) + state.agent._cached_system_prompt = "system" + state.agent.tools = [{"type": "function", "function": {"name": 
"demo"}}] + + with patch( + "agent.model_metadata.estimate_request_tokens_rough", + return_value=25_000, + ): + result = agent._handle_slash_command("/context", state) + + assert "Context usage: ~25,000 / 100,000 tokens (25.0%)" in result + assert "Compression: ~55,000 tokens until threshold (~80,000, 80%)" in result + assert "Tip: run /compact" in result + + def test_context_says_compression_due_when_past_threshold(self, agent, mock_manager): + state = self._make_state(mock_manager) + state.history = [{"role": "user", "content": "hello"}] + state.agent.context_compressor = MagicMock( + context_length=100_000, + threshold_tokens=80_000, + ) + + with patch( + "agent.model_metadata.estimate_request_tokens_rough", + return_value=82_000, + ): + result = agent._handle_slash_command("/context", state) + + assert "Context usage: ~82,000 / 100,000 tokens (82.0%)" in result + assert "Compression: due now (threshold ~80,000, 80%). Run /compact." in result + def test_reset_clears_history(self, agent, mock_manager): state = self._make_state(mock_manager) state.history = [{"role": "user", "content": "hello"}] @@ -815,7 +935,12 @@ class TestSlashCommands: resp = await agent.prompt(prompt=prompt, session_id=new_resp.session_id) assert resp.stop_reason == "end_turn" - mock_conn.session_update.assert_called_once() + updates = [ + call.kwargs.get("update") or call.args[1] + for call in mock_conn.session_update.call_args_list + ] + assert any(update.session_update == "agent_message_chunk" for update in updates) + assert any(update.session_update == "usage_update" for update in updates) @pytest.mark.asyncio async def test_unknown_slash_falls_through_to_llm(self, agent, mock_manager): diff --git a/tests/acp/test_tools.py b/tests/acp/test_tools.py index 603fe7459c..fa576b6144 100644 --- a/tests/acp/test_tools.py +++ b/tests/acp/test_tools.py @@ -52,6 +52,12 @@ class TestToolKindMap: def test_tool_kind_execute_code(self): assert get_tool_kind("execute_code") == "execute" + def 
test_tool_kind_todo(self): + assert get_tool_kind("todo") == "other" + + def test_tool_kind_skill_view(self): + assert get_tool_kind("skill_view") == "read" + def test_tool_kind_browser_navigate(self): assert get_tool_kind("browser_navigate") == "fetch" @@ -110,6 +116,25 @@ class TestBuildToolTitle: title = build_tool_title("web_search", {"query": "python asyncio"}) assert "python asyncio" in title + def test_skill_view_title_includes_skill_name(self): + title = build_tool_title("skill_view", {"name": "github-pitfalls"}) + assert title == "skill view (github-pitfalls)" + + def test_skill_view_title_includes_linked_file(self): + title = build_tool_title("skill_view", {"name": "github-pitfalls", "file_path": "references/api.md"}) + assert title == "skill view (github-pitfalls/references/api.md)" + + def test_execute_code_title_includes_first_code_line(self): + title = build_tool_title("execute_code", {"code": "\nfrom hermes_tools import terminal\nprint('done')"}) + assert title == "python: from hermes_tools import terminal" + + def test_skill_manage_title_includes_action_and_target(self): + title = build_tool_title( + "skill_manage", + {"action": "patch", "name": "hermes-agent-operations", "file_path": "references/acp.md"}, + ) + assert title == "skill patch: hermes-agent-operations/references/acp.md" + def test_unknown_tool_uses_name(self): title = build_tool_title("some_new_tool", {"foo": "bar"}) assert title == "some_new_tool" @@ -181,6 +206,48 @@ class TestBuildToolStart: assert isinstance(result, ToolCallStart) assert result.kind == "search" assert "TODO" in result.content[0].content.text + assert result.raw_input is None + + def test_build_tool_start_for_todo_is_human_readable(self): + args = {"todos": [{"id": "one", "content": "Fix ACP rendering", "status": "in_progress"}]} + result = build_tool_start("tc-todo", "todo", args) + assert result.title == "todo (1 item)" + assert "Fix ACP rendering" in result.content[0].content.text + assert result.raw_input is 
None + + def test_build_tool_start_for_skill_view_is_human_readable(self): + result = build_tool_start("tc-skill", "skill_view", {"name": "github-pitfalls"}) + assert result.title == "skill view (github-pitfalls)" + assert "github-pitfalls" in result.content[0].content.text + assert result.raw_input is None + + def test_build_tool_start_for_execute_code_shows_code_preview(self): + result = build_tool_start("tc-code", "execute_code", {"code": "print('hello')"}) + assert result.kind == "execute" + assert result.title == "python: print('hello')" + assert "```python" in result.content[0].content.text + assert "print('hello')" in result.content[0].content.text + assert result.raw_input is None + + def test_build_tool_start_for_skill_manage_patch_shows_diff(self): + result = build_tool_start( + "tc-skill-manage", + "skill_manage", + { + "action": "patch", + "name": "hermes-agent-operations", + "file_path": "references/acp.md", + "old_string": "old advice", + "new_string": "new advice", + }, + ) + assert result.kind == "edit" + assert result.title == "skill patch: hermes-agent-operations/references/acp.md" + assert isinstance(result.content[0], FileEditToolCallContent) + assert result.content[0].path == "skills/hermes-agent-operations/references/acp.md" + assert result.content[0].old_text == "old advice" + assert result.content[0].new_text == "new advice" + assert result.raw_input is None def test_build_tool_start_generic_fallback(self): """Unknown tools should get a generic text representation.""" @@ -205,6 +272,86 @@ class TestBuildToolComplete: content_item = result.content[0] assert isinstance(content_item, ContentToolCallContent) assert "total 42" in content_item.content.text + assert result.raw_output is None + + def test_build_tool_complete_for_todo_is_checklist(self): + result = build_tool_complete( + "tc-todo", + "todo", + '{"todos":[{"id":"a","content":"Inspect ACP","status":"completed"},{"id":"b","content":"Patch 
renderers","status":"in_progress"}],"summary":{"total":2,"pending":0,"in_progress":1,"completed":1,"cancelled":0}}', + ) + text = result.content[0].content.text + assert "✅ Inspect ACP" in text + assert "- 🔄 Patch renderers" in text + assert "**Progress:** 1 completed, 1 in progress, 0 pending" in text + assert result.raw_output is None + + def test_build_tool_complete_for_skill_view_summarizes_content_without_raw_json(self): + result = build_tool_complete( + "tc-skill", + "skill_view", + '{"success":true,"name":"github-pitfalls","description":"GitHub gotchas","content":"# GitHub Pitfalls\\nUse gh carefully.","path":"github/github-pitfalls/SKILL.md"}', + ) + text = result.content[0].content.text + assert "**Skill loaded**" in text + assert "`github-pitfalls`" in text + assert "GitHub gotchas" in text + assert "GitHub Pitfalls" in text + assert "Use gh carefully" not in text + assert "Full skill content is available to the agent" in text + assert result.raw_output is None + + def test_build_tool_complete_for_execute_code_formats_output(self): + result = build_tool_complete("tc-code", "execute_code", '{"output":"hello\\n","exit_code":0}') + text = result.content[0].content.text + assert "Exit code: 0" in text + assert "hello" in text + assert result.raw_output is None + + def test_build_tool_complete_for_skill_manage_summarizes_without_raw_json(self): + result = build_tool_complete( + "tc-skill-manage", + "skill_manage", + '{"success":true,"message":"Patched references/hermes-acp-zed-rendering.md in skill \'hermes-agent-operations\' (1 replacement)."}', + function_args={ + "action": "patch", + "name": "hermes-agent-operations", + "file_path": "references/hermes-acp-zed-rendering.md", + }, + ) + text = result.content[0].content.text + assert "**✅ Skill updated**" in text + assert "`patch`" in text + assert "`hermes-agent-operations`" in text + assert "references/hermes-acp-zed-rendering.md" in text + assert "{\"success\"" not in text + assert result.raw_output is None 
+ + def test_build_tool_complete_for_read_file_formats_content(self): + result = build_tool_complete( + "tc-read", + "read_file", + '{"content":"1|hello\\n2|world","total_lines":2}', + function_args={"path":"README.md","offset":1,"limit":20}, + ) + text = result.content[0].content.text + assert "Read README.md" in text + assert "1|hello" in text + assert result.raw_output is None + + def test_build_tool_complete_for_search_files_formats_matches(self): + result = build_tool_complete( + "tc-search", + "search_files", + '{"total_count":2,"matches":[{"path":"README.md","line":3,"content":"TODO: fix this"},{"path":"src/app.py","line":9,"content":"needle"}],"truncated":true}\n\n[Hint: Results truncated. Use offset=12 to see more.]', + ) + text = result.content[0].content.text + assert "Search results" in text + assert "Found 2 matches" in text + assert "README.md:3" in text + assert "TODO: fix this" in text + assert "Results truncated" in text + assert result.raw_output is None def test_build_tool_complete_truncates_large_output(self): """Very large outputs should be truncated.""" From 72c8037a24b58b7b1a38a99903cc0bf8a3d7595c Mon Sep 17 00:00:00 2001 From: Henkey Date: Sat, 2 May 2026 14:40:48 +0100 Subject: [PATCH 32/61] fix(acp): polish common tool rendering --- acp_adapter/tools.py | 442 ++++++++++++++++++++++++++++++++++++++-- tests/acp/test_tools.py | 64 ++++++ 2 files changed, 492 insertions(+), 14 deletions(-) diff --git a/acp_adapter/tools.py b/acp_adapter/tools.py index 3c0aa3727f..8fc9eacf07 100644 --- a/acp_adapter/tools.py +++ b/acp_adapter/tools.py @@ -57,16 +57,24 @@ TOOL_KIND_MAP: Dict[str, ToolKind] = { _POLISHED_TOOLS = { - "todo", - "read_file", - "search_files", - "execute_code", - "skill_view", - "skills_list", - "skill_manage", - "terminal", - "web_search", - "web_extract", + # Core operator loop + "todo", "memory", "session_search", "delegate_task", + # Files / execution + "read_file", "write_file", "patch", "search_files", "terminal", "process", 
"execute_code", + # Skills / web / browser / media + "skill_view", "skills_list", "skill_manage", "web_search", "web_extract", + "browser_navigate", "browser_click", "browser_type", "browser_press", "browser_scroll", + "browser_back", "browser_snapshot", "browser_console", "browser_get_images", "browser_vision", + "vision_analyze", "image_generate", "text_to_speech", + # Schedulers / platform integrations + "cronjob", "send_message", "clarify", "discord", "discord_admin", + "ha_list_entities", "ha_get_state", "ha_list_services", "ha_call_service", + "feishu_doc_read", "feishu_drive_list_comments", "feishu_drive_list_comment_replies", + "feishu_drive_reply_comment", "feishu_drive_add_comment", + "kanban_create", "kanban_show", "kanban_comment", "kanban_complete", + "kanban_block", "kanban_link", "kanban_heartbeat", + "yb_query_group_info", "yb_query_group_members", "yb_search_sticker", + "yb_send_dm", "yb_send_sticker", "mixture_of_agents", } @@ -104,11 +112,25 @@ def build_tool_title(tool_name: str, args: Dict[str, Any]) -> str: if urls: return f"extract: {urls[0]}" + (f" (+{len(urls)-1})" if len(urls) > 1 else "") return "web extract" + if tool_name == "process": + action = str(args.get("action") or "").strip() or "manage" + sid = str(args.get("session_id") or "").strip() + return f"process {action}: {sid}" if sid else f"process {action}" if tool_name == "delegate_task": + tasks = args.get("tasks") + if isinstance(tasks, list) and tasks: + return f"delegate batch ({len(tasks)} tasks)" goal = args.get("goal", "") if goal and len(goal) > 60: goal = goal[:57] + "..." 
return f"delegate: {goal}" if goal else "delegate task" + if tool_name == "session_search": + query = str(args.get("query") or "").strip() + return f"session search: {query}" if query else "recent sessions" + if tool_name == "memory": + action = str(args.get("action") or "manage").strip() or "manage" + target = str(args.get("target") or "memory").strip() or "memory" + return f"memory {action}: {target}" if tool_name == "execute_code": code = str(args.get("code") or "").strip() first_line = next((line.strip() for line in code.splitlines() if line.strip()), "") @@ -138,8 +160,23 @@ def build_tool_title(tool_name: str, args: Dict[str, Any]) -> str: if len(target) > 64: target = target[:61] + "..." return f"skill {action}: {target}" + if tool_name == "browser_navigate": + return f"navigate: {args.get('url', '?')}" + if tool_name == "browser_snapshot": + return "browser snapshot" + if tool_name == "browser_vision": + return f"browser vision: {str(args.get('question', '?'))[:50]}" + if tool_name == "browser_get_images": + return "browser images" if tool_name == "vision_analyze": - return f"analyze image: {args.get('question', '?')[:50]}" + return f"analyze image: {str(args.get('question', '?'))[:50]}" + if tool_name == "image_generate": + prompt = str(args.get("prompt") or args.get("description") or "").strip() + return f"generate image: {prompt[:50]}" if prompt else "generate image" + if tool_name == "cronjob": + action = str(args.get("action") or "manage").strip() or "manage" + job_id = str(args.get("job_id") or args.get("id") or "").strip() + return f"cron {action}: {job_id}" if job_id else f"cron {action}" return tool_name @@ -377,6 +414,301 @@ def _format_web_search_result(result: Optional[str]) -> Optional[str]: return _truncate_text("\n".join(lines)) +def _format_web_extract_result(result: Optional[str]) -> Optional[str]: + data = _json_loads_maybe(result) + if not isinstance(data, dict): + return None + if data.get("success") is False and data.get("error"): + 
return f"Web extract failed: {data.get('error')}" + results = data.get("results") + if not isinstance(results, list): + return None + lines = [f"Web extract: {len(results)} URL{'s' if len(results) != 1 else ''}"] + for item in results[:5]: + if not isinstance(item, dict): + continue + url = str(item.get("url") or "").strip() + title = str(item.get("title") or url or "Untitled").strip() + error = str(item.get("error") or "").strip() + content = str(item.get("content") or "").strip() + lines.extend(["", f"### {title}"]) + if url: + lines.append(url) + if error and error not in {"None", "null"}: + lines.append(f"Error: {error}") + continue + if content: + headings = _extract_markdown_headings(content, limit=5) + lines.append(f"Content: {len(content):,} chars") + if headings: + lines.append("Headings: " + "; ".join(headings)) + excerpt = _truncate_text(content, limit=1200) + lines.extend(["", excerpt]) + else: + lines.append("No content returned.") + if len(results) > 5: + lines.append(f"\n... 
{len(results) - 5} more result(s) omitted") + return _truncate_text("\n".join(lines), limit=7000) + + +def _format_process_result(result: Optional[str], args: Optional[Dict[str, Any]]) -> Optional[str]: + data = _json_loads_maybe(result) + if not isinstance(data, dict): + return result if isinstance(result, str) and result.strip() else None + if data.get("success") is False and data.get("error"): + return f"Process error: {data.get('error')}" + action = str((args or {}).get("action") or "process").strip() or "process" + if isinstance(data.get("processes"), list): + processes = data["processes"] + lines = [f"Processes: {len(processes)}"] + for proc in processes[:20]: + if not isinstance(proc, dict): + lines.append(f"- {proc}") + continue + sid = str(proc.get("session_id") or proc.get("id") or "?") + status = str(proc.get("status") or ("exited" if proc.get("exited") else "running")) + cmd = str(proc.get("command") or "").strip() + pid = proc.get("pid") + code = proc.get("exit_code") + bits = [status] + if pid is not None: + bits.append(f"pid {pid}") + if code is not None: + bits.append(f"exit {code}") + lines.append(f"- `{sid}` — {', '.join(bits)}" + (f" — {cmd[:120]}" if cmd else "")) + if len(processes) > 20: + lines.append(f"... 
{len(processes) - 20} more process(es)") + return "\n".join(lines) + + status = str(data.get("status") or data.get("state") or action).strip() + sid = str(data.get("session_id") or (args or {}).get("session_id") or "").strip() + lines = [f"Process {action}: {status}" + (f" (`{sid}`)" if sid else "")] + for key, label in (("command", "Command"), ("pid", "PID"), ("exit_code", "Exit code"), ("returncode", "Exit code"), ("lines", "Lines")): + if data.get(key) is not None: + lines.append(f"- **{label}:** {data.get(key)}") + output = data.get("output") or data.get("new_output") or data.get("log") or data.get("stdout") + error = data.get("error") or data.get("stderr") + if output: + lines.extend(["", "Output:", _truncate_text(str(output), limit=5000)]) + if error: + lines.extend(["", "Error:", _truncate_text(str(error), limit=2000)]) + msg = data.get("message") + if msg and not output and not error: + lines.append(str(msg)) + return _truncate_text("\n".join(lines), limit=7000) + + +def _format_delegate_result(result: Optional[str]) -> Optional[str]: + data = _json_loads_maybe(result) + if not isinstance(data, dict): + return None + if data.get("error") and not isinstance(data.get("results"), list): + return f"Delegation failed: {data.get('error')}" + results = data.get("results") + if not isinstance(results, list): + return None + total = data.get("total_duration_seconds") + lines = [f"Delegation results: {len(results)} task{'s' if len(results) != 1 else ''}" + (f" in {total}s" if total is not None else "")] + icon = {"completed": "✅", "failed": "✗", "error": "✗", "timeout": "⏱", "interrupted": "⚠"} + for item in results: + if not isinstance(item, dict): + lines.append(f"- {item}") + continue + idx = item.get("task_index") + status = str(item.get("status") or "unknown") + model = item.get("model") + dur = item.get("duration_seconds") + role = item.get("_child_role") + header = f"{icon.get(status, '•')} Task {idx + 1 if isinstance(idx, int) else '?'}: {status}" + bits = [] 
+ if model: + bits.append(str(model)) + if role: + bits.append(f"role={role}") + if dur is not None: + bits.append(f"{dur}s") + if bits: + header += " (" + ", ".join(bits) + ")" + lines.extend(["", header]) + summary = str(item.get("summary") or "").strip() + error = str(item.get("error") or "").strip() + if summary: + lines.append(_truncate_text(summary, limit=1200)) + if error: + lines.append("Error: " + _truncate_text(error, limit=800)) + trace = item.get("tool_trace") + if isinstance(trace, list) and trace: + names = [str(t.get("tool") or "?") for t in trace if isinstance(t, dict)] + if names: + lines.append("Tools: " + ", ".join(names[:12]) + (f" (+{len(names)-12})" if len(names) > 12 else "")) + return _truncate_text("\n".join(lines), limit=8000) + + +def _format_session_search_result(result: Optional[str]) -> Optional[str]: + data = _json_loads_maybe(result) + if not isinstance(data, dict): + return None + if data.get("success") is False: + return f"Session search failed: {data.get('error', 'unknown error')}" + results = data.get("results") + if not isinstance(results, list): + return None + mode = data.get("mode") or "search" + query = data.get("query") + lines = ["Recent sessions" if mode == "recent" else f"Session search results" + (f" for `{query}`" if query else "")] + if not results: + lines.append(str(data.get("message") or "No matching sessions found.")) + return "\n".join(lines) + for item in results: + if not isinstance(item, dict): + continue + sid = str(item.get("session_id") or "?") + title = str(item.get("title") or item.get("when") or "Untitled session").strip() + when = str(item.get("last_active") or item.get("started_at") or item.get("when") or "").strip() + count = item.get("message_count") + source = str(item.get("source") or "").strip() + meta = ", ".join(str(x) for x in [when, source, f"{count} msgs" if count is not None else ""] if x) + lines.append(f"- **{title}** (`{sid}`)" + (f" — {meta}" if meta else "")) + summary = 
str(item.get("summary") or item.get("preview") or "").strip() + if summary: + lines.append(" " + _truncate_text(" ".join(summary.split()), limit=500)) + return _truncate_text("\n".join(lines), limit=7000) + + +def _format_memory_result(result: Optional[str], args: Optional[Dict[str, Any]]) -> Optional[str]: + data = _json_loads_maybe(result) + if not isinstance(data, dict): + return None + action = str((args or {}).get("action") or "memory").strip() or "memory" + target = str(data.get("target") or (args or {}).get("target") or "memory") + if data.get("success") is False: + lines = [f"✗ Memory {action} failed ({target})", str(data.get("error") or "unknown error")] + matches = data.get("matches") + if isinstance(matches, list) and matches: + lines.append("Matches:") + lines.extend(f"- {_truncate_text(str(m), 160)}" for m in matches[:5]) + return "\n".join(lines) + lines = [f"✅ Memory {action} saved ({target})"] + if data.get("message"): + lines.append(str(data.get("message"))) + if data.get("entry_count") is not None: + lines.append(f"Entries: {data.get('entry_count')}") + if data.get("usage"): + lines.append(f"Usage: {data.get('usage')}") + # Avoid dumping all memory entries into ACP UI; show only the explicit new value preview. 
+ preview = str((args or {}).get("content") or (args or {}).get("old_text") or "").strip() + if preview: + lines.append("Preview: " + _truncate_text(preview, limit=300)) + return "\n".join(lines) + + +def _format_edit_result(tool_name: str, result: Optional[str], args: Optional[Dict[str, Any]]) -> Optional[str]: + data = _json_loads_maybe(result) + path = str((args or {}).get("path") or "file").strip() + if isinstance(data, dict): + if data.get("success") is False or data.get("error"): + return f"{tool_name} failed for {path}: {data.get('error', 'unknown error')}" + message = str(data.get("message") or "").strip() + replacements = data.get("replacements") or data.get("replacement_count") + lines = [f"✅ {tool_name} completed" + (f" for `{path}`" if path else "")] + if message: + lines.append(message) + if replacements is not None: + lines.append(f"Replacements: {replacements}") + if data.get("files_modified"): + files = data.get("files_modified") + if isinstance(files, list): + lines.append("Files: " + ", ".join(f"`{f}`" for f in files[:8])) + return "\n".join(lines) + if isinstance(result, str) and result.strip(): + return _truncate_text(result, limit=3000) + return f"✅ {tool_name} completed" + (f" for `{path}`" if path else "") + + +def _format_browser_result(tool_name: str, result: Optional[str], args: Optional[Dict[str, Any]]) -> Optional[str]: + data = _json_loads_maybe(result) + if not isinstance(data, dict): + return result if isinstance(result, str) and result.strip() else None + if data.get("success") is False or data.get("error"): + return f"{tool_name} failed: {data.get('error', 'unknown error')}" + if tool_name == "browser_get_images": + images = data.get("images") or data.get("data") + if isinstance(images, list): + lines = [f"Images found: {len(images)}"] + for img in images[:12]: + if isinstance(img, dict): + alt = str(img.get("alt") or "").strip() + url = str(img.get("url") or img.get("src") or "").strip() + lines.append(f"- {alt or 'image'}" + (f" — 
{url}" if url else "")) + return _truncate_text("\n".join(lines), limit=5000) + title = str(data.get("title") or data.get("url") or data.get("status") or tool_name) + text = str(data.get("text") or data.get("content") or data.get("snapshot") or data.get("analysis") or data.get("message") or "").strip() + lines = [title] + if data.get("url") and data.get("url") != title: + lines.append(str(data.get("url"))) + if text: + lines.extend(["", _truncate_text(text, limit=5000)]) + return _truncate_text("\n".join(lines), limit=7000) + + +def _format_media_or_cron_result(tool_name: str, result: Optional[str]) -> Optional[str]: + data = _json_loads_maybe(result) + if not isinstance(data, dict): + return result if isinstance(result, str) and result.strip() else None + if data.get("success") is False or data.get("error"): + return f"{tool_name} failed: {data.get('error', 'unknown error')}" + lines = [f"✅ {tool_name} completed"] + for key in ("file_path", "path", "url", "image_url", "job_id", "id", "status", "message", "next_run"): + if data.get(key): + lines.append(f"- **{key}:** {data.get(key)}") + return "\n".join(lines) + + +def _format_generic_structured_result(tool_name: str, result: Optional[str]) -> Optional[str]: + data = _json_loads_maybe(result) + if not isinstance(data, (dict, list)): + return result if isinstance(result, str) and result.strip() else None + if isinstance(data, list): + lines = [f"{tool_name}: {len(data)} item{'s' if len(data) != 1 else ''}"] + for item in data[:12]: + lines.append(f"- {_truncate_text(str(item), limit=240)}") + return _truncate_text("\n".join(lines), limit=5000) + + if data.get("success") is False or data.get("error"): + return f"{tool_name} failed: {data.get('error', 'unknown error')}" + + lines = [f"✅ {tool_name} completed" if data.get("success") is True else f"{tool_name} result"] + priority_keys = ( + "message", "status", "id", "task_id", "issue_id", "title", "name", "entity_id", + "state", "service", "url", "path", "file_path", 
"count", "total", "next_run", + ) + seen = set() + for key in priority_keys: + value = data.get(key) + if value in (None, "", [], {}): + continue + seen.add(key) + lines.append(f"- **{key}:** {_truncate_text(str(value), limit=500)}") + + for key, value in data.items(): + if key in seen or key in {"success", "raw", "content", "entries"}: + continue + if value in (None, "", [], {}): + continue + if isinstance(value, (dict, list)): + preview = json.dumps(value, ensure_ascii=False, default=str) + else: + preview = str(value) + lines.append(f"- **{key}:** {_truncate_text(preview, limit=500)}") + if len(lines) >= 14: + break + + content = data.get("content") + if isinstance(content, str) and content.strip(): + lines.extend(["", _truncate_text(content.strip(), limit=1500)]) + return _truncate_text("\n".join(lines), limit=7000) + + def _build_polished_completion_content( tool_name: str, result: Optional[str], @@ -385,12 +717,28 @@ def _build_polished_completion_content( formatter = { "todo": lambda: _format_todo_result(result), "read_file": lambda: _format_read_file_result(result, function_args), + "write_file": lambda: _format_edit_result(tool_name, result, function_args), + "patch": lambda: _format_edit_result(tool_name, result, function_args), "search_files": lambda: _format_search_files_result(result), "execute_code": lambda: _format_execute_code_result(result), + "process": lambda: _format_process_result(result, function_args), + "delegate_task": lambda: _format_delegate_result(result), + "session_search": lambda: _format_session_search_result(result), + "memory": lambda: _format_memory_result(result, function_args), "skill_view": lambda: _format_skill_view_result(result), "skill_manage": lambda: _format_skill_manage_result(result, function_args), "web_search": lambda: _format_web_search_result(result), + "web_extract": lambda: _format_web_extract_result(result), + "browser_navigate": lambda: _format_browser_result(tool_name, result, function_args), + 
"browser_snapshot": lambda: _format_browser_result(tool_name, result, function_args), + "browser_vision": lambda: _format_browser_result(tool_name, result, function_args), + "browser_get_images": lambda: _format_browser_result(tool_name, result, function_args), + "vision_analyze": lambda: _format_media_or_cron_result(tool_name, result), + "image_generate": lambda: _format_media_or_cron_result(tool_name, result), + "cronjob": lambda: _format_media_or_cron_result(tool_name, result), }.get(tool_name) + if formatter is None and tool_name in _POLISHED_TOOLS: + formatter = lambda: _format_generic_structured_result(tool_name, result) if formatter is None: return None text = formatter() @@ -594,7 +942,6 @@ def build_tool_start( content = _build_patch_mode_content(patch_text) return acp.start_tool_call( tool_call_id, title, kind=kind, content=content, locations=locations, - raw_input=arguments, ) if tool_name == "write_file": @@ -603,7 +950,6 @@ def build_tool_start( content = [acp.tool_diff_content(path=path, new_text=file_content)] return acp.start_tool_call( tool_call_id, title, kind=kind, content=content, locations=locations, - raw_input=arguments, ) if tool_name == "terminal": @@ -712,6 +1058,74 @@ def build_tool_start( tool_call_id, title, kind=kind, content=content, locations=locations, ) + if tool_name == "web_extract": + urls = arguments.get("urls") if isinstance(arguments.get("urls"), list) else [] + preview = "\n".join(f"- {url}" for url in urls[:5]) or "Extracting web content" + content = [_text("Extracting content from:\n" + preview if urls else preview)] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "process": + action = str(arguments.get("action") or "").strip() or "manage" + sid = str(arguments.get("session_id") or "").strip() + data_preview = str(arguments.get("data") or "").strip() + text = f"Process action: {action}" + (f"\nSession: {sid}" if sid else "") + if data_preview: + 
text += "\nInput: " + _truncate_text(data_preview, limit=500) + content = [_text(text)] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "delegate_task": + tasks = arguments.get("tasks") + if isinstance(tasks, list) and tasks: + lines = [f"Delegating {len(tasks)} tasks", ""] + for i, task in enumerate(tasks[:8], 1): + if isinstance(task, dict): + goal = str(task.get("goal") or "").strip() + role = str(task.get("role") or "").strip() + lines.append(f"{i}. " + _truncate_text(goal, limit=160) + (f" ({role})" if role else "")) + if len(tasks) > 8: + lines.append(f"... {len(tasks) - 8} more") + content = [_text("\n".join(lines))] + else: + goal = str(arguments.get("goal") or "").strip() + content = [_text("Delegating task" + (f":\n{_truncate_text(goal, limit=800)}" if goal else ""))] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "session_search": + query = str(arguments.get("query") or "").strip() + content = [_text(f"Searching past sessions for: {query}" if query else "Loading recent sessions")] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name == "memory": + action = str(arguments.get("action") or "manage").strip() or "manage" + target = str(arguments.get("target") or "memory").strip() or "memory" + preview = str(arguments.get("content") or arguments.get("old_text") or "").strip() + text = f"Memory {action} ({target})" + if preview: + text += "\nPreview: " + _truncate_text(preview, limit=500) + content = [_text(text)] + return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + + if tool_name in _POLISHED_TOOLS: + try: + args_text = json.dumps(arguments, indent=2, default=str) + except (TypeError, ValueError): + args_text = str(arguments) + content = [_text(_truncate_text(args_text, limit=1200))] 
+ return acp.start_tool_call( + tool_call_id, title, kind=kind, content=content, locations=locations, + ) + # Generic fallback import json try: @@ -721,7 +1135,7 @@ def build_tool_start( content = [acp.tool_content(acp.text_block(args_text))] return acp.start_tool_call( tool_call_id, title, kind=kind, content=content, locations=locations, - raw_input=arguments, + raw_input=None if tool_name in _POLISHED_TOOLS else arguments, ) diff --git a/tests/acp/test_tools.py b/tests/acp/test_tools.py index fa576b6144..40423174a2 100644 --- a/tests/acp/test_tools.py +++ b/tests/acp/test_tools.py @@ -353,6 +353,70 @@ class TestBuildToolComplete: assert "Results truncated" in text assert result.raw_output is None + def test_build_tool_complete_for_process_list_formats_table(self): + result = build_tool_complete( + "tc-process", + "process", + '{"processes":[{"session_id":"p1","status":"running","pid":123,"command":"npm run dev"}]}', + function_args={"action":"list"}, + ) + text = result.content[0].content.text + assert "Processes: 1" in text + assert "`p1`" in text + assert "npm run dev" in text + assert result.raw_output is None + + def test_build_tool_complete_for_delegate_task_summarizes_children(self): + result = build_tool_complete( + "tc-delegate", + "delegate_task", + '{"results":[{"task_index":0,"status":"completed","summary":"Reviewed ACP rendering.","model":"gpt-5.5","duration_seconds":3.2,"tool_trace":[{"tool":"read_file"}]}],"total_duration_seconds":3.4}', + ) + text = result.content[0].content.text + assert "Delegation results: 1 task" in text + assert "Reviewed ACP rendering" in text + assert "gpt-5.5" in text + assert "Tools: read_file" in text + assert result.raw_output is None + + def test_build_tool_complete_for_session_search_recent(self): + result = build_tool_complete( + "tc-session", + "session_search", + '{"success":true,"mode":"recent","results":[{"session_id":"s1","title":"ACP work","last_active":"2026-05-02","message_count":12,"preview":"Polished tool 
rendering."}],"count":1}', + ) + text = result.content[0].content.text + assert "Recent sessions" in text + assert "ACP work" in text + assert "Polished tool rendering" in text + assert result.raw_output is None + + def test_build_tool_complete_for_memory_avoids_dumping_entries(self): + result = build_tool_complete( + "tc-memory", + "memory", + '{"success":true,"target":"user","entries":["private long memory"],"usage":"1% — 19/2000 chars","entry_count":1,"message":"Entry added."}', + function_args={"action":"add","target":"user","content":"User likes concise ACP rendering."}, + ) + text = result.content[0].content.text + assert "Memory add saved" in text + assert "User likes concise ACP rendering" in text + assert "private long memory" not in text + assert result.raw_output is None + + def test_build_tool_complete_for_web_extract_summarizes_urls(self): + result = build_tool_complete( + "tc-web-extract", + "web_extract", + '{"results":[{"url":"https://example.com","title":"Example","content":"# Intro\\nThis is extracted content."}]}', + ) + text = result.content[0].content.text + assert "Web extract: 1 URL" in text + assert "Example" in text + assert "Content:" in text + assert "Intro" in text + assert result.raw_output is None + def test_build_tool_complete_truncates_large_output(self): """Very large outputs should be truncated.""" big_output = "x" * 10000 From b294d1d0229ff6026838a04c4cb59c3b13e4827f Mon Sep 17 00:00:00 2001 From: Henkey Date: Sat, 2 May 2026 14:45:16 +0100 Subject: [PATCH 33/61] fix(acp): keep read-file starts compact --- acp_adapter/tools.py | 15 ++++----------- tests/acp/test_tools.py | 8 +++----- 2 files changed, 7 insertions(+), 16 deletions(-) diff --git a/acp_adapter/tools.py b/acp_adapter/tools.py index 8fc9eacf07..2f85ebb773 100644 --- a/acp_adapter/tools.py +++ b/acp_adapter/tools.py @@ -960,18 +960,11 @@ def build_tool_start( ) if tool_name == "read_file": - path = arguments.get("path", "") - offset = arguments.get("offset") - limit = 
arguments.get("limit") - bits = [] - if offset: - bits.append(f"from line {offset}") - if limit: - bits.append(f"limit {limit}") - suffix = f" ({', '.join(bits)})" if bits else "" - content = [_text(f"Reading {path}{suffix}")] + # The title and location already identify the file. Sending a synthetic + # "Reading ..." content block makes Zed render an unhelpful Output + # section before the real file contents arrive on completion. return acp.start_tool_call( - tool_call_id, title, kind=kind, content=content, locations=locations, + tool_call_id, title, kind=kind, content=None, locations=locations, ) if tool_name == "search_files": diff --git a/tests/acp/test_tools.py b/tests/acp/test_tools.py index 40423174a2..8de1292172 100644 --- a/tests/acp/test_tools.py +++ b/tests/acp/test_tools.py @@ -189,15 +189,13 @@ class TestBuildToolStart: assert "ls -la /tmp" in text def test_build_tool_start_for_read_file(self): - """read_file should include the path in content.""" + """read_file start should stay compact; completion carries file contents.""" args = {"path": "/etc/hosts", "offset": 1, "limit": 50} result = build_tool_start("tc-3", "read_file", args) assert isinstance(result, ToolCallStart) assert result.kind == "read" - assert len(result.content) >= 1 - content_item = result.content[0] - assert isinstance(content_item, ContentToolCallContent) - assert "/etc/hosts" in content_item.content.text + assert result.content is None + assert result.raw_input is None def test_build_tool_start_for_search(self): """search_files should include pattern in content.""" From eb612f55748d8f0888f09f055abd86afef925150 Mon Sep 17 00:00:00 2001 From: Henkey Date: Sat, 2 May 2026 15:22:01 +0100 Subject: [PATCH 34/61] fix(acp): keep web extract rendering compact --- acp_adapter/tools.py | 36 ++++++++++++------------------------ tests/acp/test_tools.py | 18 +++++++++++++++--- 2 files changed, 27 insertions(+), 27 deletions(-) diff --git a/acp_adapter/tools.py b/acp_adapter/tools.py index 
2f85ebb773..de871229e0 100644 --- a/acp_adapter/tools.py +++ b/acp_adapter/tools.py @@ -424,31 +424,20 @@ def _format_web_extract_result(result: Optional[str]) -> Optional[str]: if not isinstance(results, list): return None lines = [f"Web extract: {len(results)} URL{'s' if len(results) != 1 else ''}"] - for item in results[:5]: + for item in results[:10]: if not isinstance(item, dict): continue url = str(item.get("url") or "").strip() title = str(item.get("title") or url or "Untitled").strip() error = str(item.get("error") or "").strip() - content = str(item.get("content") or "").strip() - lines.extend(["", f"### {title}"]) - if url: - lines.append(url) - if error and error not in {"None", "null"}: - lines.append(f"Error: {error}") - continue - if content: - headings = _extract_markdown_headings(content, limit=5) - lines.append(f"Content: {len(content):,} chars") - if headings: - lines.append("Headings: " + "; ".join(headings)) - excerpt = _truncate_text(content, limit=1200) - lines.extend(["", excerpt]) - else: - lines.append("No content returned.") - if len(results) > 5: - lines.append(f"\n... {len(results) - 5} more result(s) omitted") - return _truncate_text("\n".join(lines), limit=7000) + status = "failed" if error and error not in {"None", "null"} else "extracted" + suffix = f" — {status}" + lines.append(f"- {title}" + (f" — {url}" if url and url != title else "") + suffix) + if status == "failed": + lines.append(f" Error: {_truncate_text(error, limit=500)}") + if len(results) > 10: + lines.append(f"... 
{len(results) - 10} more result(s) omitted") + return "\n".join(lines) def _format_process_result(result: Optional[str], args: Optional[Dict[str, Any]]) -> Optional[str]: @@ -1052,11 +1041,10 @@ def build_tool_start( ) if tool_name == "web_extract": - urls = arguments.get("urls") if isinstance(arguments.get("urls"), list) else [] - preview = "\n".join(f"- {url}" for url in urls[:5]) or "Extracting web content" - content = [_text("Extracting content from:\n" + preview if urls else preview)] + # The title identifies the URL(s). Avoid a duplicate content block so + # Zed renders this like read_file: compact start, concise completion. return acp.start_tool_call( - tool_call_id, title, kind=kind, content=content, locations=locations, + tool_call_id, title, kind=kind, content=None, locations=locations, ) if tool_name == "process": diff --git a/tests/acp/test_tools.py b/tests/acp/test_tools.py index 8de1292172..fcc9619f9a 100644 --- a/tests/acp/test_tools.py +++ b/tests/acp/test_tools.py @@ -197,6 +197,16 @@ class TestBuildToolStart: assert result.content is None assert result.raw_input is None + def test_build_tool_start_for_web_extract_is_compact(self): + """web_extract start should stay compact; title identifies URLs.""" + args = {"urls": ["https://example.com/docs"]} + result = build_tool_start("tc-web-start", "web_extract", args) + assert isinstance(result, ToolCallStart) + assert result.title == "extract: https://example.com/docs" + assert result.kind == "fetch" + assert result.content is None + assert result.raw_input is None + def test_build_tool_start_for_search(self): """search_files should include pattern in content.""" args = {"pattern": "TODO", "target": "content"} @@ -402,7 +412,7 @@ class TestBuildToolComplete: assert "private long memory" not in text assert result.raw_output is None - def test_build_tool_complete_for_web_extract_summarizes_urls(self): + def test_build_tool_complete_for_web_extract_summarizes_urls_without_page_content(self): result = 
build_tool_complete( "tc-web-extract", "web_extract", @@ -411,8 +421,10 @@ class TestBuildToolComplete: text = result.content[0].content.text assert "Web extract: 1 URL" in text assert "Example" in text - assert "Content:" in text - assert "Intro" in text + assert "https://example.com" in text + assert "Content:" not in text + assert "Intro" not in text + assert "extracted content" not in text assert result.raw_output is None def test_build_tool_complete_truncates_large_output(self): From 19854c7cd2f00e3d591e72ccbe2e456ad10c4886 Mon Sep 17 00:00:00 2001 From: Henkey Date: Sat, 2 May 2026 20:23:09 +0100 Subject: [PATCH 35/61] Schedule ACP history replay and fence file output --- acp_adapter/server.py | 16 ++++++++++++++-- acp_adapter/tools.py | 12 +++++++++++- tests/acp/test_server.py | 25 +++++++++++++++++++++++++ tests/acp/test_tools.py | 2 +- 4 files changed, 51 insertions(+), 4 deletions(-) diff --git a/acp_adapter/server.py b/acp_adapter/server.py index 498dae88bd..dd9d75af9c 100644 --- a/acp_adapter/server.py +++ b/acp_adapter/server.py @@ -658,6 +658,18 @@ class HermesACPAgent(acp.Agent): models=self._build_model_state(state), ) + def _schedule_history_replay(self, state: SessionState) -> None: + """Replay persisted history after session/load or session/resume returns. + + Zed only attaches streamed transcript/tool updates once the load/resume + response has completed. Sending replay notifications while the request is + still in-flight can make the server look correct in logs while the editor + drops or fails to attach the tool-call history. 
+ """ + loop = asyncio.get_running_loop() + replay_coro = self._replay_session_history(state) + loop.call_soon(asyncio.create_task, replay_coro) + async def load_session( self, cwd: str, @@ -671,7 +683,7 @@ class HermesACPAgent(acp.Agent): return None await self._register_session_mcp_servers(state, mcp_servers) logger.info("Loaded session %s", session_id) - await self._replay_session_history(state) + self._schedule_history_replay(state) self._schedule_available_commands_update(session_id) self._schedule_usage_update(state) return LoadSessionResponse(models=self._build_model_state(state)) @@ -689,7 +701,7 @@ class HermesACPAgent(acp.Agent): state = self.session_manager.create_session(cwd=cwd) await self._register_session_mcp_servers(state, mcp_servers) logger.info("Resumed session %s", state.session_id) - await self._replay_session_history(state) + self._schedule_history_replay(state) self._schedule_available_commands_update(state.session_id) self._schedule_usage_update(state) return ResumeSessionResponse(models=self._build_model_state(state)) diff --git a/acp_adapter/tools.py b/acp_adapter/tools.py index de871229e0..f2c2c7452e 100644 --- a/acp_adapter/tools.py +++ b/acp_adapter/tools.py @@ -208,6 +208,13 @@ def _truncate_text(text: str, limit: int = 5000) -> str: return text[: max(0, limit - 100)] + f"\n... 
({len(text)} chars total, truncated)" +def _fenced_text(text: str, language: str = "") -> str: + """Return a Markdown fence that cannot be broken by backticks in text.""" + longest = max((len(run) for run in "".join(ch if ch == "`" else " " for ch in text).split()), default=0) + fence = "`" * max(3, longest + 1) + return f"{fence}{language}\n{text}\n{fence}" + + def _format_todo_result(result: Optional[str]) -> Optional[str]: data = _json_loads_maybe(result) if not isinstance(data, dict) or not isinstance(data.get("todos"), list): @@ -261,7 +268,10 @@ def _format_read_file_result(result: Optional[str], args: Optional[Dict[str, Any header = f"Read {path}{suffix}" if data.get("total_lines") is not None: header += f" — {data.get('total_lines')} total lines" - return _truncate_text(f"{header}\n\n{content}") + # Hermes read_file output is line-numbered with `|`. If we send it as raw + # Markdown, Zed can interpret pipes as tables and collapse the layout. + # Fence the payload so file lines stay readable and literal. + return _truncate_text(f"{header}\n\n{_fenced_text(content)}") def _format_search_files_result(result: Optional[str]) -> Optional[str]: diff --git a/tests/acp/test_server.py b/tests/acp/test_server.py index 282a4553c0..a4dad4aefa 100644 --- a/tests/acp/test_server.py +++ b/tests/acp/test_server.py @@ -306,6 +306,8 @@ class TestSessionOps: mock_conn.session_update.reset_mock() resp = await agent.load_session(cwd="/tmp", session_id=new_resp.session_id) + await asyncio.sleep(0) + await asyncio.sleep(0) assert isinstance(resp, LoadSessionResponse) calls = mock_conn.session_update.await_args_list @@ -347,6 +349,8 @@ class TestSessionOps: mock_conn.session_update.reset_mock() resp = await agent.resume_session(cwd="/tmp", session_id=new_resp.session_id) + await asyncio.sleep(0) + await asyncio.sleep(0) assert isinstance(resp, ResumeSessionResponse) updates = [call.kwargs["update"] for call in mock_conn.session_update.await_args_list] @@ -356,6 +360,27 @@ class TestSessionOps: for update in updates 
) + @pytest.mark.asyncio + async def test_load_session_schedules_history_replay_after_response(self, agent): + """Zed only attaches replayed updates after session/load has completed.""" + new_resp = await agent.new_session(cwd="/tmp") + state = agent.session_manager.get_session(new_resp.session_id) + state.history = [{"role": "user", "content": "hello from history"}] + events = [] + + async def replay_after_response(_state): + events.append("replay") + + with patch.object(agent, "_replay_session_history", side_effect=replay_after_response): + resp = await agent.load_session(cwd="/tmp", session_id=new_resp.session_id) + events.append("returned") + + assert isinstance(resp, LoadSessionResponse) + assert events == ["returned"] + await asyncio.sleep(0) + await asyncio.sleep(0) + assert events == ["returned", "replay"] + @pytest.mark.asyncio async def test_resume_session_creates_new_if_missing(self, agent): resume_resp = await agent.resume_session(cwd="/tmp", session_id="nonexistent") diff --git a/tests/acp/test_tools.py b/tests/acp/test_tools.py index fcc9619f9a..f600bcabff 100644 --- a/tests/acp/test_tools.py +++ b/tests/acp/test_tools.py @@ -344,7 +344,7 @@ class TestBuildToolComplete: ) text = result.content[0].content.text assert "Read README.md" in text - assert "1|hello" in text + assert "```\n1|hello\n2|world\n```" in text assert result.raw_output is None def test_build_tool_complete_for_search_files_formats_matches(self): From 9987f3d82486b04151dee2d27d640b3ae01b7b16 Mon Sep 17 00:00:00 2001 From: Henkey Date: Sat, 2 May 2026 17:41:43 +0100 Subject: [PATCH 36/61] fix(acp): compact Zed tool replay rendering --- acp_adapter/tools.py | 39 ++++++++++++++++++++++++--------------- tests/acp/test_tools.py | 18 ++++++++++++------ 2 files changed, 36 insertions(+), 21 deletions(-) diff --git a/acp_adapter/tools.py b/acp_adapter/tools.py index f2c2c7452e..e7e53a6277 100644 --- a/acp_adapter/tools.py +++ b/acp_adapter/tools.py @@ -425,6 +425,7 @@ def 
_format_web_search_result(result: Optional[str]) -> Optional[str]: def _format_web_extract_result(result: Optional[str]) -> Optional[str]: + """Return only web_extract errors for ACP; success stays compact via title.""" data = _json_loads_maybe(result) if not isinstance(data, dict): return None @@ -433,20 +434,24 @@ def _format_web_extract_result(result: Optional[str]) -> Optional[str]: results = data.get("results") if not isinstance(results, list): return None - lines = [f"Web extract: {len(results)} URL{'s' if len(results) != 1 else ''}"] + + failures: list[str] = [] for item in results[:10]: if not isinstance(item, dict): continue + error = str(item.get("error") or "").strip() + if not error or error in {"None", "null"}: + continue url = str(item.get("url") or "").strip() title = str(item.get("title") or url or "Untitled").strip() - error = str(item.get("error") or "").strip() - status = "failed" if error and error not in {"None", "null"} else "extracted" - suffix = f" — {status}" - lines.append(f"- {title}" + (f" — {url}" if url and url != title else "") + suffix) - if status == "failed": - lines.append(f" Error: {_truncate_text(error, limit=500)}") - if len(results) > 10: - lines.append(f"... 
{len(results) - 10} more result(s) omitted") + failures.append( + f"- {title}" + (f" — {url}" if url and url != title else "") + f"\n Error: {_truncate_text(error, limit=500)}" + ) + + if not failures: + return None + lines = [f"Web extract failed for {len(failures)} URL{'s' if len(failures) != 1 else ''}"] + lines.extend(failures) return "\n".join(lines) @@ -1139,12 +1144,16 @@ def build_tool_complete( ) -> ToolCallProgress: """Create a ToolCallUpdate (progress) event for a completed tool call.""" kind = get_tool_kind(tool_name) - content = _build_tool_complete_content( - tool_name, - result, - function_args=function_args, - snapshot=snapshot, - ) + if tool_name == "web_extract": + error_text = _format_web_extract_result(result) + content = [_text(error_text)] if error_text else None + else: + content = _build_tool_complete_content( + tool_name, + result, + function_args=function_args, + snapshot=snapshot, + ) return acp.update_tool_call( tool_call_id, kind=kind, diff --git a/tests/acp/test_tools.py b/tests/acp/test_tools.py index f600bcabff..f9b0dac6d6 100644 --- a/tests/acp/test_tools.py +++ b/tests/acp/test_tools.py @@ -412,19 +412,25 @@ class TestBuildToolComplete: assert "private long memory" not in text assert result.raw_output is None - def test_build_tool_complete_for_web_extract_summarizes_urls_without_page_content(self): + def test_build_tool_complete_for_web_extract_success_stays_compact(self): result = build_tool_complete( "tc-web-extract", "web_extract", '{"results":[{"url":"https://example.com","title":"Example","content":"# Intro\\nThis is extracted content."}]}', ) + assert result.content is None + assert result.raw_output is None + + def test_build_tool_complete_for_web_extract_error_shows_error(self): + result = build_tool_complete( + "tc-web-extract-error", + "web_extract", + '{"results":[{"url":"https://example.com","title":"Example","error":"timeout"}]}', + ) text = result.content[0].content.text - assert "Web extract: 1 URL" in text - assert 
"Example" in text + assert "Web extract failed" in text assert "https://example.com" in text - assert "Content:" not in text - assert "Intro" not in text - assert "extracted content" not in text + assert "timeout" in text assert result.raw_output is None def test_build_tool_complete_truncates_large_output(self): From a22465e07ab4b71019f711e7a6463f6590c50742 Mon Sep 17 00:00:00 2001 From: MottledShadow <159539633+MottledShadow@users.noreply.github.com> Date: Sun, 3 May 2026 00:21:26 +0800 Subject: [PATCH 37/61] fix(weixin): send_weixin_direct cross-loop session check MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When send_message tool is called from inside a running gateway, the _run_async bridge spawns a worker thread with a separate event loop. send_weixin_direct then reuses the live adapter's aiohttp session which was created on the gateway's main loop. aiohttp's TimerContext checks asyncio.current_task(loop=session._loop) and sees None because we're executing on the worker thread's loop → raises 'Timeout context manager should be used inside a task'. Fix: skip the live-adapter shortcut when the session belongs to a different event loop, falling through to the fresh-session path. 
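The guard can be sketched in isolation as follows. This is a minimal illustration only: LoopBoundSession is a hypothetical stand-in for the adapter's aiohttp session (it merely records the loop it was created on), and can_reuse, make_session and demo are names invented for this sketch, not part of the gateway code.

```python
import asyncio
import threading


class LoopBoundSession:
    """Hypothetical stand-in for an aiohttp session; records its creating loop."""

    def __init__(self) -> None:
        self._loop = asyncio.get_running_loop()
        self.closed = False


async def can_reuse(session) -> bool:
    """Mirror the guard: reuse only from a task on the session's own loop."""
    return (
        session is not None
        and not session.closed
        and session._loop is asyncio.get_running_loop()
    )


async def make_session() -> LoopBoundSession:
    # Must run inside a loop so the session can capture it.
    return LoopBoundSession()


def demo() -> tuple:
    """Return (reusable on owning loop, reusable from a worker-thread loop)."""
    gateway_loop = asyncio.new_event_loop()
    try:
        session = gateway_loop.run_until_complete(make_session())
        same_loop = gateway_loop.run_until_complete(can_reuse(session))
        seen = {}
        # asyncio.run() builds a fresh loop inside the worker thread,
        # which is the situation the _run_async bridge creates.
        worker = threading.Thread(
            target=lambda: seen.update(other=asyncio.run(can_reuse(session)))
        )
        worker.start()
        worker.join()
        return same_loop, seen["other"]
    finally:
        gateway_loop.close()
```

In this sketch the check passes on the loop that created the session and fails from the worker thread's loop, which is the case where send_weixin_direct should fall through to a fresh session.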
--- gateway/platforms/weixin.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gateway/platforms/weixin.py b/gateway/platforms/weixin.py index 72b7d2a4df..3fd7174270 100644 --- a/gateway/platforms/weixin.py +++ b/gateway/platforms/weixin.py @@ -2030,7 +2030,9 @@ async def send_weixin_direct( live_adapter = _LIVE_ADAPTERS.get(resolved_token) send_session = getattr(live_adapter, '_send_session', None) - if live_adapter is not None and send_session is not None and not send_session.closed: + if (live_adapter is not None and send_session is not None + and not send_session.closed + and send_session._loop is asyncio.get_running_loop()): last_result: Optional[SendResult] = None cleaned = live_adapter.format_message(message) if cleaned: From 9b5b88b5e028f8c799053aae624be40e616b5d8d Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sun, 3 May 2026 01:44:17 -0700 Subject: [PATCH 38/61] chore: add MottledShadow to AUTHOR_MAP --- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index 0c046ee46e..32453d723d 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -46,6 +46,7 @@ AUTHOR_MAP = { "leone.parise@gmail.com": "leoneparise", "teknium@nousresearch.com": "teknium1", "127238744+teknium1@users.noreply.github.com": "teknium1", + "159539633+MottledShadow@users.noreply.github.com": "MottledShadow", "aludwin+gh@gmail.com": "adamludwin", "2093036+exiao@users.noreply.github.com": "exiao", "rylen.anil@gmail.com": "rylena", From 457c7b76cd69089142f7ee02bf26ed5fef9d8741 Mon Sep 17 00:00:00 2001 From: kshitij <82637225+kshitijk4poor@users.noreply.github.com> Date: Sun, 3 May 2026 01:54:24 -0700 Subject: [PATCH 39/61] feat(openrouter): add response caching support (#19132) Enable OpenRouter's response caching feature (beta) via X-OpenRouter-Cache headers. 
When enabled, identical API requests return cached responses for free (zero billing), reducing both latency and cost. Configuration via config.yaml: openrouter: response_cache: true # default: on response_cache_ttl: 300 # 1-86400 seconds Changes: - Add openrouter config section to DEFAULT_CONFIG (response_cache + TTL) - Add build_or_headers() in auxiliary_client.py that builds attribution headers plus optional cache headers based on config - Replace inline _OR_HEADERS dicts with build_or_headers() at all 5 sites: run_agent.py __init__, _apply_client_headers_for_base_url(), and auxiliary_client.py _try_openrouter() + _to_async_client() - Add _check_openrouter_cache_status() method to AIAgent that reads X-OpenRouter-Cache-Status from streaming response headers and logs HIT/MISS status - Document in cli-config.yaml.example - Add 28 tests (22 unit + 6 integration) Ref: https://openrouter.ai/docs/guides/features/response-caching --- agent/auxiliary_client.py | 65 +++- cli-config.yaml.example | 12 + hermes_cli/config.py | 12 + run_agent.py | 40 ++- tests/agent/test_openrouter_response_cache.py | 284 ++++++++++++++++++ .../test_provider_attribution_headers.py | 48 +++ .../docs/reference/environment-variables.md | 2 + 7 files changed, 451 insertions(+), 12 deletions(-) create mode 100644 tests/agent/test_openrouter_response_cache.py diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py index bed5c8d470..b86f78f8ec 100644 --- a/agent/auxiliary_client.py +++ b/agent/auxiliary_client.py @@ -259,13 +259,68 @@ _PROVIDERS_WITHOUT_VISION: frozenset = frozenset({ "kimi-coding-cn", }) -# OpenRouter app attribution headers -_OR_HEADERS = { +# OpenRouter app attribution headers (base — always sent) +_OR_HEADERS_BASE = { "HTTP-Referer": "https://hermes-agent.nousresearch.com", "X-OpenRouter-Title": "Hermes Agent", "X-OpenRouter-Categories": "productivity,cli-agent", } +# Truthy values for boolean env-var parsing. 
+_TRUTHY_ENV_VALUES = frozenset({"1", "true", "yes", "on"}) + + +def build_or_headers(or_config: dict | None = None) -> dict: + """Build OpenRouter headers, optionally including response-cache headers. + + Precedence: env var > config.yaml (DEFAULT_CONFIG ships it on) > off if absent. + + Environment variables: + ``HERMES_OPENROUTER_CACHE`` — truthy (``1``/``true``/``yes``/``on``) + enables caching; ``0``/``false``/``no``/``off`` disables. + Overrides ``openrouter.response_cache`` in config.yaml. + ``HERMES_OPENROUTER_CACHE_TTL`` — integer seconds (1-86400). + Overrides ``openrouter.response_cache_ttl`` in config.yaml. + + *or_config* is the ``openrouter`` section from config.yaml. When *None*, + falls back to reading config from disk via ``load_config()``. + """ + headers = dict(_OR_HEADERS_BASE) + + # Resolve config from disk if not provided. + if or_config is None: + try: + from hermes_cli.config import load_config + or_config = load_config().get("openrouter", {}) + except Exception: + or_config = {} + + # Determine cache enabled: env var overrides config. + env_cache = os.environ.get("HERMES_OPENROUTER_CACHE", "").strip().lower() + if env_cache: + cache_enabled = env_cache in _TRUTHY_ENV_VALUES + else: + cache_enabled = or_config.get("response_cache", False) + + if not cache_enabled: + return headers + + headers["X-OpenRouter-Cache"] = "true" + + # Determine TTL: env var overrides config. + env_ttl = os.environ.get("HERMES_OPENROUTER_CACHE_TTL", "").strip() + if env_ttl: + if env_ttl.isdigit(): + ttl = int(env_ttl) + if 1 <= ttl <= 86400: + headers["X-OpenRouter-Cache-TTL"] = str(ttl) + else: + ttl = or_config.get("response_cache_ttl", 300) + if isinstance(ttl, (int, float)) and 1 <= ttl <= 86400: + headers["X-OpenRouter-Cache-TTL"] = str(int(ttl)) + + return headers + # Vercel AI Gateway app attribution headers. HTTP-Referer maps to # referrerUrl and X-Title maps to appName in the gateway's analytics. 
from hermes_cli import __version__ as _HERMES_VERSION @@ -1158,14 +1213,14 @@ def _try_openrouter(explicit_api_key: str = None) -> Tuple[Optional[OpenAI], Opt base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL logger.debug("Auxiliary client: OpenRouter via pool") return OpenAI(api_key=or_key, base_url=base_url, - default_headers=_OR_HEADERS), _OPENROUTER_MODEL + default_headers=build_or_headers()), _OPENROUTER_MODEL or_key = explicit_api_key or os.getenv("OPENROUTER_API_KEY") if not or_key: return None, None logger.debug("Auxiliary client: OpenRouter") return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL, - default_headers=_OR_HEADERS), _OPENROUTER_MODEL + default_headers=build_or_headers()), _OPENROUTER_MODEL def _describe_openrouter_unavailable() -> str: @@ -1911,7 +1966,7 @@ def _to_async_client(sync_client, model: str, is_vision: bool = False): } sync_base_url = str(sync_client.base_url) if base_url_host_matches(sync_base_url, "openrouter.ai"): - async_kwargs["default_headers"] = dict(_OR_HEADERS) + async_kwargs["default_headers"] = build_or_headers() elif base_url_host_matches(sync_base_url, "api.githubcopilot.com"): from hermes_cli.copilot_auth import copilot_request_headers diff --git a/cli-config.yaml.example b/cli-config.yaml.example index c92be7e26b..963268d4ba 100644 --- a/cli-config.yaml.example +++ b/cli-config.yaml.example @@ -121,6 +121,18 @@ model: # # Data policy: "allow" (default) or "deny" to exclude providers that may store data # # data_collection: "deny" +# ============================================================================= +# OpenRouter Response Caching (only applies when using OpenRouter) +# ============================================================================= +# Cache identical API responses at the OpenRouter edge for free instant replays. +# When enabled, identical requests (same model, messages, parameters) return +# cached responses with zero billing. 
Separate from Anthropic prompt caching. +# See: https://openrouter.ai/docs/guides/features/response-caching +# +# openrouter: +# response_cache: true # Enable response caching (default: true) +# response_cache_ttl: 300 # Cache TTL in seconds, 1-86400 (default: 300) + # ============================================================================= # Git Worktree Isolation # ============================================================================= diff --git a/hermes_cli/config.py b/hermes_cli/config.py index 9e7ff8897c..25df4b3e2f 100644 --- a/hermes_cli/config.py +++ b/hermes_cli/config.py @@ -644,6 +644,18 @@ DEFAULT_CONFIG = { "cache_ttl": "5m", }, + # OpenRouter-specific settings. + # response_cache: enable OpenRouter response caching (X-OpenRouter-Cache header). + # When enabled, identical requests return cached responses for free (zero billing). + # This is separate from Anthropic prompt caching and works alongside it. + # See: https://openrouter.ai/docs/guides/features/response-caching + # response_cache_ttl: how long cached responses remain valid, in seconds (1-86400). + # Default 300 (5 minutes). Only used when response_cache is enabled. + "openrouter": { + "response_cache": True, + "response_cache_ttl": 300, + }, + # AWS Bedrock provider configuration. # Only used when model.provider is "bedrock". "bedrock": { diff --git a/run_agent.py b/run_agent.py index aac067ed4e..cfcd325eb6 100644 --- a/run_agent.py +++ b/run_agent.py @@ -1258,6 +1258,10 @@ class AIAgent: # after each API call. Accessed by /usage slash command. self._rate_limit_state: Optional["RateLimitState"] = None + # OpenRouter response cache hit counter — incremented when + # X-OpenRouter-Cache-Status: HIT is seen in streaming response headers. + self._or_cache_hits: int = 0 + # Centralized logging — agent.log (INFO+) and errors.log (WARNING+) # both live under ~/.hermes/logs/. Idempotent, so gateway mode # (which creates a new AIAgent per message) won't duplicate handlers. 
@@ -1421,11 +1425,8 @@ class AIAgent: client_kwargs["args"] = self.acp_args effective_base = base_url if base_url_host_matches(effective_base, "openrouter.ai"): - client_kwargs["default_headers"] = { - "HTTP-Referer": "https://hermes-agent.nousresearch.com", - "X-OpenRouter-Title": "Hermes Agent", - "X-OpenRouter-Categories": "productivity,cli-agent", - } + from agent.auxiliary_client import build_or_headers + client_kwargs["default_headers"] = build_or_headers() elif base_url_host_matches(effective_base, "api.routermint.com"): client_kwargs["default_headers"] = _routermint_headers() elif base_url_host_matches(effective_base, "api.githubcopilot.com"): @@ -4580,6 +4581,28 @@ class AIAgent: """Return the last captured RateLimitState, or None.""" return self._rate_limit_state + def _check_openrouter_cache_status(self, http_response: Any) -> None: + """Read X-OpenRouter-Cache-Status from response headers and log it. + + Increments ``_or_cache_hits`` on HIT so callers can report savings. + """ + if http_response is None: + return + headers = getattr(http_response, "headers", None) + if not headers: + return + try: + status = headers.get("x-openrouter-cache-status") + if not status: + return + if status.upper() == "HIT": + self._or_cache_hits += 1 + logger.info("OpenRouter response cache HIT (total: %d)", self._or_cache_hits) + else: + logger.debug("OpenRouter response cache %s", status.upper()) + except Exception: + pass # Never let header parsing break the agent loop + def get_activity_summary(self) -> dict: """Return a snapshot of the agent's current activity for diagnostics. 
@@ -6157,10 +6180,10 @@ class AIAgent: return True def _apply_client_headers_for_base_url(self, base_url: str) -> None: - from agent.auxiliary_client import _AI_GATEWAY_HEADERS, _OR_HEADERS + from agent.auxiliary_client import _AI_GATEWAY_HEADERS, build_or_headers if base_url_host_matches(base_url, "openrouter.ai"): - self._client_kwargs["default_headers"] = dict(_OR_HEADERS) + self._client_kwargs["default_headers"] = build_or_headers() elif base_url_host_matches(base_url, "ai-gateway.vercel.sh"): self._client_kwargs["default_headers"] = dict(_AI_GATEWAY_HEADERS) elif base_url_host_matches(base_url, "api.routermint.com"): @@ -6780,6 +6803,9 @@ class AIAgent: # response via .response before any chunks are consumed. self._capture_rate_limits(getattr(stream, "response", None)) + # Log OpenRouter response cache status when present. + self._check_openrouter_cache_status(getattr(stream, "response", None)) + content_parts: list = [] tool_calls_acc: dict = {} tool_gen_notified: set = set() diff --git a/tests/agent/test_openrouter_response_cache.py b/tests/agent/test_openrouter_response_cache.py new file mode 100644 index 0000000000..612ec34469 --- /dev/null +++ b/tests/agent/test_openrouter_response_cache.py @@ -0,0 +1,284 @@ +"""Tests for OpenRouter response caching header injection.""" + +from types import SimpleNamespace +from unittest.mock import patch + +import pytest + + +# --------------------------------------------------------------------------- +# build_or_headers +# --------------------------------------------------------------------------- + +class TestBuildOrHeaders: + """Test the build_or_headers() helper in agent/auxiliary_client.py.""" + + def test_base_attribution_always_present(self): + """Attribution headers must always be included regardless of cache setting.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": False}) + assert headers["HTTP-Referer"] == 
"https://hermes-agent.nousresearch.com" + assert headers["X-OpenRouter-Title"] == "Hermes Agent" + assert headers["X-OpenRouter-Categories"] == "productivity,cli-agent" + + def test_cache_enabled(self): + """When response_cache is True, X-OpenRouter-Cache header is set.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True}) + assert headers["X-OpenRouter-Cache"] == "true" + + def test_cache_disabled(self): + """When response_cache is False, no cache header is sent.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": False}) + assert "X-OpenRouter-Cache" not in headers + assert "X-OpenRouter-Cache-TTL" not in headers + + def test_cache_disabled_by_default_empty_config(self): + """Empty config dict means no cache headers (response_cache defaults to False).""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={}) + assert "X-OpenRouter-Cache" not in headers + + def test_ttl_default(self): + """Default TTL (300) is included when cache is enabled.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": 300}) + assert headers["X-OpenRouter-Cache-TTL"] == "300" + + def test_ttl_custom(self): + """Custom TTL values within range are sent.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": 3600}) + assert headers["X-OpenRouter-Cache-TTL"] == "3600" + + def test_ttl_max(self): + """Maximum TTL (86400) is accepted.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": 86400}) + assert headers["X-OpenRouter-Cache-TTL"] == "86400" + + def test_ttl_out_of_range_too_high(self): + """TTL above 86400 is silently ignored (no TTL header 
sent).""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": 100000}) + assert "X-OpenRouter-Cache-TTL" not in headers + # But cache is still enabled + assert headers["X-OpenRouter-Cache"] == "true" + + def test_ttl_out_of_range_zero(self): + """TTL of 0 is below minimum — no TTL header sent.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": 0}) + assert "X-OpenRouter-Cache-TTL" not in headers + + def test_ttl_negative(self): + """Negative TTL is ignored.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": -5}) + assert "X-OpenRouter-Cache-TTL" not in headers + + def test_ttl_not_a_number(self): + """Non-numeric TTL is ignored.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": "five"}) + assert "X-OpenRouter-Cache-TTL" not in headers + + def test_ttl_float_truncated(self): + """Float TTL values are truncated to int.""" + from agent.auxiliary_client import build_or_headers + + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": 600.7}) + assert headers["X-OpenRouter-Cache-TTL"] == "600" + + def test_returns_fresh_dict(self): + """Each call returns a new dict so mutations don't leak.""" + from agent.auxiliary_client import build_or_headers + + cfg = {"response_cache": True} + h1 = build_or_headers(or_config=cfg) + h2 = build_or_headers(or_config=cfg) + assert h1 is not h2 + assert h1 == h2 + + def test_none_config_falls_back_to_load_config(self): + """When or_config is None, build_or_headers reads from load_config().""" + from agent.auxiliary_client import build_or_headers + + fake_cfg = { + "openrouter": {"response_cache": True, "response_cache_ttl": 900}, + } + with 
patch("hermes_cli.config.load_config", return_value=fake_cfg): + headers = build_or_headers(or_config=None) + assert headers["X-OpenRouter-Cache"] == "true" + assert headers["X-OpenRouter-Cache-TTL"] == "900" + + def test_none_config_load_config_fails_gracefully(self): + """When load_config() fails, build_or_headers still returns base headers.""" + from agent.auxiliary_client import build_or_headers + + with patch("hermes_cli.config.load_config", side_effect=RuntimeError("boom")): + headers = build_or_headers(or_config=None) + # Should have base attribution but no cache headers + assert "HTTP-Referer" in headers + assert "X-OpenRouter-Cache" not in headers + + +# --------------------------------------------------------------------------- +# Environment variable overrides +# --------------------------------------------------------------------------- + +class TestEnvVarOverrides: + """Test env var precedence over config.yaml for response caching.""" + + def test_env_enables_cache(self, monkeypatch): + """HERMES_OPENROUTER_CACHE=true enables cache even when config disables it.""" + from agent.auxiliary_client import build_or_headers + + monkeypatch.setenv("HERMES_OPENROUTER_CACHE", "true") + headers = build_or_headers(or_config={"response_cache": False}) + assert headers["X-OpenRouter-Cache"] == "true" + + def test_env_disables_cache(self, monkeypatch): + """HERMES_OPENROUTER_CACHE=false disables cache even when config enables it.""" + from agent.auxiliary_client import build_or_headers + + monkeypatch.setenv("HERMES_OPENROUTER_CACHE", "false") + headers = build_or_headers(or_config={"response_cache": True}) + assert "X-OpenRouter-Cache" not in headers + + @pytest.mark.parametrize("value", ["1", "true", "TRUE", "yes", "Yes", "on"]) + def test_truthy_values(self, monkeypatch, value): + """Various truthy strings enable caching.""" + from agent.auxiliary_client import build_or_headers + + monkeypatch.setenv("HERMES_OPENROUTER_CACHE", value) + headers = 
build_or_headers(or_config={}) + assert headers["X-OpenRouter-Cache"] == "true" + + @pytest.mark.parametrize("value", ["0", "false", "no", "off", "maybe", ""]) + def test_non_truthy_values(self, monkeypatch, value): + """Non-truthy strings do not enable caching (empty falls through to config).""" + from agent.auxiliary_client import build_or_headers + + monkeypatch.setenv("HERMES_OPENROUTER_CACHE", value) + # Empty string falls through to config; others are explicitly non-truthy + if value == "": + # Empty env var falls through to config default (False) + headers = build_or_headers(or_config={"response_cache": False}) + else: + headers = build_or_headers(or_config={"response_cache": True}) + assert "X-OpenRouter-Cache" not in headers + + def test_env_ttl_overrides_config(self, monkeypatch): + """HERMES_OPENROUTER_CACHE_TTL overrides config TTL.""" + from agent.auxiliary_client import build_or_headers + + monkeypatch.setenv("HERMES_OPENROUTER_CACHE", "true") + monkeypatch.setenv("HERMES_OPENROUTER_CACHE_TTL", "1800") + headers = build_or_headers(or_config={"response_cache_ttl": 300}) + assert headers["X-OpenRouter-Cache-TTL"] == "1800" + + @pytest.mark.parametrize("ttl", ["0", "86401", "abc", "-1", "12.5"]) + def test_invalid_env_ttl_dropped(self, monkeypatch, ttl): + """Invalid TTL env values are ignored; cache still enabled without TTL.""" + from agent.auxiliary_client import build_or_headers + + monkeypatch.setenv("HERMES_OPENROUTER_CACHE", "1") + monkeypatch.setenv("HERMES_OPENROUTER_CACHE_TTL", ttl) + headers = build_or_headers(or_config={}) + assert headers["X-OpenRouter-Cache"] == "true" + assert "X-OpenRouter-Cache-TTL" not in headers + + @pytest.mark.parametrize("ttl", ["1", "300", "86400"]) + def test_valid_env_ttl_boundaries(self, monkeypatch, ttl): + """Boundary TTL values (1, 300, 86400) are accepted.""" + from agent.auxiliary_client import build_or_headers + + monkeypatch.setenv("HERMES_OPENROUTER_CACHE", "yes") + 
monkeypatch.setenv("HERMES_OPENROUTER_CACHE_TTL", ttl) + assert build_or_headers(or_config={})["X-OpenRouter-Cache-TTL"] == ttl + + def test_no_env_vars_falls_through_to_config(self, monkeypatch): + """Without env vars, config.yaml controls behavior.""" + from agent.auxiliary_client import build_or_headers + + monkeypatch.delenv("HERMES_OPENROUTER_CACHE", raising=False) + monkeypatch.delenv("HERMES_OPENROUTER_CACHE_TTL", raising=False) + headers = build_or_headers(or_config={"response_cache": True, "response_cache_ttl": 600}) + assert headers["X-OpenRouter-Cache"] == "true" + assert headers["X-OpenRouter-Cache-TTL"] == "600" + +class TestDefaultConfig: + """Verify the openrouter config section is in DEFAULT_CONFIG.""" + + def test_openrouter_section_exists(self): + from hermes_cli.config import DEFAULT_CONFIG + + assert "openrouter" in DEFAULT_CONFIG + or_cfg = DEFAULT_CONFIG["openrouter"] + assert or_cfg["response_cache"] is True + assert or_cfg["response_cache_ttl"] == 300 + + +# --------------------------------------------------------------------------- +# _check_openrouter_cache_status +# --------------------------------------------------------------------------- + +class TestCheckOpenrouterCacheStatus: + """Test the _check_openrouter_cache_status method on AIAgent.""" + + def _make_agent(self): + """Create a minimal AIAgent-like object with just the method under test.""" + from run_agent import AIAgent + + # Use object.__new__ to skip __init__, then set the attributes we need + agent = object.__new__(AIAgent) + agent._or_cache_hits = 0 + return agent + + def test_hit_increments_counter(self): + agent = self._make_agent() + resp = SimpleNamespace(headers={"x-openrouter-cache-status": "HIT"}) + agent._check_openrouter_cache_status(resp) + assert agent._or_cache_hits == 1 + # Second hit increments + agent._check_openrouter_cache_status(resp) + assert agent._or_cache_hits == 2 + + def test_miss_does_not_increment(self): + agent = self._make_agent() + resp = 
SimpleNamespace(headers={"x-openrouter-cache-status": "MISS"}) + agent._check_openrouter_cache_status(resp) + assert getattr(agent, "_or_cache_hits", 0) == 0 + + def test_no_header_is_noop(self): + agent = self._make_agent() + resp = SimpleNamespace(headers={}) + agent._check_openrouter_cache_status(resp) + assert getattr(agent, "_or_cache_hits", 0) == 0 + + def test_none_response_is_safe(self): + agent = self._make_agent() + agent._check_openrouter_cache_status(None) # no crash + + def test_no_headers_attr_is_safe(self): + agent = self._make_agent() + agent._check_openrouter_cache_status(object()) # no crash + + def test_case_insensitive(self): + agent = self._make_agent() + resp = SimpleNamespace(headers={"x-openrouter-cache-status": "hit"}) + agent._check_openrouter_cache_status(resp) + assert agent._or_cache_hits == 1 diff --git a/tests/run_agent/test_provider_attribution_headers.py b/tests/run_agent/test_provider_attribution_headers.py index cf9d8bb8fb..2ce440741f 100644 --- a/tests/run_agent/test_provider_attribution_headers.py +++ b/tests/run_agent/test_provider_attribution_headers.py @@ -81,3 +81,51 @@ def test_unknown_base_url_clears_default_headers(mock_openai): agent._apply_client_headers_for_base_url("https://api.example.com/v1") assert "default_headers" not in agent._client_kwargs + + +@patch("run_agent.OpenAI") +def test_openrouter_headers_include_response_cache_when_enabled(mock_openai): + """When openrouter.response_cache is True, the cache header is injected.""" + mock_openai.return_value = MagicMock() + agent = AIAgent( + api_key="test-key", + base_url="https://openrouter.ai/api/v1", + model="test/model", + quiet_mode=True, + skip_context_files=True, + skip_memory=True, + ) + + with patch("hermes_cli.config.load_config", return_value={ + "openrouter": {"response_cache": True, "response_cache_ttl": 600}, + }): + agent._apply_client_headers_for_base_url("https://openrouter.ai/api/v1") + + headers = agent._client_kwargs["default_headers"] + assert 
headers["HTTP-Referer"] == "https://hermes-agent.nousresearch.com" + assert headers["X-OpenRouter-Cache"] == "true" + assert headers["X-OpenRouter-Cache-TTL"] == "600" + + +@patch("run_agent.OpenAI") +def test_openrouter_headers_no_cache_when_disabled(mock_openai): + """When openrouter.response_cache is False, no cache headers are sent.""" + mock_openai.return_value = MagicMock() + agent = AIAgent( + api_key="test-key", + base_url="https://openrouter.ai/api/v1", + model="test/model", + quiet_mode=True, + skip_context_files=True, + skip_memory=True, + ) + + with patch("hermes_cli.config.load_config", return_value={ + "openrouter": {"response_cache": False}, + }): + agent._apply_client_headers_for_base_url("https://openrouter.ai/api/v1") + + headers = agent._client_kwargs["default_headers"] + assert headers["HTTP-Referer"] == "https://hermes-agent.nousresearch.com" + assert "X-OpenRouter-Cache" not in headers + assert "X-OpenRouter-Cache-TTL" not in headers diff --git a/website/docs/reference/environment-variables.md b/website/docs/reference/environment-variables.md index afe2c40d2a..955f460014 100644 --- a/website/docs/reference/environment-variables.md +++ b/website/docs/reference/environment-variables.md @@ -14,6 +14,8 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config |----------|-------------| | `OPENROUTER_API_KEY` | OpenRouter API key (recommended for flexibility) | | `OPENROUTER_BASE_URL` | Override the OpenRouter-compatible base URL | +| `HERMES_OPENROUTER_CACHE` | Enable OpenRouter response caching (`1`/`true`/`yes`/`on`). Overrides `openrouter.response_cache` in config.yaml. See [Response Caching](https://openrouter.ai/docs/guides/features/response-caching). | +| `HERMES_OPENROUTER_CACHE_TTL` | Cache TTL in seconds (1-86400). Overrides `openrouter.response_cache_ttl` in config.yaml. 
| | `NOUS_BASE_URL` | Override Nous Portal base URL (rarely needed; development/testing only) | | `NOUS_INFERENCE_BASE_URL` | Override Nous inference endpoint directly | | `AI_GATEWAY_API_KEY` | Vercel AI Gateway API key ([ai-gateway.vercel.sh](https://ai-gateway.vercel.sh)) | From c4c0e5abc2b579ce1a4cca4d5ff808550f754662 Mon Sep 17 00:00:00 2001 From: charliekerfoot Date: Sat, 2 May 2026 14:30:05 -0500 Subject: [PATCH 40/61] =?UTF-8?q?fix:=20After=20=5Fclamp=5Fcommand=5Fnames?= =?UTF-8?q?=20truncates=20skill=20names=20to=20fit=20the=2032-cha=E2=80=A6?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- hermes_cli/commands.py | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/hermes_cli/commands.py b/hermes_cli/commands.py index 1b4b85bd67..6626cff08e 100644 --- a/hermes_cli/commands.py +++ b/hermes_cli/commands.py @@ -514,8 +514,9 @@ def _clamp_command_names( If all 10 digit slots are taken the entry is silently dropped. """ used: set[str] = set(reserved) - result: list[tuple[str, str]] = [] - for name, desc in entries: + result: list[tuple] = [] + for entry in entries: + name, desc, *extra = entry if len(name) > _CMD_NAME_LIMIT: candidate = name[:_CMD_NAME_LIMIT] if candidate in used: @@ -531,7 +532,7 @@ def _clamp_command_names( if name in used: continue used.add(name) - result.append((name, desc)) + result.append((name, desc, *extra)) return result @@ -651,17 +652,15 @@ def _collect_gateway_skill_entries( except Exception: pass - # Clamp names; _clamp_command_names works on (name, desc) pairs so we - # need to zip/unzip. - skill_pairs = [(n, d) for n, d, _ in skill_triples] - key_by_pair = {(n, d): k for n, d, k in skill_triples} - skill_pairs = _clamp_command_names(skill_pairs, reserved_names) + # Clamp names; cmd_key is passed through as extra payload so it survives + # any clamp-induced renames. 
+ skill_triples = _clamp_command_names(skill_triples, reserved_names) # Skills fill remaining slots — only tier that gets trimmed remaining = max(0, max_slots - len(all_entries)) - hidden_count = max(0, len(skill_pairs) - remaining) - for n, d in skill_pairs[:remaining]: - all_entries.append((n, d, key_by_pair.get((n, d), ""))) + hidden_count = max(0, len(skill_triples) - remaining) + for n, d, k in skill_triples[:remaining]: + all_entries.append((n, d, k)) return all_entries[:max_slots], hidden_count From 5d5b8912bece744b08b5d6428f2ad12ff6969f87 Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Sun, 3 May 2026 15:13:10 +0530 Subject: [PATCH 41/61] test: add tests for cmd_key preservation through name clamping - TestClampCommandNamesTriples: unit tests for 3-tuple support in _clamp_command_names (short names, long names, collisions, multiple entries, backward compat with 2-tuples) - TestDiscordSkillCmdKeyDispatch: integration test through the full discord_skill_commands pipeline verifying long skill names retain their original cmd_key after clamping - Add contributor CharlieKerfoot to AUTHOR_MAP --- hermes_cli/commands.py | 8 +- scripts/release.py | 1 + tests/hermes_cli/test_commands.py | 97 +++++++++++++++++++ .../test_discord_skill_clamp_warning.py | 75 ++++++++++++++ 4 files changed, 179 insertions(+), 2 deletions(-) diff --git a/hermes_cli/commands.py b/hermes_cli/commands.py index 6626cff08e..07e7273bf7 100644 --- a/hermes_cli/commands.py +++ b/hermes_cli/commands.py @@ -502,9 +502,9 @@ def _sanitize_telegram_name(raw: str) -> str: def _clamp_command_names( - entries: list[tuple[str, str]], + entries: list[tuple[str, ...]], reserved: set[str], -) -> list[tuple[str, str]]: +) -> list[tuple[str, ...]]: """Enforce 32-char command name limit with collision avoidance. Both Telegram and Discord cap slash command names at 32 characters. 
@@ -512,6 +512,10 @@ def _clamp_command_names( (against *reserved* names or earlier entries in the same batch), the name is shortened to 31 chars and a digit ``0``-``9`` is appended to differentiate. If all 10 digit slots are taken the entry is silently dropped. + + Accepts tuples of any length >= 2. Extra elements beyond ``(name, desc)`` + (e.g. ``cmd_key``) are passed through unchanged, so callers can attach + metadata that survives the rename. """ used: set[str] = set(reserved) result: list[tuple] = [] diff --git a/scripts/release.py b/scripts/release.py index 32453d723d..e06d1d2a31 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -673,6 +673,7 @@ AUTHOR_MAP = { "web3blind@gmail.com": "web3blind", "ztzheng@163.com": "chengoak", # PR #17467 "24110240104@m.fudan.edu.cn": "YuShu", # co-author only + "charliekerfoot@gmail.com": "CharlieKerfoot", # PR #18951 } diff --git a/tests/hermes_cli/test_commands.py b/tests/hermes_cli/test_commands.py index 7c19730d9e..d505c8a1a7 100644 --- a/tests/hermes_cli/test_commands.py +++ b/tests/hermes_cli/test_commands.py @@ -822,6 +822,103 @@ class TestClampTelegramNames: assert result[0] == ("foo", "d1") +class TestClampCommandNamesTriples: + """Tests for _clamp_command_names with 3-tuples (name, desc, cmd_key). + + Skill entries pass through _clamp_command_names as 3-tuples so the + original cmd_key survives name truncation. Before the fix in PR #18951, + the code stripped cmd_key into a side-dict keyed by the *original* + (name, desc) pair — after truncation the lookup key no longer matched, + silently losing the cmd_key. 
+ """ + + def test_short_triple_preserved(self): + entries = [("skill", "A skill", "/skill")] + result = _clamp_command_names(entries, set()) + assert result == [("skill", "A skill", "/skill")] + + def test_long_name_preserves_cmd_key(self): + long = "a" * 50 + cmd_key = f"/{long}" + result = _clamp_command_names([(long, "desc", cmd_key)], set()) + assert len(result) == 1 + name, desc, key = result[0] + assert len(name) == _CMD_NAME_LIMIT + assert key == cmd_key, "cmd_key must survive name clamping" + + def test_collision_preserves_cmd_key(self): + prefix = "x" * _CMD_NAME_LIMIT + long = "x" * 50 + result = _clamp_command_names( + [(long, "desc", "/long-skill")], reserved={prefix}, + ) + assert len(result) == 1 + name, _desc, key = result[0] + assert name == "x" * (_CMD_NAME_LIMIT - 1) + "0" + assert key == "/long-skill" + + def test_multiple_long_names_preserve_respective_keys(self): + base = "y" * 40 + entries = [ + (base + "_alpha", "d1", "/alpha-skill"), + (base + "_beta", "d2", "/beta-skill"), + ] + result = _clamp_command_names(entries, set()) + assert len(result) == 2 + assert result[0][2] == "/alpha-skill" + assert result[1][2] == "/beta-skill" + + def test_backward_compat_with_pairs(self): + """Legacy 2-tuple callers (Telegram) must still work.""" + entries = [("help", "Show help"), ("status", "Show status")] + result = _clamp_command_names(entries, set()) + assert result == entries + + +class TestDiscordSkillCmdKeyDispatch: + """Integration: discord_skill_commands preserves cmd_key for long names. + + This tests the full pipeline: skill_commands → _collect_gateway_skill_entries + → _clamp_command_names → returned triples, verifying that skills with names + exceeding Discord's 32-char limit still have their original cmd_key for + dispatch. 
+ """ + + def test_long_skill_name_retains_cmd_key(self, tmp_path, monkeypatch): + from unittest.mock import patch + + long_name = "this-is-a-very-long-skill-name-that-exceeds-limit" + cmd_key = f"/{long_name}" + fake_skills_dir = tmp_path / "skills" + fake_skills_dir.mkdir(exist_ok=True) + # Use resolved path — macOS /var → /private/var symlink + # causes SKILLS_DIR.resolve() to differ from tmp_path. + resolved_dir = str(fake_skills_dir.resolve()) + + fake_cmds = { + cmd_key: { + "name": long_name, + "description": "A skill with a long name", + "skill_md_path": f"{resolved_dir}/{long_name}/SKILL.md", + "skill_dir": f"{resolved_dir}/{long_name}", + }, + } + + with patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds), \ + patch("tools.skills_tool.SKILLS_DIR", fake_skills_dir), \ + patch("agent.skill_utils.get_external_skills_dirs", return_value=[]): + entries, hidden = discord_skill_commands( + max_slots=100, reserved_names=set(), + ) + + assert len(entries) == 1 + name, desc, key = entries[0] + assert len(name) <= _CMD_NAME_LIMIT, "Name should be clamped to 32 chars" + assert key == cmd_key, ( + f"cmd_key must be the original /{long_name}, got {key!r}" + ) + + class TestTelegramMenuCommands: """Integration: telegram_menu_commands enforces the 32-char limit.""" diff --git a/tests/hermes_cli/test_discord_skill_clamp_warning.py b/tests/hermes_cli/test_discord_skill_clamp_warning.py index 541eeddc41..c9b686aae1 100644 --- a/tests/hermes_cli/test_discord_skill_clamp_warning.py +++ b/tests/hermes_cli/test_discord_skill_clamp_warning.py @@ -169,3 +169,78 @@ def test_no_collision_no_warning(tmp_path: Path, caplog) -> None: and ("clamp" in r.getMessage() or "reserved" in r.getMessage()) ] assert clamp_warnings == [] + + +def test_long_skill_name_preserves_cmd_key_through_by_category( + tmp_path: Path, +) -> None: + """Skills with names > 32 chars must keep their original cmd_key. 
+ + ``discord_skill_commands_by_category`` clamps the display name to 32 + chars but the third tuple element (cmd_key) must stay as the original + ``/full-skill-name`` so that ``_skill_handler`` dispatches via + ``_run_simple_slash`` with the full command, not the truncated one. + + This is the actual runtime path used by the Discord adapter via + ``_refresh_skill_catalog_state``. + """ + from hermes_cli.commands import discord_skill_commands_by_category + + skills_dir = tmp_path / "skills" + skills_dir.mkdir() + resolved = str(skills_dir.resolve()) + + long_name = "generate-ascii-art-from-text-description-detailed" + cmd_key = f"/{long_name}" + fake_cmds = { + cmd_key: { + "name": long_name, + "description": "Generate ASCII art from a text description", + "skill_md_path": f"{resolved}/creative/{long_name}/SKILL.md", + "skill_dir": f"{resolved}/creative/{long_name}", + }, + "/short-skill": { + "name": "short-skill", + "description": "A short skill", + "skill_md_path": f"{resolved}/creative/short-skill/SKILL.md", + "skill_dir": f"{resolved}/creative/short-skill", + }, + } + + with patch("agent.skill_commands.get_skill_commands", return_value=fake_cmds), \ + patch("tools.skills_tool.SKILLS_DIR", skills_dir): + categories, uncategorized, hidden = discord_skill_commands_by_category( + reserved_names=set(), + ) + + # Flatten (same as _refresh_skill_catalog_state does) + entries = list(uncategorized) + for cat_skills in categories.values(): + entries.extend(cat_skills) + + # Build lookup (same as _refresh_skill_catalog_state does) + skill_lookup = {n: (d, k) for n, d, k in entries} + + # Find the long skill + long_entry = [e for e in entries if e[2] == cmd_key] + assert len(long_entry) == 1, f"Long skill should appear once, got: {long_entry}" + + display_name, desc, key = long_entry[0] + assert len(display_name) <= 32, ( + f"Display name should be clamped to 32 chars, got {len(display_name)}" + ) + assert key == cmd_key, ( + f"cmd_key must be the original /{long_name}, 
got {key!r}" + ) + + # Verify lookup works: clamped display name -> original cmd_key + assert display_name in skill_lookup + _desc, looked_up_key = skill_lookup[display_name] + assert looked_up_key == cmd_key, ( + f"Lookup must map clamped name to original cmd_key, got {looked_up_key!r}" + ) + + # Short skill should also be present and correct + short_entry = [e for e in entries if e[2] == "/short-skill"] + assert len(short_entry) == 1 + assert short_entry[0][0] == "short-skill" From 19ba9e43b621cbcc1488cd9f9c38050154386e37 Mon Sep 17 00:00:00 2001 From: 0xyg3n Date: Sun, 3 May 2026 16:13:39 +0530 Subject: [PATCH 42/61] fix(gateway/discord): require allowlist auth on slash commands Slash commands (_run_simple_slash, _handle_thread_create_slash) bypassed every DISCORD_ALLOWED_* gate enforced by on_message. Any guild member could invoke /background (RCE via terminal), /restart, /model, /skill, etc. CVSS 9.8 Critical. - _evaluate_slash_authorization mirrors on_message gates (user, role, channel, ignored channel) with fail-closed semantics - _check_slash_authorization sends ephemeral reject + logs + admin alert - Auth gate runs before defer() so rejections are ephemeral - /skill autocomplete returns [] for unauthorized users (no catalog leak) - Component views (ExecApproval, SlashConfirm, UpdatePrompt, ModelPicker) now honor role allowlists via shared _component_check_auth helper - Optional DISCORD_HIDE_SLASH_COMMANDS defense-in-depth - Cross-platform admin alert (Telegram/Slack fallback) on unauthorized attempts Based on PR #18125 by @0xyg3n. 
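The fail-closed evaluation described in the bullets above can be sketched roughly as follows. This is a simplified illustration, not the code in gateway/platforms/discord.py: the function names, the plain-string ids, and the omission of role and DM handling are all assumptions made to keep the example self-contained.

```python
from typing import Optional, Set, Tuple


def parse_id_list(raw: str) -> Set[str]:
    """Split a comma-separated, env-style value into a set of trimmed ids."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def evaluate_slash_authorization(
    user_id: Optional[str],
    channel_ids: Set[str],
    *,
    allowed_users: Set[str],
    allowed_channels: Set[str],
    ignored_channels: Set[str],
) -> Tuple[bool, Optional[str]]:
    """Hypothetical helper: return (allowed, reason), reason set only on reject."""
    # Channel allowlist: if a policy is configured but the interaction has no
    # resolvable channel id, reject instead of silently falling through.
    if allowed_channels and "*" not in allowed_channels:
        if not channel_ids:
            return (False, "channel id missing with channel allowlist configured")
        if not (channel_ids & allowed_channels):
            return (False, "channel not in allowlist")

    # Ignored beats allowed: an ignored thread or parent channel always rejects.
    if ignored_channels and channel_ids:
        if "*" in ignored_channels or (channel_ids & ignored_channels):
            return (False, "channel is ignored")

    # No user allowlist keeps the "everyone is trusted" default.
    if not allowed_users:
        return (True, None)
    # Allowlist configured but no identifiable user: fail closed.
    if user_id is None:
        return (False, "missing user with allowlist configured")
    if user_id not in allowed_users:
        return (False, "user not in allowlist")
    return (True, None)
```

Returning a reason instead of sending a response keeps the check free of side effects, so an autocomplete callback could reuse it and return an empty list on rejection.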
--- gateway/platforms/discord.py | 422 ++++++++++- gateway/run.py | 4 +- tests/gateway/test_discord_component_auth.py | 230 ++++++ tests/gateway/test_discord_slash_auth.py | 737 +++++++++++++++++++ tests/gateway/test_discord_slash_commands.py | 17 +- 5 files changed, 1392 insertions(+), 18 deletions(-) create mode 100644 tests/gateway/test_discord_component_auth.py create mode 100644 tests/gateway/test_discord_slash_auth.py diff --git a/gateway/platforms/discord.py b/gateway/platforms/discord.py index 369a607a90..243e81d3e8 100644 --- a/gateway/platforms/discord.py +++ b/gateway/platforms/discord.py @@ -497,6 +497,7 @@ class DiscordAdapter(BasePlatformAdapter): self._ready_event = asyncio.Event() self._allowed_user_ids: set = set() # For button approval authorization self._allowed_role_ids: set = set() # For DISCORD_ALLOWED_ROLES filtering + self.gateway_runner = None # Set by gateway/run.py for cross-platform delivery # Voice channel state (per-guild) self._voice_clients: Dict[int, Any] = {} # guild_id -> VoiceClient self._voice_locks: Dict[int, asyncio.Lock] = {} # guild_id -> serialize join/leave @@ -1929,6 +1930,225 @@ class DiscordAdapter(BasePlatformAdapter): return True return False + # ── Slash command authorization ───────────────────────────────────── + # Slash commands (``_run_simple_slash`` and ``_handle_thread_create_slash``) + # are a separate Discord interaction surface from regular messages and + # historically ran with NO authorization check — bypassing every gate + # ``on_message`` enforces (DISCORD_ALLOWED_USERS, DISCORD_ALLOWED_ROLES, + # DISCORD_ALLOWED_CHANNELS, DISCORD_IGNORED_CHANNELS). Any guild member + # could invoke ``/background``, ``/restart``, ``/sethome``, etc. as the + # operator. ``_check_slash_authorization`` mirrors the on_message gates + # one-for-one so the slash surface honors the same trust boundary. 
+ # + # By design, this is a no-op for deployments with no allowlist env vars + # set — ``_is_allowed_user`` returns True and the channel checks early-out + # — preserving the existing "single-tenant, all guild members trusted" + # default. Deployments that DO set any DISCORD_ALLOWED_* var get slash + # parity with on_message. + + def _evaluate_slash_authorization( + self, interaction: "discord.Interaction", + ) -> Tuple[bool, Optional[str]]: + """Evaluate slash authorization without producing any response. + + Returns ``(allowed, reason)``. ``reason`` is populated only when + ``allowed`` is False. This is the shared core used by both the + responding wrapper (``_check_slash_authorization``) and side-effect- + free callers like the ``/skill`` autocomplete callback, which must + return an empty list for unauthorized users instead of leaking an + ephemeral rejection per-keystroke. + + Fail-closed semantics for malformed payloads: when an allowlist is + configured but the interaction is missing the data needed to + evaluate it (no channel id with channel policy active, no user + with user/role policy active), the gate REJECTS rather than + falling through. Without these guards a guild interaction that + happens to deserialize without a channel id would silently bypass + ``DISCORD_ALLOWED_CHANNELS`` and a payload missing ``user`` would + raise ``AttributeError`` in the user check below, surfacing as + an opaque interaction failure rather than a clean rejection. + """ + chan_obj = getattr(interaction, "channel", None) + in_dm = isinstance(chan_obj, discord.DMChannel) if chan_obj is not None else False + + # ── Channel scope (mirrors on_message lines 3374-3388) ── + # DMs aren't channel-gated — DMs follow on_message's DM lockdown + # path which has its own user-allowlist enforcement. 
+ if not in_dm: + chan_id_raw = getattr(interaction, "channel_id", None) or getattr( + chan_obj, "id", None, + ) + channel_ids: set = set() + if chan_id_raw is not None: + channel_ids.add(str(chan_id_raw)) + # Mirror on_message: also test the parent channel for threads + # so per-channel allow/deny lists work consistently. + if isinstance(chan_obj, discord.Thread): + parent_id = self._get_parent_channel_id(chan_obj) + if parent_id: + channel_ids.add(str(parent_id)) + + allowed_raw = os.getenv("DISCORD_ALLOWED_CHANNELS", "") + if allowed_raw: + allowed = {c.strip() for c in allowed_raw.split(",") if c.strip()} + if "*" not in allowed: + if not channel_ids: + # Channel policy is configured but the interaction + # has no resolvable channel id. Fail closed. + return ( + False, + "channel id missing with DISCORD_ALLOWED_CHANNELS configured", + ) + if not (channel_ids & allowed): + return (False, "channel not in DISCORD_ALLOWED_CHANNELS") + + # Ignored beats allowed: even when a thread's parent channel + # is on the allowlist, an explicit DISCORD_IGNORED_CHANNELS + # entry on the thread or its parent rejects the interaction. + ignored_raw = os.getenv("DISCORD_IGNORED_CHANNELS", "") + if ignored_raw and channel_ids: + ignored = {c.strip() for c in ignored_raw.split(",") if c.strip()} + if "*" in ignored or (channel_ids & ignored): + return (False, "channel in DISCORD_IGNORED_CHANNELS") + + # ── User / role allowlist (mirrors on_message line 681) ── + user = getattr(interaction, "user", None) + allowed_users = getattr(self, "_allowed_user_ids", set()) or set() + allowed_roles = getattr(self, "_allowed_role_ids", set()) or set() + if user is None or getattr(user, "id", None) is None: + # No identifiable user. With any user/role allowlist + # configured, fail closed rather than raise AttributeError + # on ``interaction.user.id`` below. With no allowlist this + # is the existing "no allowlist = everyone" backwards-compat. 
+ if allowed_users or allowed_roles: + return (False, "missing interaction.user with allowlist configured") + return (True, None) + + user_id = str(user.id) + if not self._is_allowed_user(user_id, author=user): + return ( + False, + "user not in DISCORD_ALLOWED_USERS / DISCORD_ALLOWED_ROLES", + ) + + return (True, None) + + async def _check_slash_authorization( + self, interaction: "discord.Interaction", command_text: str, + ) -> bool: + """Mirror on_message's user/role/channel gates onto a slash invocation. + + Returns True to proceed. Returns False *after* sending an ephemeral + rejection, logging a warning, and scheduling a cross-platform admin + alert — the caller must stop on False (the interaction has already + been responded to). + """ + allowed, reason = self._evaluate_slash_authorization(interaction) + if allowed: + return True + return await self._reject_slash( + interaction, command_text, reason=reason or "unauthorized", + ) + + async def _reject_slash( + self, interaction: "discord.Interaction", command_text: str, *, reason: str, + ) -> bool: + """Send ephemeral reject + log warning + schedule admin alert. Returns False. + + Tolerates a missing ``interaction.user`` -- the fail-closed branch + in ``_evaluate_slash_authorization`` deliberately routes here for + malformed payloads (no user) when an allowlist is configured, and + ``str(interaction.user.id)`` would raise AttributeError before the + ephemeral rejection could be sent. + """ + user = getattr(interaction, "user", None) + if user is not None: + user_id = str(getattr(user, "id", "?")) + user_name = getattr(user, "name", "?") + else: + user_id = "?" + user_name = "?" 
+ chan_id = getattr(interaction, "channel_id", None) or getattr( + getattr(interaction, "channel", None), "id", None, + ) + guild_id = getattr(interaction, "guild_id", None) + + logger.warning( + "[Discord] Unauthorized slash attempt: user=%s id=%s channel=%s " + "guild=%s cmd=%r reason=%r", + user_name, user_id, chan_id, guild_id, command_text, reason, + ) + + try: + await interaction.response.send_message( + "You're not authorized to use this command.", + ephemeral=True, + ) + except Exception as e: + # Interaction may already be responded to (e.g. caller deferred + # before the auth check, or Discord retried). Best-effort only. + logger.debug("[Discord] Could not send unauthorized ephemeral: %s", e) + + # Fire-and-forget: don't block the interaction handler on admin-alert I/O. + try: + asyncio.create_task(self._notify_unauthorized_slash( + user_name, user_id, chan_id, guild_id, command_text, reason, + )) + except Exception as e: + logger.debug("[Discord] Could not schedule admin notify task: %s", e) + + return False + + async def _notify_unauthorized_slash( + self, user_name: str, user_id: str, chan_id, guild_id, + command_text: str, reason: str, + ) -> None: + """Best-effort cross-platform alert to the gateway operator. + + Tries TELEGRAM first (most operators set TELEGRAM_HOME_CHANNEL), + then SLACK. Silently no-ops if no other platform is configured + with a home channel. + + A soft send failure -- adapter.send() returning a result with + ``success=False`` rather than raising -- continues the fallback + chain. Treating a SendResult(success=False) as delivered would + mean a Telegram outage that the adapter politely surfaces (e.g. + rate-limit, auth failure) silently swallows the alert without + attempting Slack. Hard exceptions still take the same path via + the except branch below. 
+ """ + runner = getattr(self, "gateway_runner", None) + if not runner: + return + for target in (Platform.TELEGRAM, Platform.SLACK): + try: + adapter = runner.adapters.get(target) + if not adapter: + continue + home = runner.config.get_home_channel(target) + if not home or not getattr(home, "chat_id", None): + continue + msg = ( + "⚠️ Unauthorized Discord slash attempt\n" + f"User: {user_name} ({user_id})\n" + f"Channel: {chan_id} (guild {guild_id})\n" + f"Command: {command_text}\n" + f"Reason: {reason}" + ) + result = await adapter.send(str(home.chat_id), msg) + # Only return on confirmed delivery. SendResult(success=False) + # -> continue to the next platform. + if getattr(result, "success", None) is False: + logger.debug( + "[Discord] Admin notify via %s returned success=False" + " (error=%r); falling through", + target, getattr(result, "error", None), + ) + continue + return + except Exception as e: + logger.debug("[Discord] Admin notify via %s failed: %s", target, e) + async def send_image_file( self, chat_id: str, @@ -2316,6 +2536,11 @@ class DiscordAdapter(BasePlatformAdapter): except Exception: pass # logging must never block command dispatch + # Auth gate — must run before defer() so an ephemeral rejection can + # be delivered on the still-unresponded interaction. + if not await self._check_slash_authorization(interaction, command_text): + return + await interaction.response.defer(ephemeral=True) event = self._build_slash_event(interaction, command_text) await self.handle_message(event) @@ -2460,7 +2685,8 @@ class DiscordAdapter(BasePlatformAdapter): message: str = "", auto_archive_duration: int = 1440, ): - await interaction.response.defer(ephemeral=True) + # defer() is performed inside the handler *after* the auth gate + # so a rejected invoker can receive an ephemeral rejection. 
+ await self._handle_thread_create_slash(interaction, name, message, auto_archive_duration) @tree.command(name="queue", description="Queue a prompt for the next turn (doesn't interrupt)") @@ -2581,6 +2807,54 @@ class DiscordAdapter(BasePlatformAdapter): # supporting up to 25 categories × 25 skills = 625 skills. self._register_skill_group(tree) + # Optional defense-in-depth: hide every slash command from non-admin + # guild members in Discord's slash picker. Server-side authorization + # (``_check_slash_authorization``) is the actual gate; this is purely + # UX so users don't see commands they can't invoke. Off by default + # to preserve the slash UX for deployments that intentionally allow + # everyone in the guild. + if os.getenv("DISCORD_HIDE_SLASH_COMMANDS", "false").strip().lower() in ( + "true", "1", "yes", "on", + ): + self._apply_owner_only_visibility(tree) + + def _apply_owner_only_visibility(self, tree) -> None: + """Set default_member_permissions=0 on every registered slash command. + + Discord treats ``default_member_permissions=0`` as "disabled by default + for everyone", so the command is hidden from every guild + member except those with the Administrator permission. Server admins + can re-grant per user/role via Server Settings → Integrations → + → Permissions. + + The authoritative gate is ``_check_slash_authorization`` on every + invocation, which catches stale clients, role grants made by + mistake, and direct API calls bypassing Discord's UI hide. 
+ """ + try: + no_perms = discord.Permissions(0) + except Exception as e: + logger.warning( + "[Discord] _apply_owner_only_visibility: cannot build Permissions(0): %s", + e, + ) + return + applied = 0 + for cmd in tree.get_commands(): + try: + cmd.default_permissions = no_perms + applied += 1 + except Exception as e: + logger.debug( + "[Discord] Could not set default_permissions on %r: %s", + getattr(cmd, "name", "?"), e, + ) + logger.info( + "[Discord] Hid %d slash command(s) from non-admin guild members " + "(opt-in defense in depth via DISCORD_HIDE_SLASH_COMMANDS).", + applied, + ) + def _register_skill_group(self, tree) -> None: """Register a single ``/skill`` command with autocomplete on the name. @@ -2635,9 +2909,25 @@ class DiscordAdapter(BasePlatformAdapter): PDFs even if the name doesn't. Discord caps this list at 25 entries per query. + Authorization: a quiet pre-check evaluates the slash + allowlists and returns ``[]`` for unauthorized users so + the installed skill catalog is not leaked to anyone who + can see the command in the picker. Returning a generic + empty list here is intentional — sending a per-keystroke + ephemeral rejection would produce a barrage of error + popups during typing. + Reads ``self._skill_entries`` so a ``/reload-skills`` run since process start shows up on the very next keystroke. """ + try: + allowed, _reason = self._evaluate_slash_authorization(interaction) + except Exception: + # Defensive: never raise from autocomplete. Fail + # closed by returning an empty suggestion list. 
+ return [] + if not allowed: + return [] q = (current or "").strip().lower() choices: list = [] for name, desc, _key in self._skill_entries: @@ -2664,6 +2954,12 @@ class DiscordAdapter(BasePlatformAdapter): async def _skill_handler( interaction: "discord.Interaction", name: str, args: str = "", ): + # Authorize BEFORE any skill lookup so that known and + # unknown skill names produce identical rejections for + # unauthorized users (no probing the installed catalog + # via "Unknown skill: " responses). + if not await self._check_slash_authorization(interaction, "/skill"): + return entry = self._skill_lookup.get(name) if not entry: await interaction.response.send_message( @@ -2811,6 +3107,9 @@ class DiscordAdapter(BasePlatformAdapter): auto_archive_duration: int = 1440, ) -> None: """Create a Discord thread from a slash command and start a session in it.""" + if not await self._check_slash_authorization(interaction, "/thread"): + return + await interaction.response.defer(ephemeral=True) result = await self._create_thread( interaction, name=name, @@ -3105,6 +3404,7 @@ class DiscordAdapter(BasePlatformAdapter): view = ExecApprovalView( session_key=session_key, allowed_user_ids=self._allowed_user_ids, + allowed_role_ids=self._allowed_role_ids, ) msg = await channel.send(embed=embed, view=view) @@ -3143,6 +3443,7 @@ class DiscordAdapter(BasePlatformAdapter): session_key=session_key, confirm_id=confirm_id, allowed_user_ids=self._allowed_user_ids, + allowed_role_ids=self._allowed_role_ids, ) msg = await channel.send(embed=embed, view=view) @@ -3177,6 +3478,7 @@ class DiscordAdapter(BasePlatformAdapter): view = UpdatePromptView( session_key=session_key, allowed_user_ids=self._allowed_user_ids, + allowed_role_ids=self._allowed_role_ids, ) msg = await channel.send(embed=embed, view=view) return SendResult(success=True, message_id=str(msg.id)) @@ -3234,6 +3536,7 @@ class DiscordAdapter(BasePlatformAdapter): session_key=session_key, on_model_selected=on_model_selected, 
allowed_user_ids=self._allowed_user_ids, + allowed_role_ids=self._allowed_role_ids, ) msg = await channel.send(embed=embed, view=view) @@ -3789,6 +4092,72 @@ class DiscordAdapter(BasePlatformAdapter): # Discord UI Components (outside the adapter class) # --------------------------------------------------------------------------- + +def _component_check_auth( + interaction, + allowed_user_ids: Optional[set], + allowed_role_ids: Optional[set], +) -> bool: + """Shared user-or-role OR semantics for component view button clicks. + + Mirrors ``DiscordAdapter._is_allowed_user`` / the slash and on_message + gates so every Discord interaction surface honors the same trust + boundary. Component views (ExecApprovalView, SlashConfirmView, + UpdatePromptView, ModelPickerView) used to receive only + ``allowed_user_ids``: in role-only deployments + (DISCORD_ALLOWED_ROLES set, DISCORD_ALLOWED_USERS empty) the user + set was empty and the legacy "no allowlist = allow everyone" branch + let any guild member click the buttons -- approving exec commands, + cancelling slash confirmations, switching the model. + + Behavior: + + - both allowlists empty -> allow (preserves existing no-allowlist + deployments, no regression) + - user is in user allowlist -> allow + - role allowlist set + user has a role in it -> allow + - role allowlist set + interaction.user has no resolvable + ``roles`` attribute (e.g. 
DM context with a role policy active) + -> reject (fail closed) + - otherwise -> reject + """ + user_set = allowed_user_ids or set() + role_set = allowed_role_ids or set() + has_users = bool(user_set) + has_roles = bool(role_set) + if not has_users and not has_roles: + return True + + user = getattr(interaction, "user", None) + if user is None: + return False + + if has_users: + try: + uid = str(user.id) + except AttributeError: + uid = "" + if uid and uid in user_set: + return True + + if has_roles: + roles_attr = getattr(user, "roles", None) + if roles_attr is None: + # Role policy is configured but the interaction doesn't + # carry role data (DM-context Member, raw User payload). + # Fail closed: a user without a resolvable role list cannot + # satisfy a role allowlist. + return False + try: + user_role_ids = {getattr(r, "id", None) for r in roles_attr} + except TypeError: + return False + if user_role_ids & role_set: + return True + + return False + + if DISCORD_AVAILABLE: class ExecApprovalView(discord.ui.View): @@ -3801,17 +4170,23 @@ if DISCORD_AVAILABLE: Only users in the allowed list can click. Times out after 5 minutes. 
""" - def __init__(self, session_key: str, allowed_user_ids: set): + def __init__( + self, + session_key: str, + allowed_user_ids: set, + allowed_role_ids: Optional[set] = None, + ): super().__init__(timeout=300) # 5-minute timeout self.session_key = session_key self.allowed_user_ids = allowed_user_ids + self.allowed_role_ids = allowed_role_ids or set() self.resolved = False def _check_auth(self, interaction: discord.Interaction) -> bool: """Verify the user clicking is authorized.""" - if not self.allowed_user_ids: - return True # No allowlist = anyone can approve - return str(interaction.user.id) in self.allowed_user_ids + return _component_check_auth( + interaction, self.allowed_user_ids, self.allowed_role_ids, + ) async def _resolve( self, interaction: discord.Interaction, choice: str, @@ -3903,17 +4278,24 @@ if DISCORD_AVAILABLE: 5 minutes (matches the gateway primitive's timeout). """ - def __init__(self, session_key: str, confirm_id: str, allowed_user_ids: set): + def __init__( + self, + session_key: str, + confirm_id: str, + allowed_user_ids: set, + allowed_role_ids: Optional[set] = None, + ): super().__init__(timeout=300) self.session_key = session_key self.confirm_id = confirm_id self.allowed_user_ids = allowed_user_ids + self.allowed_role_ids = allowed_role_ids or set() self.resolved = False def _check_auth(self, interaction: discord.Interaction) -> bool: - if not self.allowed_user_ids: - return True - return str(interaction.user.id) in self.allowed_user_ids + return _component_check_auth( + interaction, self.allowed_user_ids, self.allowed_role_ids, + ) async def _resolve( self, interaction: discord.Interaction, choice: str, @@ -3991,16 +4373,22 @@ if DISCORD_AVAILABLE: 5-minute timeout on its side). 
""" - def __init__(self, session_key: str, allowed_user_ids: set): + def __init__( + self, + session_key: str, + allowed_user_ids: set, + allowed_role_ids: Optional[set] = None, + ): super().__init__(timeout=300) self.session_key = session_key self.allowed_user_ids = allowed_user_ids + self.allowed_role_ids = allowed_role_ids or set() self.resolved = False def _check_auth(self, interaction: discord.Interaction) -> bool: - if not self.allowed_user_ids: - return True - return str(interaction.user.id) in self.allowed_user_ids + return _component_check_auth( + interaction, self.allowed_user_ids, self.allowed_role_ids, + ) async def _respond( self, interaction: discord.Interaction, answer: str, @@ -4077,6 +4465,7 @@ if DISCORD_AVAILABLE: session_key: str, on_model_selected, allowed_user_ids: set, + allowed_role_ids: Optional[set] = None, ): super().__init__(timeout=120) self.providers = providers @@ -4085,15 +4474,16 @@ if DISCORD_AVAILABLE: self.session_key = session_key self.on_model_selected = on_model_selected self.allowed_user_ids = allowed_user_ids + self.allowed_role_ids = allowed_role_ids or set() self.resolved = False self._selected_provider: str = "" self._build_provider_select() def _check_auth(self, interaction: discord.Interaction) -> bool: - if not self.allowed_user_ids: - return True - return str(interaction.user.id) in self.allowed_user_ids + return _component_check_auth( + interaction, self.allowed_user_ids, self.allowed_role_ids, + ) def _build_provider_select(self): """Build the provider dropdown menu.""" diff --git a/gateway/run.py b/gateway/run.py index db6fcc9756..97f72121bb 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -3982,7 +3982,9 @@ class GatewayRunner: if not check_discord_requirements(): logger.warning("Discord: discord.py not installed") return None - return DiscordAdapter(config) + adapter = DiscordAdapter(config) + adapter.gateway_runner = self # For cross-platform admin alerts on unauthorized slash + return adapter elif platform 
== Platform.WHATSAPP: from gateway.platforms.whatsapp import WhatsAppAdapter, check_whatsapp_requirements diff --git a/tests/gateway/test_discord_component_auth.py b/tests/gateway/test_discord_component_auth.py new file mode 100644 index 0000000000..5758e82561 --- /dev/null +++ b/tests/gateway/test_discord_component_auth.py @@ -0,0 +1,230 @@ +"""Security regression tests: Discord component views honor role allowlists. + +The four interactive component views (ExecApprovalView, SlashConfirmView, +UpdatePromptView, ModelPickerView) historically accepted only +``allowed_user_ids``. Deployments that configure DISCORD_ALLOWED_ROLES +without DISCORD_ALLOWED_USERS therefore had a wide-open component +surface: any guild member who could see the prompt could approve exec +commands, cancel slash confirmations, or switch the model -- even when +the same user would be rejected at the slash and on_message gates. + +These tests pin the user-or-role OR semantics and the fail-closed +behavior on missing role data so the parity cannot regress. +""" + +from types import SimpleNamespace + +import pytest + +# Trigger the shared discord mock from tests/gateway/conftest.py before +# importing the production module. +from gateway.platforms.discord import ( # noqa: E402 + ExecApprovalView, + ModelPickerView, + SlashConfirmView, + UpdatePromptView, + _component_check_auth, +) + + +# --------------------------------------------------------------------------- +# Direct helper coverage -- the four views all delegate to this helper, so +# pinning the helper's contract pins all four call sites. +# --------------------------------------------------------------------------- + + +def _interaction(user_id, role_ids=None, *, drop_user=False, drop_roles=False): + """Build a mock interaction with the requested user/role shape. + + drop_user simulates a payload whose .user attribute is None. 
+ drop_roles simulates a payload where .user has no .roles attribute + at all (DM-context Member, raw User payload). + """ + if drop_user: + return SimpleNamespace(user=None) + + user_kwargs = {"id": user_id} + if not drop_roles: + user_kwargs["roles"] = [SimpleNamespace(id=r) for r in (role_ids or [])] + return SimpleNamespace(user=SimpleNamespace(**user_kwargs)) + + +# ── back-compat: empty allowlists -> allow everyone ──────────────────────── + + +def test_component_check_empty_allowlists_allows_everyone(): + """SECURITY-CRITICAL backwards-compat: deployments without any + DISCORD_ALLOWED_* env vars set must continue to allow component + interactions from anyone (no regression for unconfigured setups).""" + interaction = _interaction(11111) + assert _component_check_auth(interaction, set(), set()) is True + assert _component_check_auth(interaction, None, None) is True + + +# ── user allowlist ───────────────────────────────────────────────────────── + + +def test_component_check_user_in_user_allowlist_passes(): + interaction = _interaction(11111) + assert _component_check_auth(interaction, {"11111"}, set()) is True + + +def test_component_check_user_not_in_user_allowlist_rejected(): + interaction = _interaction(99999) + assert _component_check_auth(interaction, {"11111"}, set()) is False + + +# ── role allowlist OR semantics ──────────────────────────────────────────── + + +def test_component_check_role_only_user_with_matching_role_passes(): + """Role-only deployment (DISCORD_ALLOWED_ROLES set, DISCORD_ALLOWED_USERS + empty) where the user is not in the empty user list but DOES carry a + matching role: must pass. 
This is the regression that prompted the + fix -- previously _check_auth allowed everyone when the user set was + empty, ignoring the role allowlist.""" + interaction = _interaction(99999, role_ids=[42]) + assert _component_check_auth(interaction, set(), {42}) is True + + +def test_component_check_role_only_user_without_matching_role_rejected(): + """Role-only deployment where the user has no matching role: reject. + Previously this allowed everyone because allowed_user_ids was empty.""" + interaction = _interaction(99999, role_ids=[7, 8]) + assert _component_check_auth(interaction, set(), {42}) is False + + +def test_component_check_user_or_role_user_match(): + """Both allowlists set; user matches user allowlist: pass.""" + interaction = _interaction(11111, role_ids=[7]) + assert _component_check_auth(interaction, {"11111"}, {42}) is True + + +def test_component_check_user_or_role_role_match(): + """Both allowlists set; user not in user list but in role list: pass.""" + interaction = _interaction(99999, role_ids=[42]) + assert _component_check_auth(interaction, {"11111"}, {42}) is True + + +def test_component_check_user_or_role_neither_match(): + """Both allowlists set; user matches neither: reject.""" + interaction = _interaction(99999, role_ids=[7]) + assert _component_check_auth(interaction, {"11111"}, {42}) is False + + +# ── fail-closed on missing role data ─────────────────────────────────────── + + +def test_component_check_role_policy_with_no_roles_attr_rejects(): + """Role allowlist configured but interaction.user has no .roles + attribute (DM-context Member, raw User payload): must reject. 
A user + without resolvable roles cannot satisfy a role allowlist.""" + interaction = _interaction(11111, drop_roles=True) + assert _component_check_auth(interaction, set(), {42}) is False + + +def test_component_check_missing_user_with_allowlist_rejects(): + """interaction.user is None with any allowlist configured: fail + closed without raising AttributeError.""" + interaction = _interaction(0, drop_user=True) + assert _component_check_auth(interaction, {"11111"}, set()) is False + assert _component_check_auth(interaction, set(), {42}) is False + + +# --------------------------------------------------------------------------- +# View construction: every view must accept allowed_role_ids and route +# through the shared helper. Default value preserves prior call-sites. +# --------------------------------------------------------------------------- + + +def test_exec_approval_view_accepts_role_allowlist(): + view = ExecApprovalView( + session_key="sess-1", + allowed_user_ids={"11111"}, + allowed_role_ids={42}, + ) + # Role-only user passes + assert view._check_auth(_interaction(99999, role_ids=[42])) is True + # Neither user nor role match: reject + assert view._check_auth(_interaction(99999, role_ids=[7])) is False + + +def test_exec_approval_view_role_default_is_empty_set(): + """Existing call sites that pass only allowed_user_ids must continue + working with the legacy semantics (no role gate).""" + view = ExecApprovalView(session_key="sess-1", allowed_user_ids={"11111"}) + assert view.allowed_role_ids == set() + assert view._check_auth(_interaction(11111)) is True + assert view._check_auth(_interaction(99999)) is False + + +def test_slash_confirm_view_accepts_role_allowlist(): + view = SlashConfirmView( + session_key="sess-1", + confirm_id="c1", + allowed_user_ids=set(), + allowed_role_ids={42}, + ) + assert view._check_auth(_interaction(99999, role_ids=[42])) is True + assert view._check_auth(_interaction(99999, role_ids=[7])) is False + + +def 
test_update_prompt_view_accepts_role_allowlist(): + view = UpdatePromptView( + session_key="sess-1", + allowed_user_ids=set(), + allowed_role_ids={42}, + ) + assert view._check_auth(_interaction(99999, role_ids=[42])) is True + assert view._check_auth(_interaction(99999, role_ids=[7])) is False + + +def test_model_picker_view_accepts_role_allowlist(): + async def _noop(*_a, **_k): + return "" + + view = ModelPickerView( + providers=[], + current_model="m", + current_provider="p", + session_key="sess-1", + on_model_selected=_noop, + allowed_user_ids=set(), + allowed_role_ids={42}, + ) + assert view._check_auth(_interaction(99999, role_ids=[42])) is True + assert view._check_auth(_interaction(99999, role_ids=[7])) is False + + +# --------------------------------------------------------------------------- +# Empty allowlists across views: legacy "allow everyone" must hold. +# --------------------------------------------------------------------------- + + +@pytest.mark.parametrize( + "view_factory", + [ + lambda: ExecApprovalView(session_key="s", allowed_user_ids=set()), + lambda: SlashConfirmView(session_key="s", confirm_id="c", allowed_user_ids=set()), + lambda: UpdatePromptView(session_key="s", allowed_user_ids=set()), + ], +) +def test_views_empty_allowlists_allow_everyone(view_factory): + view = view_factory() + assert view._check_auth(_interaction(99999)) is True + + +def test_model_picker_view_empty_allowlists_allow_everyone(): + async def _noop(*_a, **_k): + return "" + + view = ModelPickerView( + providers=[], + current_model="m", + current_provider="p", + session_key="s", + on_model_selected=_noop, + allowed_user_ids=set(), + ) + assert view.allowed_role_ids == set() + assert view._check_auth(_interaction(99999)) is True diff --git a/tests/gateway/test_discord_slash_auth.py b/tests/gateway/test_discord_slash_auth.py new file mode 100644 index 0000000000..a52ee1fd7e --- /dev/null +++ b/tests/gateway/test_discord_slash_auth.py @@ -0,0 +1,737 @@ +"""Security 
regression tests: slash commands honor on_message authorization gates. + +Slash invocations (``_run_simple_slash``, ``_handle_thread_create_slash``) +historically bypassed every gate ``on_message`` enforces — DISCORD_ALLOWED_USERS, +DISCORD_ALLOWED_ROLES, DISCORD_ALLOWED_CHANNELS, DISCORD_IGNORED_CHANNELS. +Any guild member could invoke ``/background``, ``/restart``, etc. as the +operator. ``_check_slash_authorization`` mirrors all four gates one-for-one. + +These tests pin the security-correct behavior so the bypass cannot regress. +""" + +import asyncio +import logging +import sys +from types import SimpleNamespace +from unittest.mock import AsyncMock, MagicMock + +import pytest + +from gateway.config import PlatformConfig + + +# --------------------------------------------------------------------------- +# Discord module mock — borrowed from test_discord_slash_commands.py so this +# file runs on machines without discord.py installed. +# --------------------------------------------------------------------------- + + +def _ensure_discord_mock(): + if "discord" in sys.modules and hasattr(sys.modules["discord"], "__file__"): + return # real discord installed + + if sys.modules.get("discord") is None: + discord_mod = MagicMock() + discord_mod.Intents.default.return_value = MagicMock() + discord_mod.DMChannel = type("DMChannel", (), {}) + discord_mod.Thread = type("Thread", (), {}) + discord_mod.ForumChannel = type("ForumChannel", (), {}) + discord_mod.Interaction = object + + class _FakePermissions: + def __init__(self, value=0, **_): + self.value = value + + discord_mod.Permissions = _FakePermissions + + class _FakeGroup: + def __init__(self, *, name, description, parent=None): + self.name = name + self.description = description + self.parent = parent + self._children: dict[str, object] = {} + if parent is not None: + parent.add_command(self) + + def add_command(self, cmd): + self._children[cmd.name] = cmd + + class _FakeCommand: + def __init__(self, *, name, 
description, callback, parent=None): + self.name = name + self.description = description + self.callback = callback + self.parent = parent + self.default_permissions = None + + discord_mod.app_commands = SimpleNamespace( + describe=lambda **kwargs: (lambda fn: fn), + choices=lambda **kwargs: (lambda fn: fn), + autocomplete=lambda **kwargs: (lambda fn: fn), + Choice=lambda **kwargs: SimpleNamespace(**kwargs), + Group=_FakeGroup, + Command=_FakeCommand, + ) + + ext_mod = MagicMock() + commands_mod = MagicMock() + commands_mod.Bot = MagicMock + ext_mod.commands = commands_mod + + sys.modules["discord"] = discord_mod + sys.modules.setdefault("discord.ext", ext_mod) + sys.modules.setdefault("discord.ext.commands", commands_mod) + + +_ensure_discord_mock() + +from gateway.platforms.discord import DiscordAdapter # noqa: E402 + + +@pytest.fixture(autouse=True) +def _isolate_discord_env(monkeypatch): + for var in ( + "DISCORD_ALLOWED_USERS", + "DISCORD_ALLOWED_ROLES", + "DISCORD_ALLOWED_CHANNELS", + "DISCORD_IGNORED_CHANNELS", + "DISCORD_HIDE_SLASH_COMMANDS", + "DISCORD_ALLOW_BOTS", + ): + monkeypatch.delenv(var, raising=False) + + +@pytest.fixture(autouse=True) +def _stub_discord_permissions(monkeypatch): + """Pin discord.Permissions to a plain stand-in so tests can assert the + bitfield value regardless of whether real discord.py or a sibling test + module's MagicMock is loaded.""" + import discord + + class _Perm: + def __init__(self, value=0, **_): + self.value = value + + monkeypatch.setattr(discord, "Permissions", _Perm) + + +@pytest.fixture +def adapter(): + config = PlatformConfig(enabled=True, token="***") + a = DiscordAdapter(config) + a._client = SimpleNamespace(user=SimpleNamespace(id=99999, name="HermesBot"), guilds=[]) + return a + + +_SENTINEL = object() + + +def _make_interaction( + user_id, *, channel_id=12345, guild_id=42, in_dm=False, in_thread=False, + parent_channel_id=None, user=_SENTINEL, +): + """Build a mock Discord Interaction with a 
still-unresponded response. + + ``channel_id`` may be set to ``None`` to simulate a guild interaction + payload missing a resolvable channel id (fail-closed exercise). + Pass ``user=None`` to simulate a payload missing the user object. + """ + import discord + + response = SimpleNamespace(send_message=AsyncMock(), defer=AsyncMock()) + + if in_dm: + channel = discord.DMChannel() + elif in_thread: + channel = discord.Thread() + channel.id = channel_id + channel.parent_id = parent_channel_id + elif channel_id is None: + channel = None + else: + channel = SimpleNamespace(id=channel_id) + + if user is _SENTINEL: + user_obj = SimpleNamespace(id=int(user_id), name=f"user_{user_id}") + else: + user_obj = user + + return SimpleNamespace( + user=user_obj, + guild=SimpleNamespace(owner_id=999), + guild_id=guild_id, + channel_id=channel_id, + channel=channel, + response=response, + ) + + +# --------------------------------------------------------------------------- +# Backwards-compat: empty allowlist → everything passes (matches on_message) +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_no_allowlist_allows_everyone(adapter): + """SECURITY-CRITICAL backwards-compat: deployments without any allowlist + env vars set must see ZERO behavior change. on_message lets everyone + through in this case (returns True at line 1890); slash must do the same. 
+ """ + interaction = _make_interaction("999999999") + assert await adapter._check_slash_authorization(interaction, "/help") is True + interaction.response.send_message.assert_not_awaited() + + +@pytest.mark.asyncio +async def test_no_allowlist_dm_also_allowed(adapter): + """Same for DMs — no allowlist means no restriction, matching on_message.""" + interaction = _make_interaction("999999999", in_dm=True) + assert await adapter._check_slash_authorization(interaction, "/help") is True + + +# --------------------------------------------------------------------------- +# User allowlist (DISCORD_ALLOWED_USERS) parity +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_allowed_user_passes(adapter): + adapter._allowed_user_ids = {"100200300"} + interaction = _make_interaction("100200300") + assert await adapter._check_slash_authorization(interaction, "/background hi") is True + interaction.response.send_message.assert_not_awaited() + + +@pytest.mark.asyncio +async def test_disallowed_user_rejected_with_ephemeral(adapter, caplog): + adapter._allowed_user_ids = {"100200300"} + interaction = _make_interaction("999999999") + with caplog.at_level(logging.WARNING): + assert await adapter._check_slash_authorization(interaction, "/background hi") is False + interaction.response.send_message.assert_awaited_once() + args, kwargs = interaction.response.send_message.call_args + assert kwargs.get("ephemeral") is True + assert "not authorized" in (args[0] if args else kwargs.get("content", "")).lower() + assert any("Unauthorized slash attempt" in r.message for r in caplog.records) + assert any("DISCORD_ALLOWED_USERS" in r.message for r in caplog.records) + + +# --------------------------------------------------------------------------- +# Role allowlist (DISCORD_ALLOWED_ROLES) parity +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def 
test_role_member_passes(adapter): + """A user whose Member.roles includes an allowed role passes the gate.""" + adapter._allowed_role_ids = {1234} + interaction = _make_interaction("999999999") + interaction.user.roles = [SimpleNamespace(id=1234)] + assert await adapter._check_slash_authorization(interaction, "/help") is True + + +@pytest.mark.asyncio +async def test_role_non_member_rejected(adapter): + """A user without any matching role is rejected even if no user allowlist.""" + adapter._allowed_role_ids = {1234} + interaction = _make_interaction("999999999") + interaction.user.roles = [SimpleNamespace(id=9999)] # different role + assert await adapter._check_slash_authorization(interaction, "/help") is False + + +# --------------------------------------------------------------------------- +# Channel allowlist (DISCORD_ALLOWED_CHANNELS) parity — the gate prajer used +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_channel_not_in_allowlist_rejected(adapter, monkeypatch, caplog): + """on_message blocks messages in channels not in DISCORD_ALLOWED_CHANNELS; + slash must do the same. This is the EXACT bypass prajer exploited. 
+ """ + monkeypatch.setenv("DISCORD_ALLOWED_CHANNELS", "1111,2222") + interaction = _make_interaction("100200300", channel_id=9999) + with caplog.at_level(logging.WARNING): + assert await adapter._check_slash_authorization(interaction, "/background hi") is False + assert any("DISCORD_ALLOWED_CHANNELS" in r.message for r in caplog.records) + + +@pytest.mark.asyncio +async def test_channel_in_allowlist_passes(adapter, monkeypatch): + monkeypatch.setenv("DISCORD_ALLOWED_CHANNELS", "1111,2222") + interaction = _make_interaction("100200300", channel_id=1111) + assert await adapter._check_slash_authorization(interaction, "/help") is True + + +@pytest.mark.asyncio +async def test_channel_allowlist_wildcard_passes(adapter, monkeypatch): + """``*`` in DISCORD_ALLOWED_CHANNELS = allow any channel, matching on_message.""" + monkeypatch.setenv("DISCORD_ALLOWED_CHANNELS", "*") + interaction = _make_interaction("100200300", channel_id=9999) + assert await adapter._check_slash_authorization(interaction, "/help") is True + + +@pytest.mark.asyncio +async def test_channel_allowlist_does_not_apply_to_dms(adapter, monkeypatch): + """DMs aren't channel-gated — they go through on_message's DM lockdown.""" + monkeypatch.setenv("DISCORD_ALLOWED_CHANNELS", "1111") + interaction = _make_interaction("100200300", in_dm=True) + assert await adapter._check_slash_authorization(interaction, "/help") is True + + +# --------------------------------------------------------------------------- +# Channel blocklist (DISCORD_IGNORED_CHANNELS) parity +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_ignored_channel_rejected(adapter, monkeypatch, caplog): + monkeypatch.setenv("DISCORD_IGNORED_CHANNELS", "9999") + interaction = _make_interaction("100200300", channel_id=9999) + with caplog.at_level(logging.WARNING): + assert await adapter._check_slash_authorization(interaction, "/help") is False + assert any("DISCORD_IGNORED_CHANNELS" in 
r.message for r in caplog.records) + + +@pytest.mark.asyncio +async def test_ignored_channel_wildcard_blocks_all(adapter, monkeypatch): + monkeypatch.setenv("DISCORD_IGNORED_CHANNELS", "*") + interaction = _make_interaction("100200300", channel_id=9999) + assert await adapter._check_slash_authorization(interaction, "/help") is False + + +# --------------------------------------------------------------------------- +# Cross-platform admin notification +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_unauthorized_attempt_notifies_telegram(adapter): + from gateway.session import Platform + + telegram_adapter = SimpleNamespace(send=AsyncMock()) + home = SimpleNamespace(chat_id="987654321") + runner = SimpleNamespace( + adapters={Platform.TELEGRAM: telegram_adapter}, + config=SimpleNamespace(get_home_channel=lambda p: home if p is Platform.TELEGRAM else None), + ) + adapter.gateway_runner = runner + adapter._allowed_user_ids = {"100200300"} + + interaction = _make_interaction("999999999") + await adapter._check_slash_authorization(interaction, "/background hi") + + # Notify is fire-and-forget — let the scheduled task run. 
+ await asyncio.sleep(0) + await asyncio.sleep(0) + + telegram_adapter.send.assert_awaited_once() + chat_id, msg = telegram_adapter.send.call_args.args + assert chat_id == "987654321" + assert "Unauthorized" in msg + assert "999999999" in msg + assert "/background hi" in msg + assert "DISCORD_ALLOWED_USERS" in msg + + +@pytest.mark.asyncio +async def test_notify_silently_no_ops_without_runner(adapter): + adapter.gateway_runner = None + await adapter._notify_unauthorized_slash("u", "1", 2, 3, "/x", "reason") # must not raise + + +@pytest.mark.asyncio +async def test_notify_falls_back_to_slack_if_no_telegram(adapter): + from gateway.session import Platform + + slack_adapter = SimpleNamespace(send=AsyncMock()) + home_slack = SimpleNamespace(chat_id="C12345") + runner = SimpleNamespace( + adapters={Platform.SLACK: slack_adapter}, + config=SimpleNamespace( + get_home_channel=lambda p: home_slack if p is Platform.SLACK else None, + ), + ) + adapter.gateway_runner = runner + await adapter._notify_unauthorized_slash("u", "1", 2, 3, "/x", "reason") + slack_adapter.send.assert_awaited_once() + + +# --------------------------------------------------------------------------- +# Opt-in visibility hide +# --------------------------------------------------------------------------- + + +def test_visibility_hide_off_by_default_is_noop(adapter, monkeypatch): + """DISCORD_HIDE_SLASH_COMMANDS unset → don't touch any command's permissions.""" + cmd = SimpleNamespace(name="x", default_permissions="UNCHANGED") + tree = SimpleNamespace(get_commands=lambda: [cmd]) + + # Re-run the registration tail logic by calling the bit that decides: + # we don't have a clean way to simulate the env-gated branch from + # _register_slash_commands, so we just confirm the helper itself works + # AND assert the env-gating logic is correct. 
+ assert os.environ.get("DISCORD_HIDE_SLASH_COMMANDS") is None + # Helper should still work when called directly: + adapter._apply_owner_only_visibility(tree) + # When called directly the helper applies — env gating is at the call site, + # which we exercise in an integration-style test below. + + +def test_visibility_hide_helper_zeroes_perms(adapter): + cmd_a = SimpleNamespace(name="a", default_permissions=None) + cmd_b = SimpleNamespace(name="b", default_permissions=None) + tree = SimpleNamespace(get_commands=lambda: [cmd_a, cmd_b]) + adapter._apply_owner_only_visibility(tree) + assert cmd_a.default_permissions is not None + assert cmd_b.default_permissions is not None + assert cmd_a.default_permissions.value == 0 + assert cmd_b.default_permissions.value == 0 + + +def test_visibility_hide_tolerates_unsetable_command(adapter, caplog): + class _Frozen: + __slots__ = ("name",) + def __init__(self, name): + self.name = name + + cmd_ok = SimpleNamespace(name="ok", default_permissions=None) + cmd_bad = _Frozen("bad") + tree = SimpleNamespace(get_commands=lambda: [cmd_bad, cmd_ok]) + + with caplog.at_level(logging.DEBUG): + adapter._apply_owner_only_visibility(tree) + + assert cmd_ok.default_permissions.value == 0 + + +# os import for test_visibility_hide_off_by_default_is_noop +import os # noqa: E402 + + +# --------------------------------------------------------------------------- +# Fail-closed parity on malformed slash auth context +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_missing_channel_id_rejected_when_channel_policy_configured( + adapter, monkeypatch, +): + """A guild interaction without a resolvable channel id must fail + closed when DISCORD_ALLOWED_CHANNELS is configured. 
Without this + guard the entire channel-policy block silently fell through.""" + monkeypatch.setenv("DISCORD_ALLOWED_CHANNELS", "1111,2222") + interaction = _make_interaction("100200300", channel_id=None) + assert await adapter._check_slash_authorization(interaction, "/help") is False + interaction.response.send_message.assert_awaited_once() + + +@pytest.mark.asyncio +async def test_missing_channel_id_allowed_when_no_channel_policy(adapter): + """No DISCORD_ALLOWED_CHANNELS configured + missing channel id: still + pass through the channel block (matches no-allowlist default).""" + interaction = _make_interaction("100200300", channel_id=None) + assert await adapter._check_slash_authorization(interaction, "/help") is True + + +@pytest.mark.asyncio +async def test_missing_user_rejected_when_allowlist_configured(adapter): + """interaction.user is None with a user/role allowlist active: + fail closed without raising AttributeError.""" + adapter._allowed_user_ids = {"100200300"} + interaction = _make_interaction("100200300", user=None) + # Must not raise — must return False with an ephemeral rejection + assert await adapter._check_slash_authorization(interaction, "/help") is False + interaction.response.send_message.assert_awaited_once() + + +@pytest.mark.asyncio +async def test_missing_user_allowed_when_no_allowlist_configured(adapter): + """interaction.user is None but no allowlist configured: allow + (preserves no-allowlist back-compat -- anyone is allowed when no + policy is in effect).""" + interaction = _make_interaction("100200300", user=None) + assert await adapter._check_slash_authorization(interaction, "/help") is True + + +# --------------------------------------------------------------------------- +# Thread parent channel allowlist parity +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_thread_parent_in_allowlist_passes(adapter, monkeypatch): + """Thread whose parent channel is on 
DISCORD_ALLOWED_CHANNELS passes + even though the thread id itself isn't on the list.""" + monkeypatch.setenv("DISCORD_ALLOWED_CHANNELS", "5555") + interaction = _make_interaction( + "100200300", channel_id=9999, in_thread=True, parent_channel_id=5555, + ) + assert await adapter._check_slash_authorization(interaction, "/help") is True + + +@pytest.mark.asyncio +async def test_thread_parent_in_ignorelist_rejects(adapter, monkeypatch): + """Thread whose parent channel is on DISCORD_IGNORED_CHANNELS rejects + even when the thread id itself isn't ignored.""" + monkeypatch.setenv("DISCORD_IGNORED_CHANNELS", "5555") + interaction = _make_interaction( + "100200300", channel_id=9999, in_thread=True, parent_channel_id=5555, + ) + assert await adapter._check_slash_authorization(interaction, "/help") is False + + +@pytest.mark.asyncio +async def test_ignored_beats_allowed(adapter, monkeypatch): + """Channel listed in BOTH allowed and ignored: the ignored entry wins. + Anything else would be a foot-gun where adding to ignored does nothing + if the channel is also explicitly allowed.""" + monkeypatch.setenv("DISCORD_ALLOWED_CHANNELS", "1111") + monkeypatch.setenv("DISCORD_IGNORED_CHANNELS", "1111") + interaction = _make_interaction("100200300", channel_id=1111) + assert await adapter._check_slash_authorization(interaction, "/help") is False + + +# --------------------------------------------------------------------------- +# Admin notify soft-fail fallback +# --------------------------------------------------------------------------- + + +@pytest.mark.asyncio +async def test_notify_falls_back_to_slack_on_telegram_soft_fail(adapter): + """adapter.send returning SendResult(success=False) must NOT short- + circuit the fallback chain. 
Treating a soft failure as delivered + means a Telegram outage swallows alerts silently.""" + from gateway.session import Platform + + soft_fail = SimpleNamespace(success=False, error="rate limited") + telegram_adapter = SimpleNamespace(send=AsyncMock(return_value=soft_fail)) + slack_adapter = SimpleNamespace(send=AsyncMock()) + home_tg = SimpleNamespace(chat_id="987654321") + home_sl = SimpleNamespace(chat_id="C12345") + homes = {Platform.TELEGRAM: home_tg, Platform.SLACK: home_sl} + runner = SimpleNamespace( + adapters={ + Platform.TELEGRAM: telegram_adapter, + Platform.SLACK: slack_adapter, + }, + config=SimpleNamespace(get_home_channel=lambda p: homes.get(p)), + ) + adapter.gateway_runner = runner + + await adapter._notify_unauthorized_slash("u", "1", 2, 3, "/x", "reason") + + telegram_adapter.send.assert_awaited_once() + slack_adapter.send.assert_awaited_once() + + +@pytest.mark.asyncio +async def test_notify_returns_on_telegram_truthy_success(adapter): + """adapter.send returning SendResult(success=True) -- or any object + without a falsy success attribute -- should still short-circuit at + Telegram. 
(This guards against the soft-fail patch over-correcting.)""" + from gateway.session import Platform + + ok = SimpleNamespace(success=True, message_id="m1") + telegram_adapter = SimpleNamespace(send=AsyncMock(return_value=ok)) + slack_adapter = SimpleNamespace(send=AsyncMock()) + home_tg = SimpleNamespace(chat_id="987654321") + home_sl = SimpleNamespace(chat_id="C12345") + homes = {Platform.TELEGRAM: home_tg, Platform.SLACK: home_sl} + runner = SimpleNamespace( + adapters={ + Platform.TELEGRAM: telegram_adapter, + Platform.SLACK: slack_adapter, + }, + config=SimpleNamespace(get_home_channel=lambda p: homes.get(p)), + ) + adapter.gateway_runner = runner + + await adapter._notify_unauthorized_slash("u", "1", 2, 3, "/x", "reason") + + telegram_adapter.send.assert_awaited_once() + slack_adapter.send.assert_not_awaited() + + +# --------------------------------------------------------------------------- +# /skill autocomplete + callback gating +# --------------------------------------------------------------------------- + + +def _capture_skill_registration(adapter, monkeypatch, entries): + """Run ``_register_skill_group`` against a stubbed skill catalog and + return ``(handler_callback, autocomplete_callback)``. + + The autocomplete callback is captured by monkeypatching + ``discord.app_commands.autocomplete`` -- the production decorator is + a no-op stub in this test file's discord mock, so capturing the + callback through it is the direct route in tests. + """ + import discord + + captured: dict = {} + + def fake_categories(reserved_names): + # Match discord_skill_commands_by_category's tuple shape: + # (categories_dict, uncategorized_list, hidden_count) + return ({}, list(entries), 0) + + import hermes_cli.commands as _hc + monkeypatch.setattr( + _hc, "discord_skill_commands_by_category", fake_categories, + ) + + def capture_autocomplete(**kwargs): + # Only one autocomplete in /skill registration: name=... 
+ captured["autocomplete"] = kwargs.get("name") + + def _passthrough(fn): + return fn + + return _passthrough + + monkeypatch.setattr( + discord.app_commands, "autocomplete", capture_autocomplete, + raising=False, + ) + + registered: list = [] + + class _Tree: + def get_commands(self): + return [] + + def add_command(self, cmd): + registered.append(cmd) + + adapter._register_skill_group(_Tree()) + assert registered, "_register_skill_group did not register a command" + return registered[0].callback, captured["autocomplete"] + + +@pytest.mark.asyncio +async def test_skill_autocomplete_returns_empty_for_unauthorized( + adapter, monkeypatch, +): + """Autocomplete must not leak the installed skill catalog to users + who can't run /skill. With DISCORD_ALLOWED_USERS configured and the + interaction user outside it, the autocomplete callback returns [].""" + adapter._allowed_user_ids = {"100200300"} + entries = [ + ("alpha", "First skill", "/alpha"), + ("beta", "Second skill", "/beta"), + ] + _handler, autocomplete = _capture_skill_registration( + adapter, monkeypatch, entries, + ) + + interaction = _make_interaction("999999999") + result = await autocomplete(interaction, "") + assert result == [] + + +@pytest.mark.asyncio +async def test_skill_autocomplete_returns_choices_for_authorized( + adapter, monkeypatch, +): + """Sanity: an authorized user still gets the autocomplete suggestions.""" + adapter._allowed_user_ids = {"100200300"} + entries = [ + ("alpha", "First skill", "/alpha"), + ("beta", "Second skill", "/beta"), + ] + _handler, autocomplete = _capture_skill_registration( + adapter, monkeypatch, entries, + ) + + interaction = _make_interaction("100200300") + result = await autocomplete(interaction, "") + assert len(result) == 2 + assert {choice.value for choice in result} == {"alpha", "beta"} + + +@pytest.mark.asyncio +async def test_skill_handler_rejects_before_dispatch_for_unauthorized( + adapter, monkeypatch, +): + """The /skill handler must call 
_check_slash_authorization BEFORE + skill_lookup. Otherwise unknown vs known names produce divergent + responses ("Unknown skill: foo" vs auth rejection) which is a + catalog-probing oracle.""" + adapter._allowed_user_ids = {"100200300"} + entries = [("alpha", "First skill", "/alpha")] + handler, _autocomplete = _capture_skill_registration( + adapter, monkeypatch, entries, + ) + + # Patch _run_simple_slash so we can detect any leak through it. + dispatched: list = [] + + async def fake_dispatch(_interaction, text): + dispatched.append(text) + + adapter._run_simple_slash = fake_dispatch # type: ignore[assignment] + + interaction = _make_interaction("999999999") + await handler(interaction, "alpha", "") + + interaction.response.send_message.assert_awaited_once() + args, kwargs = interaction.response.send_message.call_args + assert kwargs.get("ephemeral") is True + assert "not authorized" in ( + args[0] if args else kwargs.get("content", "") + ).lower() + # Critically: nothing was dispatched, and the auth message did NOT + # mention the skill name "alpha" (no catalog leak). 
+ assert dispatched == [] + + +@pytest.mark.asyncio +async def test_skill_handler_known_and_unknown_produce_same_rejection( + adapter, monkeypatch, +): + """An unauthorized user probing for valid skill names must see the + same rejection text regardless of whether the name they tried is + on the registered catalog.""" + adapter._allowed_user_ids = {"100200300"} + entries = [("alpha", "First skill", "/alpha")] + handler, _ = _capture_skill_registration(adapter, monkeypatch, entries) + + adapter._run_simple_slash = AsyncMock() # type: ignore[assignment] + + known_interaction = _make_interaction("999999999") + unknown_interaction = _make_interaction("999999999") + await handler(known_interaction, "alpha", "") + await handler(unknown_interaction, "definitely-not-a-skill", "") + + known_interaction.response.send_message.assert_awaited_once() + unknown_interaction.response.send_message.assert_awaited_once() + known_args, known_kwargs = known_interaction.response.send_message.call_args + unknown_args, unknown_kwargs = ( + unknown_interaction.response.send_message.call_args + ) + assert known_args == unknown_args + assert known_kwargs == unknown_kwargs + + +@pytest.mark.asyncio +async def test_skill_handler_dispatches_for_authorized( + adapter, monkeypatch, +): + """Sanity: an authorized user reaches _run_simple_slash with the + resolved cmd_key and arguments.""" + adapter._allowed_user_ids = {"100200300"} + entries = [("alpha", "First skill", "/alpha")] + handler, _ = _capture_skill_registration(adapter, monkeypatch, entries) + + dispatched: list = [] + + async def fake_dispatch(_interaction, text): + dispatched.append(text) + + adapter._run_simple_slash = fake_dispatch # type: ignore[assignment] + + interaction = _make_interaction("100200300") + await handler(interaction, "alpha", "extra args") + assert dispatched == ["/alpha extra args"] diff --git a/tests/gateway/test_discord_slash_commands.py b/tests/gateway/test_discord_slash_commands.py index 7b15a7ed0c..589e8053bc 
100644 --- a/tests/gateway/test_discord_slash_commands.py +++ b/tests/gateway/test_discord_slash_commands.py @@ -107,6 +107,10 @@ def adapter(): user=SimpleNamespace(id=99999, name="HermesBot"), ) adapter._text_batch_delay_seconds = 0 # disable batching for tests + # Slash auth is exercised in test_discord_slash_auth.py — bypass it here + # so registration / dispatch / thread behavior tests don't have to + # construct a full auth context (allowlist / channel scope). + adapter._check_slash_authorization = AsyncMock(return_value=True) return adapter @@ -117,6 +121,10 @@ def adapter(): @pytest.mark.asyncio async def test_registers_native_thread_slash_command(adapter): + # The /thread slash closure now delegates ALL the work — including + # defer() — to _handle_thread_create_slash so the auth gate can send + # an ephemeral rejection on the still-unresponded interaction. The + # closure should just forward. adapter._handle_thread_create_slash = AsyncMock() adapter._register_slash_commands() @@ -127,7 +135,9 @@ async def test_registers_native_thread_slash_command(adapter): await command(interaction, name="Planning", message="", auto_archive_duration=1440) - interaction.response.defer.assert_awaited_once_with(ephemeral=True) + # defer is now performed inside _handle_thread_create_slash, AFTER the + # auth check passes — not by the closure. 
+ interaction.response.defer.assert_not_awaited() adapter._handle_thread_create_slash.assert_awaited_once_with(interaction, "Planning", "", 1440) @@ -298,6 +308,7 @@ async def test_handle_thread_create_slash_reports_success(adapter): user=SimpleNamespace(display_name="Jezza", id=42), guild=SimpleNamespace(name="TestGuild"), followup=SimpleNamespace(send=AsyncMock()), + response=SimpleNamespace(defer=AsyncMock()), ) await adapter._handle_thread_create_slash(interaction, "Planning", "Kickoff", 1440) @@ -326,6 +337,7 @@ async def test_handle_thread_create_slash_dispatches_session_when_message_provid user=SimpleNamespace(display_name="Jezza", id=42), guild=SimpleNamespace(name="TestGuild"), followup=SimpleNamespace(send=AsyncMock()), + response=SimpleNamespace(defer=AsyncMock()), ) adapter._dispatch_thread_session = AsyncMock() @@ -348,6 +360,7 @@ async def test_handle_thread_create_slash_no_dispatch_without_message(adapter): user=SimpleNamespace(display_name="Jezza", id=42), guild=SimpleNamespace(name="TestGuild"), followup=SimpleNamespace(send=AsyncMock()), + response=SimpleNamespace(defer=AsyncMock()), ) adapter._dispatch_thread_session = AsyncMock() @@ -371,6 +384,7 @@ async def test_handle_thread_create_slash_falls_back_to_seed_message(adapter): user=SimpleNamespace(display_name="Jezza", id=42), guild=SimpleNamespace(name="TestGuild"), followup=SimpleNamespace(send=AsyncMock()), + response=SimpleNamespace(defer=AsyncMock()), ) await adapter._handle_thread_create_slash(interaction, "Planning", "Kickoff", 1440) @@ -395,6 +409,7 @@ async def test_handle_thread_create_slash_reports_failure(adapter): channel_id=123, user=SimpleNamespace(display_name="Jezza", id=42), followup=SimpleNamespace(send=AsyncMock()), + response=SimpleNamespace(defer=AsyncMock()), ) await adapter._handle_thread_create_slash(interaction, "Planning", "", 1440) From c14bf441a313cf02b82f2964c1b46e5e252a12ba Mon Sep 17 00:00:00 2001 From: kshitijk4poor 
<82637225+kshitijk4poor@users.noreply.github.com> Date: Sun, 3 May 2026 16:13:58 +0530 Subject: [PATCH 43/61] chore: add 0xyg3n noreply email to AUTHOR_MAP --- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index e06d1d2a31..9e6a8a30e6 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -505,6 +505,7 @@ AUTHOR_MAP = { "michel.belleau@malaiwah.com": "malaiwah", "gnanasekaran.sekareee@gmail.com": "gnanam1990", "jz.pentest@gmail.com": "0xyg3n", + "7093928+0xyg3n@users.noreply.github.com": "0xyg3n", "hypnosis.mda@gmail.com": "Hypn0sis", "ywt000818@gmail.com": "OwenYWT", "dhandhalyabhavik@gmail.com": "v1k22", From 6c1322b9972ce61419df7df6ad7ae5a261fef9d2 Mon Sep 17 00:00:00 2001 From: nftpoetrist Date: Sun, 3 May 2026 00:09:04 +0300 Subject: [PATCH 44/61] fix(slack): close previous handler in connect() to prevent zombie Socket Mode connections MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit SlackAdapter.connect() overwrote self._handler, self._app, and self._socket_mode_task without closing the prior AsyncSocketModeHandler first. If connect() was called a second time on the same adapter (e.g. during a gateway restart or in-process reconnect attempt), the old Socket Mode websocket stayed alive. Both the old and new connections received every Slack event and dispatched it twice — producing double responses with different wording, the same bug that affected DiscordAdapter (#18187, fixed in #18758). Fix: add a close-before-reassign guard at the start of the connection setup path, mirroring the guard DiscordAdapter.connect() already has. When self._handler is None (fresh adapter, first connect()) the block is a harmless no-op. Scoped to the handler/app fields only — no behavior change for any path that does not call connect() twice. 
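The close-before-reassign guard described above can be sketched in isolation. This is a minimal illustration, not the adapter's actual code: FakeHandler and SketchAdapter are hypothetical stand-ins, and only the shape of the guard (close the old handler, clear the field, then create the new one) follows the description in this message.

```python
import asyncio


class FakeHandler:
    """Hypothetical stand-in for a Socket Mode handler (assumption)."""

    def __init__(self):
        self.closed = False

    async def close_async(self):
        self.closed = True


class SketchAdapter:
    """Hypothetical adapter; only the guard mirrors the described fix."""

    def __init__(self):
        self._handler = None
        self.created = []  # every handler ever created, for inspection

    async def connect(self):
        # Close-before-reassign: tear down any previous handler first so
        # two live connections never receive the same events.
        if self._handler is not None:
            try:
                await self._handler.close_async()
            except Exception:
                pass  # the described patch logs at debug level here
            finally:
                self._handler = None
        self._handler = FakeHandler()
        self.created.append(self._handler)
        return True


async def demo():
    adapter = SketchAdapter()
    await adapter.connect()  # first connect: guard is a no-op
    await adapter.connect()  # second connect: old handler is closed
    return adapter


adapter = asyncio.run(demo())
first, second = adapter.created
```

On the first call the guard does nothing because no handler exists yet. On the second call the earlier handler is closed before the new one is stored, so the adapter only ever holds one open handler.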
Fixes #18980 --- gateway/platforms/slack.py | 15 ++++++++++++ tests/gateway/test_slack.py | 49 +++++++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+) diff --git a/gateway/platforms/slack.py b/gateway/platforms/slack.py index 3208a80a6a..c8ee28859d 100644 --- a/gateway/platforms/slack.py +++ b/gateway/platforms/slack.py @@ -528,6 +528,21 @@ class SlackAdapter(BasePlatformAdapter): return False lock_acquired = True + # Close any previous handler before creating a new one so that + # calling connect() a second time (e.g. during a gateway restart or + # in-process reconnect attempt) does not leave a zombie Socket Mode + # connection alive. Both the old and new connections would otherwise + # receive every Slack event and dispatch it twice, producing double + # responses — the same bug that affected DiscordAdapter (#18187). + if self._handler is not None: + try: + await self._handler.close_async() + except Exception: + logger.debug("[%s] Failed to close previous Slack handler", self.name) + finally: + self._handler = None + self._app = None + # First token is the primary — used for AsyncApp / Socket Mode primary_token = bot_tokens[0] self._app = AsyncApp(token=primary_token) diff --git a/tests/gateway/test_slack.py b/tests/gateway/test_slack.py index 0eebf49c88..478370d8c4 100644 --- a/tests/gateway/test_slack.py +++ b/tests/gateway/test_slack.py @@ -231,6 +231,55 @@ class TestSlackConnectCleanup: mock_release.assert_called_once_with("slack-app-token", "xapp-fake") assert adapter._platform_lock_identity is None + @pytest.mark.asyncio + async def test_reconnect_closes_previous_handler_to_prevent_zombie_socket(self): + """Regression for #18980: calling connect() on an adapter that already has + a live handler (e.g. during a gateway restart) must close the old + AsyncSocketModeHandler before creating a new one. 
Without this guard, + the old Socket Mode websocket stays alive and both connections dispatch + every Slack event, producing double responses — the same bug that + affected DiscordAdapter (#18187). + """ + config = PlatformConfig(enabled=True, token="xoxb-fake") + adapter = SlackAdapter(config) + + # Simulate state left over from a prior connect() call. + first_handler = AsyncMock() + first_handler.close_async = AsyncMock() + adapter._handler = first_handler + + mock_app = MagicMock() + def _noop_decorator(event_type): + def decorator(fn): return fn + return decorator + mock_app.event = _noop_decorator + mock_app.command = _noop_decorator + mock_app.action = _noop_decorator + mock_app.client = AsyncMock() + + mock_web_client = AsyncMock() + mock_web_client.auth_test = AsyncMock(return_value={ + "user_id": "U_BOT", + "user": "testbot", + "team_id": "T_FAKE", + "team": "FakeTeam", + }) + + second_handler = MagicMock() + + with patch.object(_slack_mod, "AsyncApp", return_value=mock_app), \ + patch.object(_slack_mod, "AsyncWebClient", return_value=mock_web_client), \ + patch.object(_slack_mod, "AsyncSocketModeHandler", return_value=second_handler), \ + patch.dict(os.environ, {"SLACK_APP_TOKEN": "xapp-fake"}), \ + patch("gateway.status.acquire_scoped_lock", return_value=(True, None)), \ + patch("gateway.status.release_scoped_lock"), \ + patch("asyncio.create_task"): + result = await adapter.connect() + + assert result is True + first_handler.close_async.assert_awaited_once_with() + assert adapter._handler is second_handler + # --------------------------------------------------------------------------- # TestSlackProxyBehavior From 0a97ce6bff49163b74b1d76418c4b1b3f2455b76 Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Sun, 3 May 2026 16:17:09 +0530 Subject: [PATCH 45/61] chore: add nftpoetrist to AUTHOR_MAP --- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py 
index 9e6a8a30e6..7bad9dd1a1 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -506,6 +506,7 @@ AUTHOR_MAP = { "gnanasekaran.sekareee@gmail.com": "gnanam1990", "jz.pentest@gmail.com": "0xyg3n", "7093928+0xyg3n@users.noreply.github.com": "0xyg3n", + "nftpoetrist@gmail.com": "nftpoetrist", # PR #18982 "hypnosis.mda@gmail.com": "Hypn0sis", "ywt000818@gmail.com": "OwenYWT", "dhandhalyabhavik@gmail.com": "v1k22", From f1e0292517c15be09f9f1fb6a61046993b562586 Mon Sep 17 00:00:00 2001 From: millerc79 Date: Sat, 2 May 2026 19:19:24 -0500 Subject: [PATCH 46/61] fix(gateway): resume sessions after crash/restart instead of blanket suspend suspend_recently_active() was unconditionally setting suspended=True on startup, causing get_or_create_session() to wipe conversation history on every restart. Change to set resume_pending=True instead, so sessions auto-resume while still allowing stuck-loop escalation after 3 failures. --- gateway/run.py | 2 +- gateway/session.py | 29 +++++++++++++++++------------ 2 files changed, 18 insertions(+), 13 deletions(-) diff --git a/gateway/run.py b/gateway/run.py index 97f72121bb..aadb067dcb 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -2755,7 +2755,7 @@ class GatewayRunner: try: suspended = self.session_store.suspend_recently_active() if suspended: - logger.info("Suspended %d in-flight session(s) from previous run", suspended) + logger.info("Marked %d in-flight session(s) as resumable from previous run", suspended) except Exception as e: logger.warning("Session suspension on startup failed: %s", e) diff --git a/gateway/session.py b/gateway/session.py index fcff336afa..3129f7a325 100644 --- a/gateway/session.py +++ b/gateway/session.py @@ -1086,19 +1086,22 @@ class SessionStore: return len(removed_keys) def suspend_recently_active(self, max_age_seconds: int = 120) -> int: - """Mark recently-active sessions as suspended. + """Mark recently-active sessions as resumable after an unexpected exit. 
- Called on gateway startup to prevent sessions that were likely - in-flight when the gateway last exited from being blindly resumed - (#7536). Only suspends sessions updated within *max_age_seconds* - to avoid resetting long-idle sessions that are harmless to resume. - Returns the number of sessions that were suspended. + Called on gateway startup after a crash or fast restart to preserve + in-flight sessions instead of destroying their conversation history + (#7536). Only marks sessions updated within *max_age_seconds* to + avoid touching long-idle sessions. Sets ``resume_pending=True`` so + the next incoming message on the same session_key auto-resumes from + the existing transcript. - Entries flagged ``resume_pending=True`` are skipped — those were - marked intentionally by the drain-timeout path as recoverable. - Terminal escalation for genuinely stuck ``resume_pending`` sessions - is handled by the existing ``.restart_failure_counts`` stuck-loop - counter, which runs after this method on startup. + Entries already flagged ``resume_pending=True`` are skipped. Entries + explicitly ``suspended=True`` (from /stop or stuck-loop escalation) + are also skipped. Terminal escalation for genuinely stuck sessions + is still handled by the existing ``.restart_failure_counts`` counter + (threshold 3), which runs after this method and sets ``suspended=True``. + + Returns the number of sessions marked resumable. 
""" from datetime import timedelta @@ -1110,7 +1113,9 @@ class SessionStore: if entry.resume_pending: continue if not entry.suspended and entry.updated_at >= cutoff: - entry.suspended = True + entry.resume_pending = True + entry.resume_reason = "restart_interrupted" + entry.last_resume_marked_at = _now() count += 1 if count: self._save() From bf3239472ff17c1fbe8dbb7580812e5a810877ec Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Sun, 3 May 2026 16:18:47 +0530 Subject: [PATCH 47/61] chore: add millerc79 to AUTHOR_MAP --- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index 7bad9dd1a1..1a451697af 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -507,6 +507,7 @@ AUTHOR_MAP = { "jz.pentest@gmail.com": "0xyg3n", "7093928+0xyg3n@users.noreply.github.com": "0xyg3n", "nftpoetrist@gmail.com": "nftpoetrist", # PR #18982 + "millerc79@users.noreply.github.com": "millerc79", # PR #19033 "hypnosis.mda@gmail.com": "Hypn0sis", "ywt000818@gmail.com": "OwenYWT", "dhandhalyabhavik@gmail.com": "v1k22", From 934103476f3199f6c7ed081641bb4e48478b821a Mon Sep 17 00:00:00 2001 From: Hermes Agent Date: Sat, 2 May 2026 17:41:47 +0000 Subject: [PATCH 48/61] fix(gateway): send /new response before cancel_session_processing to avoid race (#18912) When /new is issued while an agent is actively processing, the confirmation response was never sent to the user because cancel_session_processing() was called before _send_with_retry(). Task cancellation side effects could silently drop the response. Fix: reorder to send the response BEFORE cancelling the old task. Add logging at the send point (matching the pattern at line 2800 in _process_message_background) so future failures are visible. 
Closes: #18912 --- gateway/platforms/base.py | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/gateway/platforms/base.py b/gateway/platforms/base.py index ef08b05405..78e0dd7e25 100644 --- a/gateway/platforms/base.py +++ b/gateway/platforms/base.py @@ -2489,15 +2489,20 @@ class BasePlatformAdapter(ABC): try: response = await self._message_handler(event) - # Old adapter task (if any) is cancelled AFTER the runner has - # fully handled the command — keeps ordering deterministic. - await self.cancel_session_processing( - session_key, - release_guard=False, - discard_pending=False, - ) _text, _eph_ttl = self._unwrap_ephemeral(response) + # Send the response BEFORE cancelling the old task so the send + # cannot be affected by task-cancellation side effects (race + # condition fix — issue #18912). Previously the send happened + # after cancel_session_processing, which could silently drop the + # "/new" confirmation when an agent was actively running. if _text: + logger.info( + "[%s] Sending command '/%s' response (%d chars) to %s", + self.name, + cmd, + len(_text), + event.source.chat_id, + ) _r = await self._send_with_retry( chat_id=event.source.chat_id, content=_text, @@ -2510,6 +2515,13 @@ class BasePlatformAdapter(ABC): message_id=_r.message_id, ttl_seconds=_eph_ttl, ) + # Old adapter task (if any) is cancelled AFTER the response has + # been sent — keeps ordering deterministic and avoids the race. + await self.cancel_session_processing( + session_key, + release_guard=False, + discard_pending=False, + ) except Exception: # On failure, restore the original guard if one still exists so # we don't leave the session in a half-reset state. 
From 7a22c639dc840aecd317312d7c267596d9ac6adb Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Sun, 3 May 2026 16:19:37 +0530 Subject: [PATCH 49/61] chore: add shellybotmoyer to AUTHOR_MAP --- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index 1a451697af..7c84fb9c03 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -508,6 +508,7 @@ AUTHOR_MAP = { "7093928+0xyg3n@users.noreply.github.com": "0xyg3n", "nftpoetrist@gmail.com": "nftpoetrist", # PR #18982 "millerc79@users.noreply.github.com": "millerc79", # PR #19033 + "hermes@example.com": "shellybotmoyer", # PR #18915 (bot-committed) "hypnosis.mda@gmail.com": "Hypn0sis", "ywt000818@gmail.com": "OwenYWT", "dhandhalyabhavik@gmail.com": "v1k22", From 1148c462417369640fc0a821d1879d0c9426ed30 Mon Sep 17 00:00:00 2001 From: charliekerfoot Date: Sat, 2 May 2026 16:19:13 -0500 Subject: [PATCH 50/61] fix(gateway): correct ws scheme conversion for https urls --- gateway/platforms/homeassistant.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gateway/platforms/homeassistant.py b/gateway/platforms/homeassistant.py index 746465594c..6bc9ae6eb6 100644 --- a/gateway/platforms/homeassistant.py +++ b/gateway/platforms/homeassistant.py @@ -139,7 +139,7 @@ class HomeAssistantAdapter(BasePlatformAdapter): async def _ws_connect(self) -> bool: """Establish WebSocket connection and authenticate.""" - ws_url = self._hass_url.replace("http://", "ws://").replace("https://", "wss://") + ws_url = self._hass_url.replace("https://", "wss://").replace("http://", "ws://") ws_url = f"{ws_url}/api/websocket" self._session = aiohttp.ClientSession( From 6f2dab248a6cc8591af46e5deb2dc939c2b43146 Mon Sep 17 00:00:00 2001 From: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Date: Sun, 3 May 2026 16:22:57 +0530 Subject: [PATCH 51/61] fix: update tests for resume_pending semantics + add AUTHOR_MAP 
entries Tests updated to reflect suspend_recently_active now setting resume_pending=True (preserves session) instead of suspended=True (wipes session history). AUTHOR_MAP entries: millerc79 (#19033), shellybotmoyer (#18915) --- tests/gateway/test_clean_shutdown_marker.py | 22 +++++++++++--------- tests/gateway/test_restart_resume_pending.py | 8 ++++--- 2 files changed, 17 insertions(+), 13 deletions(-) diff --git a/tests/gateway/test_clean_shutdown_marker.py b/tests/gateway/test_clean_shutdown_marker.py index 1a476bc49a..c6d3cab5c1 100644 --- a/tests/gateway/test_clean_shutdown_marker.py +++ b/tests/gateway/test_clean_shutdown_marker.py @@ -49,9 +49,10 @@ class TestSuspendRecentlyActive: count = store.suspend_recently_active() assert count == 1 - # Re-fetch — should be suspended now + # Re-fetch — should be resume_pending (preserved, not wiped) refreshed = store.get_or_create_session(source) - assert refreshed.was_auto_reset + assert refreshed.resume_pending + assert refreshed.session_id == entry.session_id # same session preserved def test_does_not_suspend_old_sessions(self, tmp_path): store = _make_store(tmp_path) @@ -66,21 +67,22 @@ class TestSuspendRecentlyActive: count = store.suspend_recently_active(max_age_seconds=120) assert count == 0 - def test_already_suspended_not_double_counted(self, tmp_path): + def test_already_resume_pending_not_double_counted(self, tmp_path): store = _make_store(tmp_path) source = _make_source() entry = store.get_or_create_session(source) - # Suspend once + # Mark resume_pending once count1 = store.suspend_recently_active() assert count1 == 1 - # Create a new session (the old one got reset on next access) + # Re-fetch returns the SAME session (preserved, not reset) entry2 = store.get_or_create_session(source) + assert entry2.session_id == entry.session_id - # Suspend again — the new session is recent but not yet suspended + # Second call skips already-resume_pending entries count2 = store.suspend_recently_active() - assert count2 
== 1 + assert count2 == 0 # --------------------------------------------------------------------------- @@ -180,11 +182,11 @@ class TestCleanShutdownMarker: else: store.suspend_recently_active() - # Session SHOULD be suspended (crash recovery) + # Session SHOULD be resume_pending (crash recovery preserves history) with store._lock: store._ensure_loaded_locked() - suspended_count = sum(1 for e in store._entries.values() if e.suspended) - assert suspended_count == 1, "Session should be suspended after crash (no marker)" + resume_count = sum(1 for e in store._entries.values() if e.resume_pending) + assert resume_count == 1, "Session should be resume_pending after crash (no marker)" def test_marker_written_on_restart_stop(self, tmp_path, monkeypatch): """stop(restart=True) should also write the marker.""" diff --git a/tests/gateway/test_restart_resume_pending.py b/tests/gateway/test_restart_resume_pending.py index 77c639d05f..bda6c7a412 100644 --- a/tests/gateway/test_restart_resume_pending.py +++ b/tests/gateway/test_restart_resume_pending.py @@ -376,8 +376,8 @@ class TestSuspendRecentlyActiveSkipsResumePending: assert e.suspended is False assert e.resume_pending is True - def test_non_resume_pending_still_suspended(self, tmp_path): - """Non-resume sessions still get the old crash-recovery suspension.""" + def test_non_resume_pending_gets_resume_pending(self, tmp_path): + """Non-resume sessions are now marked resume_pending (not suspended).""" store = _make_store(tmp_path) source_a = _make_source(chat_id="a") source_b = _make_source(chat_id="b") @@ -386,9 +386,11 @@ class TestSuspendRecentlyActiveSkipsResumePending: store.mark_resume_pending(entry_a.session_key) count = store.suspend_recently_active() + # entry_a is already resume_pending → skipped. entry_b gets marked. 
assert count == 1 assert store._entries[entry_a.session_key].suspended is False - assert store._entries[entry_b.session_key].suspended is True + assert store._entries[entry_b.session_key].resume_pending is True + assert store._entries[entry_b.session_key].suspended is False # --------------------------------------------------------------------------- From 55647a581349d245b903621ab1ccbd55c4a7ede2 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sun, 3 May 2026 05:22:30 -0700 Subject: [PATCH 52/61] fix(whatsapp): pin protobufjs >=7.5.5 via npm overrides to clear 3 critical vulns (#19204) The whatsapp-bridge pulls @whiskeysockets/baileys at a pinned git commit whose transitive dep tree ships protobufjs <7.5.5, triggering GHSA-xq3m-2v4x-88gg (critical, arbitrary code execution). npm audit reported 3 cascading criticals: protobufjs, @whiskeysockets/libsignal-node (pulls protobufjs), and baileys itself (effect rollup). Fix: add npm overrides block pinning protobufjs to ^7.5.5. Deduplicates to a single 7.5.6 copy at node_modules/protobufjs that both libsignal-node and any other consumers resolve through normal module resolution. Why not bump baileys: npm-published baileys@6.17.16 is deprecated by the maintainers (wrong version), 7.0.0-rc.* still pulls the same vulnerable libsignal-node, and upstream Baileys HEAD adds a 4th vuln (music-metadata). The override is the minimal, behavior-preserving fix. 
Validation: - npm audit: 3 critical -> 0 vulnerabilities - node -e "import('@whiskeysockets/baileys')" -> all 5 named exports (makeWASocket, useMultiFileAuthState, DisconnectReason, fetchLatestBaileysVersion, downloadMediaMessage) resolve - node bridge.js loads all modules and reaches Express bind (exits only on EADDRINUSE because the live gateway owns :3000) - Single deduped protobufjs@7.5.6 in the tree --- scripts/whatsapp-bridge/package-lock.json | 216 ++++++++++------------ scripts/whatsapp-bridge/package.json | 3 + 2 files changed, 100 insertions(+), 119 deletions(-) diff --git a/scripts/whatsapp-bridge/package-lock.json b/scripts/whatsapp-bridge/package-lock.json index 2698a28728..b662982cf5 100644 --- a/scripts/whatsapp-bridge/package-lock.json +++ b/scripts/whatsapp-bridge/package-lock.json @@ -25,15 +25,15 @@ } }, "node_modules/@cacheable/memory": { - "version": "2.0.7", - "resolved": "https://registry.npmjs.org/@cacheable/memory/-/memory-2.0.7.tgz", - "integrity": "sha512-RbxnxAMf89Tp1dLhXMS7ceft/PGsDl1Ip7T20z5nZ+pwIAsQ1p2izPjVG69oCLv/jfQ7HDPHTWK0c9rcAWXN3A==", + "version": "2.0.8", + "resolved": "https://registry.npmjs.org/@cacheable/memory/-/memory-2.0.8.tgz", + "integrity": "sha512-FvEb29x5wVwu/Kf93IWwsOOEuhHh6dYCJF3vcKLzXc0KXIW181AOzv6ceT4ZpBHDvAfG60eqb+ekmrnLHIy+jw==", "license": "MIT", "dependencies": { - "@cacheable/utils": "^2.3.3", - "@keyv/bigmap": "^1.3.0", - "hookified": "^1.14.0", - "keyv": "^5.5.5" + "@cacheable/utils": "^2.4.0", + "@keyv/bigmap": "^1.3.1", + "hookified": "^1.15.1", + "keyv": "^5.6.0" } }, "node_modules/@cacheable/node-cache": { @@ -51,19 +51,19 @@ } }, "node_modules/@cacheable/utils": { - "version": "2.3.4", - "resolved": "https://registry.npmjs.org/@cacheable/utils/-/utils-2.3.4.tgz", - "integrity": "sha512-knwKUJEYgIfwShABS1BX6JyJJTglAFcEU7EXqzTdiGCXur4voqkiJkdgZIQtWNFhynzDWERcTYv/sETMu3uJWA==", + "version": "2.4.1", + "resolved": "https://registry.npmjs.org/@cacheable/utils/-/utils-2.4.1.tgz", + "integrity": 
"sha512-eiFgzCbIneyMlLOmNG4g9xzF7Hv3Mga4LjxjcSC/ues6VYq2+gUbQI8JqNuw/ZM8tJIeIaBGpswAsqV2V7ApgA==", "license": "MIT", "dependencies": { - "hashery": "^1.3.0", + "hashery": "^1.5.1", "keyv": "^5.6.0" } }, "node_modules/@emnapi/runtime": { - "version": "1.8.1", - "resolved": "https://registry.npmjs.org/@emnapi/runtime/-/runtime-1.8.1.tgz", - "integrity": "sha512-mehfKSMWjjNol8659Z8KxEMrdSJDDot5SXMq00dM8BN4o+CLNXQ0xH2V7EchNHV4RmbZLmmPdEaXZc5H2FXmDg==", + "version": "1.10.0", + "resolved": "https://registry.npmjs.org/@emnapi/runtime/-/runtime-1.10.0.tgz", + "integrity": "sha512-ewvYlk86xUoGI0zQRNq/mC+16R1QeDlKQy21Ki3oSYXNgLb45GV1P6A0M+/s6nyCuNDqe5VpaY84BzXGwVbwFA==", "license": "MIT", "optional": true, "peer": true, @@ -87,9 +87,9 @@ "license": "BSD-3-Clause" }, "node_modules/@img/colour": { - "version": "1.0.0", - "resolved": "https://registry.npmjs.org/@img/colour/-/colour-1.0.0.tgz", - "integrity": "sha512-A5P/LfWGFSl6nsckYtjw9da+19jB8hkJ6ACTGcDfEJ0aE+l2n2El7dsVM7UVHZQ9s2lmYMWlrS21YLy2IR1LUw==", + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@img/colour/-/colour-1.1.0.tgz", + "integrity": "sha512-Td76q7j57o/tLVdgS746cYARfSyxk8iEfRxewL9h4OMzYhbW4TAcppl0mT4eyqXddh6L/jwoM75mo7ixa/pCeQ==", "license": "MIT", "peer": true, "engines": { @@ -617,9 +617,9 @@ "license": "BSD-3-Clause" }, "node_modules/@protobufjs/codegen": { - "version": "2.0.4", - "resolved": "https://registry.npmjs.org/@protobufjs/codegen/-/codegen-2.0.4.tgz", - "integrity": "sha512-YyFaikqM5sH0ziFZCN3xDC7zeGaB/d0IUb9CATugHWbd1FRFwWwt4ld4OYMPWu5a3Xe01mGAULCdqhMlPl29Jg==", + "version": "2.0.5", + "resolved": "https://registry.npmjs.org/@protobufjs/codegen/-/codegen-2.0.5.tgz", + "integrity": "sha512-zgXFLzW3Ap33e6d0Wlj4MGIm6Ce8O89n/apUaGNB/jx+hw+ruWEp7EwGUshdLKVRCxZW12fp9r40E1mQrf/34g==", "license": "BSD-3-Clause" }, "node_modules/@protobufjs/eventemitter": { @@ -645,9 +645,9 @@ "license": "BSD-3-Clause" }, "node_modules/@protobufjs/inquire": { - "version": "1.1.0", - "resolved": 
"https://registry.npmjs.org/@protobufjs/inquire/-/inquire-1.1.0.tgz", - "integrity": "sha512-kdSefcPdruJiFMVSbn801t4vFK7KB/5gd2fYvrxhuJYg8ILrmn9SKSX2tZdV6V+ksulWqS7aXjBcRXl3wHoD9Q==", + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@protobufjs/inquire/-/inquire-1.1.1.tgz", + "integrity": "sha512-mnzgDV26ueAvk7rsbt9L7bE0SuAoqyuys/sMMrmVcN5x9VsxpcG3rqAUSgDyLp0UZlmNfIbQ4fHfCtreVBk8Ew==", "license": "BSD-3-Clause" }, "node_modules/@protobufjs/path": { @@ -663,9 +663,9 @@ "license": "BSD-3-Clause" }, "node_modules/@protobufjs/utf8": { - "version": "1.1.0", - "resolved": "https://registry.npmjs.org/@protobufjs/utf8/-/utf8-1.1.0.tgz", - "integrity": "sha512-Vvn3zZrhQZkkBE8LSuW3em98c0FwgO4nxzv6OdSxPKJIEKY2bGbHn+mhGIPerzI4twdxaP8/0+06HBpwf345Lw==", + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/@protobufjs/utf8/-/utf8-1.1.1.tgz", + "integrity": "sha512-oOAWABowe8EAbMyWKM0tYDKi8Yaox52D+HWZhAIJqQXbqe0xI/GV7FhLWqlEKreMkfDjshR5FKgi3mnle0h6Eg==", "license": "BSD-3-Clause" }, "node_modules/@tokenizer/inflate": { @@ -714,25 +714,20 @@ "integrity": "sha512-OvjF+z51L3ov0OyAU0duzsYuvO01PH7x4t6DJx+guahgTnBHkhJdG7soQeTSFLWN3efnHyibZ4Z8l2EuWwJN3A==", "license": "MIT" }, - "node_modules/@types/long": { - "version": "4.0.2", - "resolved": "https://registry.npmjs.org/@types/long/-/long-4.0.2.tgz", - "integrity": "sha512-MqTGEo5bj5t157U6fA/BiDynNkn0YknVdh48CMPkTSpFTVmvao5UQmm7uEF6xBEo7qIMAlY/JSleYaE6VOdpaA==", - "license": "MIT" - }, "node_modules/@types/node": { - "version": "25.3.1", - "resolved": "https://registry.npmjs.org/@types/node/-/node-25.3.1.tgz", - "integrity": "sha512-hj9YIJimBCipHVfHKRMnvmHg+wfhKc0o4mTtXh9pKBjC8TLJzz0nzGmLi5UJsYAUgSvXFHgb0V2oY10DUFtImw==", + "version": "25.6.0", + "resolved": "https://registry.npmjs.org/@types/node/-/node-25.6.0.tgz", + "integrity": "sha512-+qIYRKdNYJwY3vRCZMdJbPLJAtGjQBudzZzdzwQYkEPQd+PJGixUL5QfvCLDaULoLv+RhT3LDkwEfKaAkgSmNQ==", "license": "MIT", "dependencies": { - "undici-types": "~7.18.0" + "undici-types": 
"~7.19.0" } }, "node_modules/@whiskeysockets/baileys": { "name": "baileys", "version": "7.0.0-rc.9", "resolved": "git+ssh://git@github.com/WhiskeySockets/Baileys.git#01047debd81beb20da7b7779b08edcb06aa03770", + "integrity": "sha512-letWyB96JHD6NdqpAiseOfaUBi13u8AhiRcKSRqcVjc5Vw5xoPTZGvVnw8K/NvGBFAvyLJkwim9Mjvwzhx/SlA==", "hasInstallScript": true, "license": "MIT", "dependencies": { @@ -807,9 +802,9 @@ } }, "node_modules/body-parser": { - "version": "1.20.4", - "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-1.20.4.tgz", - "integrity": "sha512-ZTgYYLMOXY9qKU/57FAo8F+HA2dGX7bqGc71txDRC1rS4frdFI5R7NhluHxH6M0YItAP0sHB4uqAOcYKxO6uGA==", + "version": "1.20.5", + "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-1.20.5.tgz", + "integrity": "sha512-3grm+/2tUOvu2cjJkvsIxrv/wVpfXQW4PsQHYm7yk4vfpu7Ekl6nEsYBoJUL6qDwZUx8wUhQ8tR2qz+ad9c9OA==", "license": "MIT", "dependencies": { "bytes": "~3.1.2", @@ -820,7 +815,7 @@ "http-errors": "~2.0.1", "iconv-lite": "~0.4.24", "on-finished": "~2.4.1", - "qs": "~6.14.0", + "qs": "~6.15.1", "raw-body": "~2.5.3", "type-is": "~1.6.18", "unpipe": "~1.0.0" @@ -830,6 +825,21 @@ "npm": "1.2.8000 || >= 1.4.16" } }, + "node_modules/body-parser/node_modules/qs": { + "version": "6.15.1", + "resolved": "https://registry.npmjs.org/qs/-/qs-6.15.1.tgz", + "integrity": "sha512-6YHEFRL9mfgcAvql/XhwTvf5jKcOiiupt2FiJxHkiX1z4j7WL8J/jRHYLluORvc1XxB5rV20KoeK00gVJamspg==", + "license": "BSD-3-Clause", + "dependencies": { + "side-channel": "^1.1.0" + }, + "engines": { + "node": ">=0.6" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, "node_modules/bytes": { "version": "3.1.2", "resolved": "https://registry.npmjs.org/bytes/-/bytes-3.1.2.tgz", @@ -840,16 +850,16 @@ } }, "node_modules/cacheable": { - "version": "2.3.2", - "resolved": "https://registry.npmjs.org/cacheable/-/cacheable-2.3.2.tgz", - "integrity": "sha512-w+ZuRNmex9c1TR9RcsxbfTKCjSL0rh1WA5SABbrWprIHeNBdmyQLSYonlDy9gpD+63XT8DgZ/wNh1Smvc9WnJA==", + 
"version": "2.3.4", + "resolved": "https://registry.npmjs.org/cacheable/-/cacheable-2.3.4.tgz", + "integrity": "sha512-djgxybDbw9fL/ZWMI3+CE8ZilNxcwFkVtDc1gJ+IlOSSWkSMPQabhV/XCHTQ6pwwN6aivXPZ43omTooZiX06Ew==", "license": "MIT", "dependencies": { - "@cacheable/memory": "^2.0.7", - "@cacheable/utils": "^2.3.3", + "@cacheable/memory": "^2.0.8", + "@cacheable/utils": "^2.4.0", "hookified": "^1.15.0", - "keyv": "^5.5.5", - "qified": "^0.6.0" + "keyv": "^5.6.0", + "qified": "^0.9.0" } }, "node_modules/call-bind-apply-helpers": { @@ -1212,21 +1222,21 @@ } }, "node_modules/hashery": { - "version": "1.5.0", - "resolved": "https://registry.npmjs.org/hashery/-/hashery-1.5.0.tgz", - "integrity": "sha512-nhQ6ExaOIqti2FDWoEMWARUqIKyjr2VcZzXShrI+A3zpeiuPWzx6iPftt44LhP74E5sW36B75N6VHbvRtpvO6Q==", + "version": "1.5.1", + "resolved": "https://registry.npmjs.org/hashery/-/hashery-1.5.1.tgz", + "integrity": "sha512-iZyKG96/JwPz1N55vj2Ie2vXbhu440zfUfJvSwEqEbeLluk7NnapfGqa7LH0mOsnDxTF85Mx8/dyR6HfqcbmbQ==", "license": "MIT", "dependencies": { - "hookified": "^1.14.0" + "hookified": "^1.15.0" }, "engines": { "node": ">=20" } }, "node_modules/hasown": { - "version": "2.0.2", - "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz", - "integrity": "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==", + "version": "2.0.3", + "resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.3.tgz", + "integrity": "sha512-ej4AhfhfL2Q2zpMmLo7U1Uv9+PyhIZpgQLGT1F9miIGmiCJIoCgSmczFdrc97mWT4kVY72KA+WnnhJ5pghSvSg==", "license": "MIT", "dependencies": { "function-bind": "^1.1.2" @@ -1327,44 +1337,6 @@ "protobufjs": "6.8.8" } }, - "node_modules/libsignal/node_modules/@types/node": { - "version": "10.17.60", - "resolved": "https://registry.npmjs.org/@types/node/-/node-10.17.60.tgz", - "integrity": "sha512-F0KIgDJfy2nA3zMLmWGKxcH2ZVEtCZXHHdOQs2gSaQ27+lNeEfGxzkIw90aXswATX7AZ33tahPbzy6KAfUreVw==", - "license": "MIT" - }, - 
"node_modules/libsignal/node_modules/long": { - "version": "4.0.0", - "resolved": "https://registry.npmjs.org/long/-/long-4.0.0.tgz", - "integrity": "sha512-XsP+KhQif4bjX1kbuSiySJFNAehNxgLb6hPRGJ9QsUr8ajHkuXGdrHmFUTUUXhDwVX2R5bY4JNZEwbUiMhV+MA==", - "license": "Apache-2.0" - }, - "node_modules/libsignal/node_modules/protobufjs": { - "version": "6.8.8", - "resolved": "https://registry.npmjs.org/protobufjs/-/protobufjs-6.8.8.tgz", - "integrity": "sha512-AAmHtD5pXgZfi7GMpllpO3q1Xw1OYldr+dMUlAnffGTAhqkg72WdmSY71uKBF/JuyiKs8psYbtKrhi0ASCD8qw==", - "hasInstallScript": true, - "license": "BSD-3-Clause", - "dependencies": { - "@protobufjs/aspromise": "^1.1.2", - "@protobufjs/base64": "^1.1.2", - "@protobufjs/codegen": "^2.0.4", - "@protobufjs/eventemitter": "^1.1.0", - "@protobufjs/fetch": "^1.1.0", - "@protobufjs/float": "^1.0.2", - "@protobufjs/inquire": "^1.1.0", - "@protobufjs/path": "^1.1.2", - "@protobufjs/pool": "^1.1.0", - "@protobufjs/utf8": "^1.1.0", - "@types/long": "^4.0.0", - "@types/node": "^10.1.0", - "long": "^4.0.0" - }, - "bin": { - "pbjs": "bin/pbjs", - "pbts": "bin/pbts" - } - }, "node_modules/long": { "version": "5.3.2", "resolved": "https://registry.npmjs.org/long/-/long-5.3.2.tgz", @@ -1372,9 +1344,9 @@ "license": "Apache-2.0" }, "node_modules/lru-cache": { - "version": "11.2.6", - "resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-11.2.6.tgz", - "integrity": "sha512-ESL2CrkS/2wTPfuend7Zhkzo2u0daGJ/A2VucJOgQ/C48S/zB8MMeMHSGKYpXhIjbPxfuezITkaBH1wqv00DDQ==", + "version": "11.3.5", + "resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-11.3.5.tgz", + "integrity": "sha512-NxVFwLAnrd9i7KUBxC4DrUhmgjzOs+1Qm50D3oF1/oL+r1NpZ4gA7xvG0/zJ8evR7zIKn4vLf7qTNduWFtCrRw==", "license": "BlueOak-1.0.0", "engines": { "node": "20 || >=22" @@ -1552,12 +1524,12 @@ } }, "node_modules/p-queue": { - "version": "9.1.0", - "resolved": "https://registry.npmjs.org/p-queue/-/p-queue-9.1.0.tgz", - "integrity": 
"sha512-O/ZPaXuQV29uSLbxWBGGZO1mCQXV2BLIwUr59JUU9SoH76mnYvtms7aafH/isNSNGwuEfP6W/4xD0/TJXxrizw==", + "version": "9.2.0", + "resolved": "https://registry.npmjs.org/p-queue/-/p-queue-9.2.0.tgz", + "integrity": "sha512-dWgLE8AH0HjQ9fe74pUkKkvzzYT18Inp4zra3lKHnnwqGvcfcUBrvF2EAVX+envufDNBOzpPq/IBUONDbI7+3g==", "license": "MIT", "dependencies": { - "eventemitter3": "^5.0.1", + "eventemitter3": "^5.0.4", "p-timeout": "^7.0.0" }, "engines": { @@ -1648,22 +1620,22 @@ "license": "MIT" }, "node_modules/protobufjs": { - "version": "7.5.4", - "resolved": "https://registry.npmjs.org/protobufjs/-/protobufjs-7.5.4.tgz", - "integrity": "sha512-CvexbZtbov6jW2eXAvLukXjXUW1TzFaivC46BpWc/3BpcCysb5Vffu+B3XHMm8lVEuy2Mm4XGex8hBSg1yapPg==", + "version": "7.5.6", + "resolved": "https://registry.npmjs.org/protobufjs/-/protobufjs-7.5.6.tgz", + "integrity": "sha512-M71sTMB146U3u0di3yup8iM+zv8yPRNQVr1KK4tyBitl3qFvEGucq/rGDRShD2rsJhtN02RJaJ7j5X5hmy8SJg==", "hasInstallScript": true, "license": "BSD-3-Clause", "dependencies": { "@protobufjs/aspromise": "^1.1.2", "@protobufjs/base64": "^1.1.2", - "@protobufjs/codegen": "^2.0.4", + "@protobufjs/codegen": "^2.0.5", "@protobufjs/eventemitter": "^1.1.0", "@protobufjs/fetch": "^1.1.0", "@protobufjs/float": "^1.0.2", - "@protobufjs/inquire": "^1.1.0", + "@protobufjs/inquire": "^1.1.1", "@protobufjs/path": "^1.1.2", "@protobufjs/pool": "^1.1.0", - "@protobufjs/utf8": "^1.1.0", + "@protobufjs/utf8": "^1.1.1", "@types/node": ">=13.7.0", "long": "^5.0.0" }, @@ -1685,17 +1657,23 @@ } }, "node_modules/qified": { - "version": "0.6.0", - "resolved": "https://registry.npmjs.org/qified/-/qified-0.6.0.tgz", - "integrity": "sha512-tsSGN1x3h569ZSU1u6diwhltLyfUWDp3YbFHedapTmpBl0B3P6U3+Qptg7xu+v+1io1EwhdPyyRHYbEw0KN2FA==", + "version": "0.9.1", + "resolved": "https://registry.npmjs.org/qified/-/qified-0.9.1.tgz", + "integrity": "sha512-n7mar4T0xQ+39dE2vGTAlbxUEpndwPANH0kDef1/MYsB8Bba9wshkybIRx74qgcvKQPEWErf9AqAdYjhzY2Ilg==", "license": "MIT", "dependencies": { - 
"hookified": "^1.14.0" + "hookified": "^2.1.1" }, "engines": { "node": ">=20" } }, + "node_modules/qified/node_modules/hookified": { + "version": "2.2.0", + "resolved": "https://registry.npmjs.org/hookified/-/hookified-2.2.0.tgz", + "integrity": "sha512-p/LgFzRN5FeoD3DLS6bkUapeye6E4SI6yJs6KetENd18S+FBthqYq2amJUWpt5z0EQwwHemidjY5OqJGEKm5uA==", + "license": "MIT" + }, "node_modules/qrcode-terminal": { "version": "0.12.0", "resolved": "https://registry.npmjs.org/qrcode-terminal/-/qrcode-terminal-0.12.0.tgz", @@ -1922,13 +1900,13 @@ } }, "node_modules/side-channel-list": { - "version": "1.0.0", - "resolved": "https://registry.npmjs.org/side-channel-list/-/side-channel-list-1.0.0.tgz", - "integrity": "sha512-FCLHtRD/gnpCiCHEiJLOwdmFP+wzCmDEkc9y7NsYxeF4u7Btsn1ZuwgwJGxImImHicJArLP4R0yX4c2KCrMrTA==", + "version": "1.0.1", + "resolved": "https://registry.npmjs.org/side-channel-list/-/side-channel-list-1.0.1.tgz", + "integrity": "sha512-mjn/0bi/oUURjc5Xl7IaWi/OJJJumuoJFQJfDDyO46+hBWsfaVM65TBHq2eoZBhzl9EchxOijpkbRC8SVBQU0w==", "license": "MIT", "dependencies": { "es-errors": "^1.3.0", - "object-inspect": "^1.13.3" + "object-inspect": "^1.13.4" }, "engines": { "node": ">= 0.4" @@ -2094,9 +2072,9 @@ } }, "node_modules/undici-types": { - "version": "7.18.2", - "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.18.2.tgz", - "integrity": "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w==", + "version": "7.19.2", + "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.19.2.tgz", + "integrity": "sha512-qYVnV5OEm2AW8cJMCpdV20CDyaN3g0AjDlOGf1OW4iaDEx8MwdtChUp4zu4H0VP3nDRF/8RKWH+IPp9uW0YGZg==", "license": "MIT" }, "node_modules/unpipe": { @@ -2139,9 +2117,9 @@ "license": "MIT" }, "node_modules/ws": { - "version": "8.19.0", - "resolved": "https://registry.npmjs.org/ws/-/ws-8.19.0.tgz", - "integrity": "sha512-blAT2mjOEIi0ZzruJfIhb3nps74PRWTCz1IjglWEEpQl5XS/UNama6u2/rjFkDDouqr4L67ry+1aGIALViWjDg==", + 
"version": "8.20.0", + "resolved": "https://registry.npmjs.org/ws/-/ws-8.20.0.tgz", + "integrity": "sha512-sAt8BhgNbzCtgGbt2OxmpuryO63ZoDk/sqaB/znQm94T4fCEsy/yV+7CdC1kJhOU9lboAEU7R3kquuycDoibVA==", "license": "MIT", "engines": { "node": ">=10.0.0" diff --git a/scripts/whatsapp-bridge/package.json b/scripts/whatsapp-bridge/package.json index cb2f6b22ed..d1c3ac113a 100644 --- a/scripts/whatsapp-bridge/package.json +++ b/scripts/whatsapp-bridge/package.json @@ -12,5 +12,8 @@ "express": "^4.21.0", "qrcode-terminal": "^0.12.0", "pino": "^9.0.0" + }, + "overrides": { + "protobufjs": "^7.5.5" } } From d87fd9f03958995ce8234ec359a13fceabbf9ebf Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Sun, 3 May 2026 05:49:12 -0700 Subject: [PATCH 53/61] fix(goals): make /goal work in TUI and fix gateway verdict delivery (#19209) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit /goal was silently broken outside the classic CLI. TUI: /goal was routed through the HermesCLI slash-worker subprocess, which set the goal row in SessionDB but then called _pending_input.put(state.goal) — the subprocess has no reader for that queue, so the kickoff message was discarded. No post-turn judge was wired into prompt.submit either, so even a manual kickoff would not continue the goal loop. Intercept /goal in command.dispatch instead, drive GoalManager directly, and return {type: send, notice, message} so the TUI client renders the Goal-set notice and fires the kickoff. Run the judge in _run_prompt_submit after message.complete, surface the verdict via status.update {kind: goal}, and chain the continuation turn after the running guard is released. Gateway: _post_turn_goal_continuation was gated on hasattr(adapter, 'send_message'), but adapters only expose send(). That branch was dead on every platform — users never saw '✓ Goal achieved', 'Continuing toward goal', or budget-exhausted messages. 
Replace the dead call with adapter.send(chat_id, content, metadata) and drop a broken reference to self._loop. Tests: - tests/tui_gateway/test_goal_command.py — full /goal dispatch matrix (set / status / pause / resume / clear / stop / done / whitespace) plus regressions for slash.exec → 4018 and 'goal' staying in _PENDING_INPUT_COMMANDS. - tests/gateway/test_goal_verdict_send.py — locks in the adapter.send path for done / continue / budget-exhausted and verifies the hook no-ops when no goal is set or the adapter lacks send(). --- gateway/run.py | 29 ++- tests/gateway/test_goal_verdict_send.py | 217 ++++++++++++++++++++ tests/tui_gateway/test_goal_command.py | 196 ++++++++++++++++++ tui_gateway/server.py | 147 +++++++++++++ ui-tui/src/app/createGatewayEventHandler.ts | 5 + ui-tui/src/app/createSlashHandler.ts | 3 + ui-tui/src/gatewayTypes.ts | 2 +- ui-tui/src/lib/rpc.ts | 6 +- 8 files changed, 593 insertions(+), 12 deletions(-) create mode 100644 tests/gateway/test_goal_verdict_send.py create mode 100644 tests/tui_gateway/test_goal_command.py diff --git a/gateway/run.py b/gateway/run.py index aadb067dcb..86076bf0bf 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -7887,24 +7887,33 @@ class GatewayRunner: msg = decision.get("message") or "" # Send the status line back to the user so they see the judge's - # verdict. Fire-and-forget via the adapter. + # verdict. Fire-and-forget via the adapter's ``send()`` method — + # adapters expose ``send(chat_id, content, reply_to, metadata)``, + # not a ``send_message(source, msg)`` wrapper, so an earlier + # ``hasattr(adapter, "send_message")`` gate here was dead code and + # users never saw ``✓ Goal achieved`` / ``⏸ budget exhausted`` + # verdicts. 
if msg and source is not None: try: adapter = self.adapters.get(source.platform) - if adapter and hasattr(adapter, "send_message"): + if adapter is not None and hasattr(adapter, "send"): import asyncio as _asyncio - coro = adapter.send_message(source, msg) + thread_meta = ( + {"thread_id": source.thread_id} if source.thread_id else None + ) + coro = adapter.send( + chat_id=source.chat_id, + content=msg, + metadata=thread_meta, + ) if _asyncio.iscoroutine(coro): try: - loop = _asyncio.get_event_loop() - if loop.is_running(): - loop.create_task(coro) - else: - loop.run_until_complete(coro) + loop = _asyncio.get_running_loop() + loop.create_task(coro) except RuntimeError: - # No event loop in this thread — schedule on the main one. + # No running loop in this thread — best effort. try: - _asyncio.run_coroutine_threadsafe(coro, self._loop) + _asyncio.run(coro) except Exception: pass except Exception as exc: diff --git a/tests/gateway/test_goal_verdict_send.py b/tests/gateway/test_goal_verdict_send.py new file mode 100644 index 0000000000..bb66851608 --- /dev/null +++ b/tests/gateway/test_goal_verdict_send.py @@ -0,0 +1,217 @@ +"""Tests for gateway /goal verdict-message delivery. + +The judge verdict message ("✓ Goal achieved", "⏸ budget exhausted", etc.) +must reach the user after each turn. Before this fix the code checked +``hasattr(adapter, "send_message")`` — but adapters expose ``send()``, +never ``send_message``, so the check always evaluated False and users +never saw verdicts. This test locks in the fix. 
+""" + +from __future__ import annotations + +import asyncio +from datetime import datetime +from pathlib import Path +from unittest.mock import MagicMock, patch + +import pytest + +from gateway.config import GatewayConfig, Platform, PlatformConfig +from gateway.session import SessionEntry, SessionSource, build_session_key + + +@pytest.fixture() +def hermes_home(tmp_path, monkeypatch): + home = tmp_path / ".hermes" + home.mkdir() + monkeypatch.setattr(Path, "home", lambda: tmp_path) + monkeypatch.setenv("HERMES_HOME", str(home)) + + from hermes_cli import goals + + goals._DB_CACHE.clear() + yield home + goals._DB_CACHE.clear() + + +def _make_source() -> SessionSource: + return SessionSource( + platform=Platform.TELEGRAM, + user_id="u1", + chat_id="c1", + user_name="tester", + chat_type="dm", + ) + + +class _RecordingAdapter: + """Minimal adapter that records send() invocations.""" + + def __init__(self) -> None: + self._pending_messages: dict = {} + self.sends: list[dict] = [] + + async def send(self, chat_id: str, content: str, reply_to=None, metadata=None): + self.sends.append({"chat_id": chat_id, "content": content, "metadata": metadata}) + + class _R: + success = True + message_id = "mock-msg" + + return _R() + + +def _make_runner_with_adapter(): + from gateway.run import GatewayRunner + + runner = object.__new__(GatewayRunner) + runner.config = GatewayConfig( + platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}, + ) + runner.adapters = {} + runner._running_agents = {} + runner._running_agents_ts = {} + runner._queued_events = {} + + src = _make_source() + session_entry = SessionEntry( + session_key=build_session_key(src), + session_id="goal-sess-1", + created_at=datetime.now(), + updated_at=datetime.now(), + platform=Platform.TELEGRAM, + chat_type="dm", + ) + + runner.session_store = MagicMock() + runner.session_store.get_or_create_session.return_value = session_entry + runner.session_store._generate_session_key.return_value = 
build_session_key(src) + + adapter = _RecordingAdapter() + runner.adapters[Platform.TELEGRAM] = adapter + return runner, adapter, session_entry, src + + +@pytest.mark.asyncio +async def test_goal_verdict_done_sent_via_adapter_send(hermes_home): + """When the judge says done, the '✓ Goal achieved' message must reach + the user through the adapter's ``send()`` method.""" + runner, adapter, session_entry, src = _make_runner_with_adapter() + + from hermes_cli.goals import GoalManager + + mgr = GoalManager(session_entry.session_id) + mgr.set("ship the feature") + + with patch("hermes_cli.goals.judge_goal", return_value=("done", "the feature shipped")): + runner._post_turn_goal_continuation( + session_entry=session_entry, + source=src, + final_response="I shipped the feature.", + ) + # fire-and-forget create_task — give the loop a tick + await asyncio.sleep(0.05) + + assert len(adapter.sends) == 1, f"expected 1 send, got {len(adapter.sends)}: {adapter.sends}" + msg = adapter.sends[0] + assert msg["chat_id"] == "c1" + assert "Goal achieved" in msg["content"] + assert "the feature shipped" in msg["content"] + + +@pytest.mark.asyncio +async def test_goal_verdict_continue_enqueues_continuation(hermes_home): + """When the judge says continue, both the 'continuing' status and the + continuation-prompt event must be delivered. 
The continuation prompt is + routed through the adapter's pending-messages FIFO so the goal loop + proceeds on the next turn.""" + runner, adapter, session_entry, src = _make_runner_with_adapter() + + from hermes_cli.goals import GoalManager + + mgr = GoalManager(session_entry.session_id) + mgr.set("polish the docs") + + with patch("hermes_cli.goals.judge_goal", return_value=("continue", "still needs work")): + runner._post_turn_goal_continuation( + session_entry=session_entry, + source=src, + final_response="here's a partial edit", + ) + await asyncio.sleep(0.05) + + # Status line sent back + assert len(adapter.sends) == 1 + assert "Continuing toward goal" in adapter.sends[0]["content"] + # Continuation prompt enqueued for next turn + assert adapter._pending_messages, "continuation prompt must be enqueued in pending_messages" + + +@pytest.mark.asyncio +async def test_goal_verdict_budget_exhausted_sends_pause(hermes_home): + """When the budget is exhausted, a '⏸ Goal paused' message must be sent + and no further continuation enqueued.""" + runner, adapter, session_entry, src = _make_runner_with_adapter() + + from hermes_cli.goals import GoalManager, save_goal + + mgr = GoalManager(session_entry.session_id, default_max_turns=2) + state = mgr.set("tiny goal", max_turns=2) + state.turns_used = 2 + save_goal(session_entry.session_id, state) + + with patch("hermes_cli.goals.judge_goal", return_value=("continue", "keep going")): + runner._post_turn_goal_continuation( + session_entry=session_entry, + source=src, + final_response="still partial", + ) + await asyncio.sleep(0.05) + + assert len(adapter.sends) == 1 + content = adapter.sends[0]["content"] + assert "paused" in content.lower() + assert "turns used" in content.lower() + # No continuation enqueued when budget is exhausted + assert not adapter._pending_messages + + +@pytest.mark.asyncio +async def test_goal_verdict_skipped_when_no_active_goal(hermes_home): + """No goal set → the hook is a no-op. 
Nothing is sent, nothing enqueued.""" + runner, adapter, session_entry, src = _make_runner_with_adapter() + + runner._post_turn_goal_continuation( + session_entry=session_entry, + source=src, + final_response="anything", + ) + await asyncio.sleep(0.05) + + assert adapter.sends == [] + assert adapter._pending_messages == {} + + +@pytest.mark.asyncio +async def test_goal_verdict_survives_adapter_without_send(hermes_home): + """Bad adapter (no ``send`` attribute) must not crash the judge hook.""" + runner, _adapter, session_entry, src = _make_runner_with_adapter() + + from hermes_cli.goals import GoalManager + + GoalManager(session_entry.session_id).set("survive missing send") + + class _NoSendAdapter: + def __init__(self): + self._pending_messages: dict = {} + + runner.adapters[Platform.TELEGRAM] = _NoSendAdapter() + + with patch("hermes_cli.goals.judge_goal", return_value=("done", "ok")): + # must not raise + runner._post_turn_goal_continuation( + session_entry=session_entry, + source=src, + final_response="whatever", + ) + await asyncio.sleep(0.05) diff --git a/tests/tui_gateway/test_goal_command.py b/tests/tui_gateway/test_goal_command.py new file mode 100644 index 0000000000..050b36bc87 --- /dev/null +++ b/tests/tui_gateway/test_goal_command.py @@ -0,0 +1,196 @@ +"""Tests for /goal handling in tui_gateway. + +The TUI routes ``/goal`` through ``command.dispatch`` (not ``slash.exec``) +because the CLI's ``_handle_goal_command`` queues the kickoff message onto +``_pending_input``, which the slash-worker subprocess has no reader for. +Instead we handle ``/goal`` directly in the server and return a +``{"type": "send", "notice": ..., "message": ...}`` payload the TUI client +uses to render a system line and fire the kickoff prompt. 
+""" + +from __future__ import annotations + +import importlib +import threading +from pathlib import Path +from unittest.mock import MagicMock, patch + +import pytest + + +@pytest.fixture() +def hermes_home(tmp_path, monkeypatch): + home = tmp_path / ".hermes" + home.mkdir() + monkeypatch.setattr(Path, "home", lambda: tmp_path) + monkeypatch.setenv("HERMES_HOME", str(home)) + + # Bust the goal-module DB cache so it re-resolves HERMES_HOME. + from hermes_cli import goals + + goals._DB_CACHE.clear() + yield home + goals._DB_CACHE.clear() + + +@pytest.fixture() +def server(hermes_home): + with patch.dict( + "sys.modules", + { + "hermes_cli.env_loader": MagicMock(), + "hermes_cli.banner": MagicMock(), + }, + ): + mod = importlib.import_module("tui_gateway.server") + yield mod + mod._sessions.clear() + mod._pending.clear() + mod._answers.clear() + mod._methods.clear() + importlib.reload(mod) + + +@pytest.fixture() +def session(server): + sid = "sid-test" + session_key = "tui-goal-session-1" + s = { + "session_key": session_key, + "history": [], + "history_lock": threading.Lock(), + "history_version": 0, + "running": False, + "attached_images": [], + "cols": 120, + } + server._sessions[sid] = s + return sid, session_key, s + + +def _call(server, method, **params): + handler = server._methods[method] + return handler(1, params) + + +# ── command.dispatch /goal ──────────────────────────────────────────── + + +def test_goal_bare_shows_status_when_none_set(server, session): + sid, _, _ = session + r = _call(server, "command.dispatch", name="goal", arg="", session_id=sid) + assert r["result"]["type"] == "exec" + assert "No active goal" in r["result"]["output"] + + +def test_goal_whitespace_only_shows_status(server, session): + sid, _, _ = session + r = _call(server, "command.dispatch", name="goal", arg=" ", session_id=sid) + assert r["result"]["type"] == "exec" + assert "No active goal" in r["result"]["output"] + + +def test_goal_status_alias_shows_status(server, session): 
+ sid, _, _ = session + r = _call(server, "command.dispatch", name="goal", arg="status", session_id=sid) + assert r["result"]["type"] == "exec" + assert "No active goal" in r["result"]["output"] + + +def test_goal_set_returns_send_with_notice(server, session): + sid, session_key, _ = session + r = _call(server, "command.dispatch", name="goal", arg="build a rocket", session_id=sid) + result = r["result"] + assert result["type"] == "send" + assert result["message"] == "build a rocket" + assert "notice" in result + assert "Goal set" in result["notice"] + assert "20-turn budget" in result["notice"] + + # Persisted in SessionDB + from hermes_cli.goals import GoalManager + + mgr = GoalManager(session_key) + assert mgr.state is not None + assert mgr.state.goal == "build a rocket" + assert mgr.state.status == "active" + + +def test_goal_pause_after_set(server, session): + sid, session_key, _ = session + _call(server, "command.dispatch", name="goal", arg="write a story", session_id=sid) + r = _call(server, "command.dispatch", name="goal", arg="pause", session_id=sid) + assert r["result"]["type"] == "exec" + assert "paused" in r["result"]["output"].lower() + + from hermes_cli.goals import GoalManager + + assert GoalManager(session_key).state.status == "paused" + + +def test_goal_resume_reactivates(server, session): + sid, session_key, _ = session + _call(server, "command.dispatch", name="goal", arg="write a story", session_id=sid) + _call(server, "command.dispatch", name="goal", arg="pause", session_id=sid) + r = _call(server, "command.dispatch", name="goal", arg="resume", session_id=sid) + assert r["result"]["type"] == "exec" + assert "resumed" in r["result"]["output"].lower() + + from hermes_cli.goals import GoalManager + + assert GoalManager(session_key).state.status == "active" + + +def test_goal_clear_removes_active_goal(server, session): + sid, session_key, _ = session + _call(server, "command.dispatch", name="goal", arg="write a story", session_id=sid) + r = 
_call(server, "command.dispatch", name="goal", arg="clear", session_id=sid) + assert r["result"]["type"] == "exec" + assert "cleared" in r["result"]["output"].lower() + + from hermes_cli.goals import GoalManager + + # After clear the row is marked status=cleared (kept for audit); + # ``has_goal()`` / ``is_active()`` return False so the goal loop + # stays off and ``status`` reports "No active goal". + mgr = GoalManager(session_key) + assert not mgr.has_goal() + assert not mgr.is_active() + assert "No active goal" in mgr.status_line() + + +def test_goal_stop_and_done_are_clear_aliases(server, session): + sid, _, _ = session + _call(server, "command.dispatch", name="goal", arg="first goal", session_id=sid) + r = _call(server, "command.dispatch", name="goal", arg="stop", session_id=sid) + assert "cleared" in r["result"]["output"].lower() + + _call(server, "command.dispatch", name="goal", arg="second goal", session_id=sid) + r = _call(server, "command.dispatch", name="goal", arg="done", session_id=sid) + assert "cleared" in r["result"]["output"].lower() + + +def test_goal_requires_session(server): + r = _call(server, "command.dispatch", name="goal", arg="nope", session_id="unknown") + assert "error" in r + assert r["error"]["code"] == 4001 + + +# ── slash.exec /goal routing ────────────────────────────────────────── + + +def test_slash_exec_rejects_goal_routes_to_command_dispatch(server, session): + """slash.exec must reject /goal with 4018 so the TUI client falls through + to command.dispatch. 
Without this, the HermesCLI slash-worker subprocess + would set the goal but silently drop the kickoff — the queue is in-proc.""" + sid, _, _ = session + r = _call(server, "slash.exec", command="goal status", session_id=sid) + assert "error" in r + assert r["error"]["code"] == 4018 + assert "command.dispatch" in r["error"]["message"] + + +def test_pending_input_commands_includes_goal(server): + """Guard: _PENDING_INPUT_COMMANDS must list 'goal' — removing it would + silently re-break the TUI.""" + assert "goal" in server._PENDING_INPUT_COMMANDS diff --git a/tui_gateway/server.py b/tui_gateway/server.py index 724fb542e6..fe66d3798d 100644 --- a/tui_gateway/server.py +++ b/tui_gateway/server.py @@ -2822,6 +2822,7 @@ def _run_prompt_submit(rid, sid: str, session: dict, text: Any) -> None: def run(): approval_token = None session_tokens = [] + goal_followup = None # set by the post-turn goal hook below try: from tools.approval import ( reset_current_session_key, @@ -2981,6 +2982,55 @@ def _run_prompt_submit(rid, sid: str, session: dict, text: Any) -> None: payload["rendered"] = rendered _emit("message.complete", sid, payload) + # ── /goal continuation (Ralph-style loop) ───────────────── + # After every TUI turn, if a /goal is active, ask the judge + # whether the goal is done and — if not and we're still under + # budget — queue a continuation prompt to run after this + # thread releases session["running"]. The verdict message + # ("✓ Goal achieved" / "⏸ budget exhausted") is surfaced as + # a system line so the user sees progress regardless of + # outcome. Mirrors gateway/run._post_turn_goal_continuation. 
+ if ( + status == "complete" + and isinstance(raw, str) + and raw.strip() + ): + try: + from hermes_cli.goals import GoalManager + + sid_key = session.get("session_key") or "" + if sid_key: + try: + goals_cfg = (_load_cfg().get("goals") or {}) + goal_max_turns = int(goals_cfg.get("max_turns", 20) or 20) + except Exception: + goal_max_turns = 20 + goal_mgr = GoalManager( + session_id=sid_key, + default_max_turns=goal_max_turns, + ) + if goal_mgr.is_active(): + decision = goal_mgr.evaluate_after_turn( + raw, user_initiated=True, + ) + verdict_msg = decision.get("message") or "" + if verdict_msg: + _emit( + "status.update", + sid, + {"kind": "goal", "text": verdict_msg}, + ) + if decision.get("should_continue"): + cont_prompt = decision.get("continuation_prompt") or "" + if cont_prompt: + goal_followup = cont_prompt + except Exception as _goal_exc: + print( + f"[tui_gateway] goal continuation hook failed: " + f"{type(_goal_exc).__name__}: {_goal_exc}", + file=sys.stderr, + ) + # Apply pending_title now that the DB row exists. _pending = session.get("pending_title") if _pending and status == "complete": @@ -3061,6 +3111,31 @@ def _run_prompt_submit(rid, sid: str, session: dict, text: Any) -> None: with session["history_lock"]: session["running"] = False + # Chain a goal-continuation turn if the judge said so. We do + # this AFTER the finally releases session["running"], so the + # nested _run_prompt_submit doesn't deadlock on the busy + # guard. A real user prompt that races us wins because + # prompt.submit sets running=True under the history_lock and + # we check that guard before re-firing. + if goal_followup: + with session["history_lock"]: + if session.get("running"): + # User already sent something — their turn wins, + # the judge will re-run on the next turn anyway. 
+ return + session["running"] = True + try: + _emit("message.start", sid) + _run_prompt_submit(rid, sid, session, goal_followup) + except Exception as _cont_exc: + print( + f"[tui_gateway] goal continuation dispatch failed: " + f"{type(_cont_exc).__name__}: {_cont_exc}", + file=sys.stderr, + ) + with session["history_lock"]: + session["running"] = False + threading.Thread(target=run, daemon=True).start() @@ -3928,6 +4003,7 @@ _PENDING_INPUT_COMMANDS: frozenset[str] = frozenset( "q", "steer", "plan", + "goal", } ) @@ -4240,6 +4316,77 @@ def _(rid, params: dict) -> dict: # Fallback: no active run, treat as next-turn message return _ok(rid, {"type": "send", "message": arg}) + if name == "goal": + if not session: + return _err(rid, 4001, "no active session") + try: + from hermes_cli.goals import GoalManager + except Exception as exc: + return _err(rid, 5030, f"goals unavailable: {exc}") + + sid_key = session.get("session_key") or "" + if not sid_key: + return _err(rid, 4001, "no session key") + + try: + goals_cfg = (_load_cfg().get("goals") or {}) + max_turns = int(goals_cfg.get("max_turns", 20) or 20) + except Exception: + max_turns = 20 + mgr = GoalManager(session_id=sid_key, default_max_turns=max_turns) + + lower = arg.strip().lower() + if not arg.strip() or lower == "status": + return _ok(rid, {"type": "exec", "output": mgr.status_line()}) + if lower == "pause": + state = mgr.pause(reason="user-paused") + out = "No goal set." if state is None else f"⏸ Goal paused: {state.goal}" + return _ok(rid, {"type": "exec", "output": out}) + if lower == "resume": + state = mgr.resume() + if state is None: + return _ok(rid, {"type": "exec", "output": "No goal to resume."}) + return _ok( + rid, + { + "type": "exec", + "output": ( + f"▶ Goal resumed: {state.goal}\n" + "Send any message to continue, or wait — I'll take the next step on the next turn." 
+ ), + }, + ) + if lower in ("clear", "stop", "done"): + had = mgr.has_goal() + mgr.clear() + return _ok( + rid, + { + "type": "exec", + "output": "✓ Goal cleared." if had else "No active goal.", + }, + ) + + # Otherwise — treat the remaining text as the new goal. + try: + state = mgr.set(arg) + except ValueError as exc: + return _err(rid, 4004, f"invalid goal: {exc}") + + notice = ( + f"⊙ Goal set ({state.max_turns}-turn budget): {state.goal}\n" + "I'll keep working until the goal is done, you pause/clear it, or the budget is exhausted.\n" + "Controls: /goal status · /goal pause · /goal resume · /goal clear" + ) + # Send the goal text as the kickoff prompt. The TUI client sees + # {type: send, notice, message} → renders `notice` as a sys line, + # then submits `message` as a user turn. The post-turn judge + # wired in _run_prompt_submit takes over from there. + return _ok( + rid, + {"type": "send", "notice": notice, "message": state.goal}, + ) + return _err(rid, 4018, f"not a quick/plugin/skill command: {name}") diff --git a/ui-tui/src/app/createGatewayEventHandler.ts b/ui-tui/src/app/createGatewayEventHandler.ts index 7230d1cf92..270024a8ef 100644 --- a/ui-tui/src/app/createGatewayEventHandler.ts +++ b/ui-tui/src/app/createGatewayEventHandler.ts @@ -287,6 +287,11 @@ export function createGatewayEventHandler(ctx: GatewayEventHandlerContext): (ev: return } + if (p.kind === 'goal') { + sys(p.text) + return + } + if (!p.kind || p.kind === 'status') { return } diff --git a/ui-tui/src/app/createSlashHandler.ts b/ui-tui/src/app/createSlashHandler.ts index 7bd19431ed..0164ef0d56 100644 --- a/ui-tui/src/app/createSlashHandler.ts +++ b/ui-tui/src/app/createSlashHandler.ts @@ -114,6 +114,9 @@ export function createSlashHandler(ctx: SlashHandlerContext): (cmd: string) => b } if (d.type === 'send') { + if (d.notice?.trim()) { + sys(d.notice) + } return d.message?.trim() ? 
send(d.message) : sys(`/${parsed.name}: empty message`) } }) diff --git a/ui-tui/src/gatewayTypes.ts b/ui-tui/src/gatewayTypes.ts index 390e7af3e0..a1513d2a6e 100644 --- a/ui-tui/src/gatewayTypes.ts +++ b/ui-tui/src/gatewayTypes.ts @@ -47,7 +47,7 @@ export type CommandDispatchResponse = | { output?: string; type: 'exec' | 'plugin' } | { target: string; type: 'alias' } | { message?: string; name: string; type: 'skill' } - | { message: string; type: 'send' } + | { message: string; notice?: string; type: 'send' } // ── Config ─────────────────────────────────────────────────────────── diff --git a/ui-tui/src/lib/rpc.ts b/ui-tui/src/lib/rpc.ts index 70faa4bbbe..81dc703186 100644 --- a/ui-tui/src/lib/rpc.ts +++ b/ui-tui/src/lib/rpc.ts @@ -27,7 +27,11 @@ export const asCommandDispatch = (value: unknown): CommandDispatchResponse | nul } if (t === 'send' && typeof o.message === 'string') { - return { type: 'send', message: o.message } + return { + type: 'send', + message: o.message, + notice: typeof o.notice === 'string' ? 
o.notice : undefined, + } } return null From b59bb4e351c4cdca97e906accb4bbe9193c381b6 Mon Sep 17 00:00:00 2001 From: leprincep35700 <61830395+leprincep35700@users.noreply.github.com> Date: Fri, 1 May 2026 15:19:25 +0000 Subject: [PATCH 54/61] fix(gateway): preserve home-channel thread targets across restart notifications --- cron/scheduler.py | 17 +- gateway/config.py | 31 ++- gateway/run.py | 205 +++++++++++++++--- tests/cron/test_scheduler.py | 10 + tests/gateway/restart_test_helpers.py | 18 +- tests/gateway/test_home_target_env_var.py | 8 +- tests/gateway/test_restart_notification.py | 213 ++++++++++++++++++- tests/gateway/test_restart_resume_pending.py | 81 ++++++- 8 files changed, 544 insertions(+), 39 deletions(-) diff --git a/cron/scheduler.py b/cron/scheduler.py index 4672b24ba7..fafcbfab95 100644 --- a/cron/scheduler.py +++ b/cron/scheduler.py @@ -147,6 +147,19 @@ def _get_home_target_chat_id(platform_name: str) -> str: return value +def _get_home_target_thread_id(platform_name: str) -> Optional[str]: + """Return the optional thread/topic ID for a platform home target.""" + env_var = _HOME_TARGET_ENV_VARS.get(platform_name.lower()) + if not env_var: + return None + value = os.getenv(f"{env_var}_THREAD_ID", "").strip() + if not value: + legacy = _LEGACY_HOME_TARGET_ENV_VARS.get(env_var) + if legacy: + value = os.getenv(f"{legacy}_THREAD_ID", "").strip() + return value or None + + def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[dict]: """Resolve one concrete auto-delivery target for a cron job.""" @@ -175,7 +188,7 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d return { "platform": platform_name, "chat_id": chat_id, - "thread_id": None, + "thread_id": _get_home_target_thread_id(platform_name), } return None @@ -229,7 +242,7 @@ def _resolve_single_delivery_target(job: dict, deliver_value: str) -> Optional[d return { "platform": platform_name, "chat_id": chat_id, - "thread_id": None, + 
"thread_id": _get_home_target_thread_id(platform_name), } diff --git a/gateway/config.py b/gateway/config.py index 6db8e55d84..6527accec4 100644 --- a/gateway/config.py +++ b/gateway/config.py @@ -186,18 +186,24 @@ class HomeChannel: Default destination for a platform. When a cron job specifies deliver="telegram" without a specific chat ID, - messages are sent to this home channel. + messages are sent to this home channel. Thread-aware platforms may also + store a thread/topic ID so the bare platform target routes to the exact + conversation where /sethome was run. """ platform: Platform chat_id: str name: str # Human-readable name for display + thread_id: Optional[str] = None def to_dict(self) -> Dict[str, Any]: - return { + result = { "platform": self.platform.value, "chat_id": self.chat_id, "name": self.name, } + if self.thread_id: + result["thread_id"] = self.thread_id + return result @classmethod def from_dict(cls, data: Dict[str, Any]) -> "HomeChannel": @@ -205,6 +211,7 @@ class HomeChannel: platform=Platform(data["platform"]), chat_id=str(data["chat_id"]), name=data.get("name", "Home"), + thread_id=str(data["thread_id"]) if data.get("thread_id") else None, ) @@ -1071,6 +1078,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.TELEGRAM, chat_id=telegram_home, name=os.getenv("TELEGRAM_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("TELEGRAM_HOME_CHANNEL_THREAD_ID") or None, ) # Discord @@ -1087,6 +1095,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.DISCORD, chat_id=discord_home, name=os.getenv("DISCORD_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("DISCORD_HOME_CHANNEL_THREAD_ID") or None, ) # Reply threading mode for Discord (off/first/all) @@ -1108,6 +1117,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.WHATSAPP, chat_id=whatsapp_home, name=os.getenv("WHATSAPP_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("WHATSAPP_HOME_CHANNEL_THREAD_ID") or None, ) # 
Slack @@ -1135,6 +1145,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.SLACK, chat_id=slack_home, name=os.getenv("SLACK_HOME_CHANNEL_NAME", ""), + thread_id=os.getenv("SLACK_HOME_CHANNEL_THREAD_ID") or None, ) # Signal @@ -1155,6 +1166,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.SIGNAL, chat_id=signal_home, name=os.getenv("SIGNAL_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("SIGNAL_HOME_CHANNEL_THREAD_ID") or None, ) # Mattermost @@ -1174,6 +1186,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.MATTERMOST, chat_id=mattermost_home, name=os.getenv("MATTERMOST_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("MATTERMOST_HOME_CHANNEL_THREAD_ID") or None, ) # Matrix @@ -1205,6 +1218,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.MATRIX, chat_id=matrix_home, name=os.getenv("MATRIX_HOME_ROOM_NAME", "Home"), + thread_id=os.getenv("MATRIX_HOME_ROOM_THREAD_ID") or None, ) # Home Assistant @@ -1238,6 +1252,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.EMAIL, chat_id=email_home, name=os.getenv("EMAIL_HOME_ADDRESS_NAME", "Home"), + thread_id=os.getenv("EMAIL_HOME_ADDRESS_THREAD_ID") or None, ) # SMS (Twilio) @@ -1253,6 +1268,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.SMS, chat_id=sms_home, name=os.getenv("SMS_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("SMS_HOME_CHANNEL_THREAD_ID") or None, ) # API Server @@ -1315,6 +1331,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.DINGTALK, chat_id=dingtalk_home, name=os.getenv("DINGTALK_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("DINGTALK_HOME_CHANNEL_THREAD_ID") or None, ) # Feishu / Lark @@ -1342,6 +1359,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.FEISHU, chat_id=feishu_home, name=os.getenv("FEISHU_HOME_CHANNEL_NAME", "Home"), + 
thread_id=os.getenv("FEISHU_HOME_CHANNEL_THREAD_ID") or None, ) # WeCom (Enterprise WeChat) @@ -1364,6 +1382,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.WECOM, chat_id=wecom_home, name=os.getenv("WECOM_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("WECOM_HOME_CHANNEL_THREAD_ID") or None, ) # WeCom callback mode (self-built apps) @@ -1422,6 +1441,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.WEIXIN, chat_id=weixin_home, name=os.getenv("WEIXIN_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("WEIXIN_HOME_CHANNEL_THREAD_ID") or None, ) # BlueBubbles (iMessage) @@ -1445,6 +1465,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.BLUEBUBBLES, chat_id=bluebubbles_home, name=os.getenv("BLUEBUBBLES_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("BLUEBUBBLES_HOME_CHANNEL_THREAD_ID") or None, ) # QQ (Official Bot API v2) @@ -1482,6 +1503,11 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.QQBOT, chat_id=qq_home, name=os.getenv("QQBOT_HOME_CHANNEL_NAME") or os.getenv(qq_home_name_env, "Home"), + thread_id=( + os.getenv("QQBOT_HOME_CHANNEL_THREAD_ID") + or os.getenv("QQ_HOME_CHANNEL_THREAD_ID") + or None + ), ) # Yuanbao — YUANBAO_APP_ID preferred @@ -1512,6 +1538,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None: platform=Platform.YUANBAO, chat_id=yuanbao_home, name=os.getenv("YUANBAO_HOME_CHANNEL_NAME", "Home"), + thread_id=os.getenv("YUANBAO_HOME_CHANNEL_THREAD_ID") or None, ) yuanbao_dm_policy = os.getenv("YUANBAO_DM_POLICY") if yuanbao_dm_policy: diff --git a/gateway/run.py b/gateway/run.py index 86076bf0bf..d604947e99 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -283,6 +283,16 @@ def _home_target_env_var(platform_name: str) -> str: ) +def _home_thread_env_var(platform_name: str) -> str: + """Return the optional thread/topic env var for a platform home target.""" + return 
f"{_home_target_env_var(platform_name)}_THREAD_ID" + + +def _restart_notification_pending() -> bool: + """Return True when a /restart completion marker is waiting to be delivered.""" + return (_hermes_home / ".restart_notify.json").exists() + + _ensure_ssl_certs() # Add parent directory to path @@ -507,6 +517,8 @@ from gateway.config import ( Platform, _BUILTIN_PLATFORM_VALUES, GatewayConfig, + HomeChannel, + PlatformConfig, load_gateway_config, ) from gateway.session import ( @@ -2257,15 +2269,13 @@ class GatewayRunner: logger.debug("Failed interrupting agent during shutdown: %s", e) async def _notify_active_sessions_of_shutdown(self) -> None: - """Send a notification to every chat with an active agent. + """Send shutdown/restart notifications to active chats and home channels. Called at the very start of stop() — adapters are still connected so - messages can be delivered. Best-effort: individual send failures are + messages can be delivered. Best-effort: individual send failures are logged and swallowed so they never block the shutdown sequence. """ active = self._snapshot_running_agents() - if not active: - return action = "restarting" if self._restart_requested else "shutting down" hint = ( @@ -2276,7 +2286,7 @@ class GatewayRunner: ) msg = f"⚠️ Gateway {action} — {hint}" - notified: set = set() + notified: set[tuple[str, str, Optional[str]]] = set() for session_key in active: source = None try: @@ -2293,7 +2303,7 @@ class GatewayRunner: if source is not None: platform_str = source.platform.value - chat_id = source.chat_id + chat_id = str(source.chat_id) thread_id = source.thread_id else: # Fall back to parsing the session key when no persisted @@ -2305,9 +2315,10 @@ class GatewayRunner: chat_id = _parsed["chat_id"] thread_id = _parsed.get("thread_id") - # Deduplicate: one notification per chat, even if multiple - # sessions (different users/threads) share the same chat. - dedup_key = (platform_str, chat_id) + # Deduplicate only identical delivery targets. 
Thread/topic-aware + # platforms can share a parent chat while still routing to distinct + # destinations via metadata. + dedup_key = (platform_str, chat_id, str(thread_id) if thread_id else None) if dedup_key in notified: continue @@ -2321,10 +2332,19 @@ class GatewayRunner: # correct forum topic / thread. metadata = {"thread_id": thread_id} if thread_id else None - await adapter.send(chat_id, msg, metadata=metadata) + result = await adapter.send(chat_id, msg, metadata=metadata) + if result is not None and getattr(result, "success", True) is False: + logger.debug( + "Failed to send shutdown notification to %s:%s: %s", + platform_str, + chat_id, + getattr(result, "error", "send returned success=False"), + ) + continue + notified.add(dedup_key) logger.info( - "Sent shutdown notification to %s:%s", + "Sent shutdown notification to active chat %s:%s", platform_str, chat_id, ) except Exception as e: @@ -2333,6 +2353,44 @@ class GatewayRunner: platform_str, chat_id, e, ) + for platform, adapter in self.adapters.items(): + home = self.config.get_home_channel(platform) + if not home or not home.chat_id: + continue + + dedup_key = (platform.value, str(home.chat_id), str(home.thread_id) if home.thread_id else None) + if dedup_key in notified: + continue + + try: + metadata = {"thread_id": home.thread_id} if home.thread_id else None + if metadata: + result = await adapter.send(str(home.chat_id), msg, metadata=metadata) + else: + result = await adapter.send(str(home.chat_id), msg) + if result is not None and getattr(result, "success", True) is False: + logger.debug( + "Failed to send shutdown notification to home channel %s:%s: %s", + platform.value, + home.chat_id, + getattr(result, "error", "send returned success=False"), + ) + continue + + notified.add(dedup_key) + logger.info( + "Sent shutdown notification to home channel %s:%s", + platform.value, + home.chat_id, + ) + except Exception as e: + logger.debug( + "Failed to send shutdown notification to home channel %s:%s: 
%s", + platform.value, + home.chat_id, + e, + ) + def _finalize_shutdown_agents(self, active_agents: Dict[str, Any]) -> None: for agent in active_agents.values(): try: @@ -2953,8 +3011,28 @@ class GatewayRunner: ): self._schedule_update_notification_watch() + # Give freshly connected platform adapters a brief moment to settle + # before sending restart/startup lifecycle messages. In practice this + # helps Discord thread deliveries right after reconnect. + if connected_count > 0: + await asyncio.sleep(1.0) + # Notify the chat that initiated /restart that the gateway is back. - await self._send_restart_notification() + restart_notification_pending = _restart_notification_pending() + delivered_restart_target = await self._send_restart_notification() + + # Broadcast a lightweight "gateway is back" message to configured + # home channels only when this startup is resuming from /restart. If a + # /restart requester already received a direct completion notice in the + # same chat, skip the generic broadcast there to avoid duplicates while + # still allowing a home-channel fallback when the direct send fails. + if restart_notification_pending or delivered_restart_target is not None: + skip_home_targets = ( + {delivered_restart_target} if delivered_restart_target else None + ) + await self._send_home_channel_startup_notifications( + skip_targets=skip_home_targets, + ) # Drain any recovered process watchers (from crash recovery checkpoint) try: @@ -7976,14 +8054,33 @@ class GatewayRunner: chat_name = source.chat_name or chat_id env_key = _home_target_env_var(platform_name) + thread_env_key = _home_thread_env_var(platform_name) + thread_id = source.thread_id # Save to .env so it persists across restarts try: from hermes_cli.config import save_env_value save_env_value(env_key, str(chat_id)) + # Keep thread/topic routing explicit and clear stale values when + # /sethome is run from the parent chat instead of a thread. 
+ save_env_value(thread_env_key, str(thread_id or "")) except Exception as e: return f"Failed to save home channel: {e}" + # Keep the running gateway config in sync too. The pre-restart + # notification path reads self.config before the process reloads env. + if source.platform: + platform_config = self.config.platforms.setdefault( + source.platform, + PlatformConfig(enabled=True), + ) + platform_config.home_channel = HomeChannel( + platform=source.platform, + chat_id=str(chat_id), + name=chat_name, + thread_id=str(thread_id) if thread_id else None, + ) + return ( f"✅ Home channel set to **{chat_name}** (ID: {chat_id}).\n" f"Cron jobs and cross-platform messages will be delivered here." @@ -10467,11 +10564,11 @@ class GatewayRunner: return True - async def _send_restart_notification(self) -> None: + async def _send_restart_notification(self) -> Optional[tuple[str, str, Optional[str]]]: """Notify the chat that initiated /restart that the gateway is back.""" notify_path = _hermes_home / ".restart_notify.json" if not notify_path.exists(): - return + return None try: data = json.loads(notify_path.read_text()) @@ -10480,7 +10577,7 @@ class GatewayRunner: thread_id = data.get("thread_id") if not platform_str or not chat_id: - return + return None platform = Platform(platform_str) adapter = self.adapters.get(platform) @@ -10489,11 +10586,11 @@ class GatewayRunner: "Restart notification skipped: %s adapter not connected", platform_str, ) - return + return None metadata = {"thread_id": thread_id} if thread_id else None result = await adapter.send( - chat_id, + str(chat_id), "♻ Gateway restarted successfully. Your session continues.", metadata=metadata, ) @@ -10501,24 +10598,82 @@ class GatewayRunner: # and returns SendResult(success=False) rather than raising, so # we must inspect the result before claiming success — otherwise # the log line is misleading and hides real delivery failures. 
- if getattr(result, "success", False): - logger.info( - "Sent restart notification to %s:%s", - platform_str, - chat_id, - ) - else: + if result is not None and getattr(result, "success", True) is False: logger.warning( "Restart notification to %s:%s was not delivered: %s", platform_str, chat_id, - getattr(result, "error", "unknown error"), + getattr(result, "error", "send returned success=False"), ) + return None + + logger.info( + "Sent restart notification to %s:%s", + platform_str, + chat_id, + ) + return str(platform_str), str(chat_id), str(thread_id) if thread_id else None except Exception as e: logger.warning("Restart notification failed: %s", e) + return None finally: notify_path.unlink(missing_ok=True) + async def _send_home_channel_startup_notifications( + self, + *, + skip_targets: Optional[set[tuple[str, str, Optional[str]]]] = None, + ) -> set[tuple[str, str, Optional[str]]]: + """Notify configured home channels that the gateway is back online. + + The notification is best-effort and sent once per connected platform + home channel. ``skip_targets`` lets startup avoid duplicate messages + when a more specific restart notification is queued for the same chat. + """ + delivered: set[tuple[str, str, Optional[str]]] = set() + skipped = skip_targets or set() + message = "♻️ Gateway online — Hermes is back and ready." 
+ + for platform, adapter in self.adapters.items(): + home = self.config.get_home_channel(platform) + if not home or not home.chat_id: + continue + + target = (platform.value, str(home.chat_id), str(home.thread_id) if home.thread_id else None) + if target in skipped or target in delivered: + continue + + try: + metadata = {"thread_id": home.thread_id} if home.thread_id else None + if metadata: + result = await adapter.send(str(home.chat_id), message, metadata=metadata) + else: + result = await adapter.send(str(home.chat_id), message) + if result is not None and getattr(result, "success", True) is False: + logger.warning( + "Home-channel startup notification failed for %s:%s: %s", + platform.value, + home.chat_id, + getattr(result, "error", "send returned success=False"), + ) + continue + + delivered.add(target) + logger.info( + "Sent home-channel startup notification to %s:%s", + platform.value, + home.chat_id, + ) + except Exception as exc: + logger.warning( + "Home-channel startup notification failed for %s:%s: %s", + platform.value, + home.chat_id, + exc, + ) + + return delivered + def _set_session_env(self, context: SessionContext) -> list: """Set session context variables for the current async task. 
diff --git a/tests/cron/test_scheduler.py b/tests/cron/test_scheduler.py index a5bcd4bf9b..8c204d9a51 100644 --- a/tests/cron/test_scheduler.py +++ b/tests/cron/test_scheduler.py @@ -118,6 +118,16 @@ class TestResolveDeliveryTarget: "thread_id": None, } + def test_bare_platform_delivery_preserves_home_thread_id(self, monkeypatch): + monkeypatch.setenv("DISCORD_HOME_CHANNEL", "parent-42") + monkeypatch.setenv("DISCORD_HOME_CHANNEL_THREAD_ID", "topic-7") + + assert _resolve_delivery_target({"deliver": "discord"}) == { + "platform": "discord", + "chat_id": "parent-42", + "thread_id": "topic-7", + } + def test_explicit_telegram_topic_target_with_thread_id(self): """deliver: 'telegram:chat_id:thread_id' parses correctly.""" job = { diff --git a/tests/gateway/restart_test_helpers.py b/tests/gateway/restart_test_helpers.py index 6332a194fe..4c5dab9960 100644 --- a/tests/gateway/restart_test_helpers.py +++ b/tests/gateway/restart_test_helpers.py @@ -12,6 +12,7 @@ class RestartTestAdapter(BasePlatformAdapter): def __init__(self): super().__init__(PlatformConfig(enabled=True, token="***"), Platform.TELEGRAM) self.sent: list[str] = [] + self.sent_calls: list[tuple[str, str, object]] = [] async def connect(self): return True @@ -21,6 +22,7 @@ class RestartTestAdapter(BasePlatformAdapter): async def send(self, chat_id, content, reply_to=None, metadata=None): self.sent.append(content) + self.sent_calls.append((chat_id, content, metadata)) return SendResult(success=True, message_id="1") async def send_typing(self, chat_id, metadata=None): @@ -30,12 +32,17 @@ class RestartTestAdapter(BasePlatformAdapter): return {"id": chat_id} -def make_restart_source(chat_id: str = "123456", chat_type: str = "dm") -> SessionSource: +def make_restart_source( + chat_id: str = "123456", + chat_type: str = "dm", + thread_id: str | None = None, +) -> SessionSource: return SessionSource( platform=Platform.TELEGRAM, chat_id=chat_id, chat_type=chat_type, user_id="u1", + thread_id=thread_id, ) @@ -81,6 
+88,15 @@ def make_restart_runner( runner._handle_restart_command = GatewayRunner._handle_restart_command.__get__( runner, GatewayRunner ) + runner._handle_set_home_command = GatewayRunner._handle_set_home_command.__get__( + runner, GatewayRunner + ) + runner._send_restart_notification = GatewayRunner._send_restart_notification.__get__( + runner, GatewayRunner + ) + runner._send_home_channel_startup_notifications = ( + GatewayRunner._send_home_channel_startup_notifications.__get__(runner, GatewayRunner) + ) runner._status_action_label = GatewayRunner._status_action_label.__get__( runner, GatewayRunner ) diff --git a/tests/gateway/test_home_target_env_var.py b/tests/gateway/test_home_target_env_var.py index 27a7e8919b..2e0dee0c20 100644 --- a/tests/gateway/test_home_target_env_var.py +++ b/tests/gateway/test_home_target_env_var.py @@ -8,7 +8,7 @@ to env vars nothing read on startup — the home channel appeared to set successfully but was lost on every new gateway session. """ -from gateway.run import _home_target_env_var +from gateway.run import _home_target_env_var, _home_thread_env_var def test_matrix_home_target_env_var_uses_home_room(): @@ -34,3 +34,9 @@ def test_unknown_platform_home_target_env_var_falls_back_to_home_channel(): def test_case_insensitive_platform_name(): assert _home_target_env_var("MATRIX") == "MATRIX_HOME_ROOM" assert _home_target_env_var("Email") == "EMAIL_HOME_ADDRESS" + + +def test_home_thread_env_var_uses_home_target_name_plus_thread_id(): + assert _home_thread_env_var("discord") == "DISCORD_HOME_CHANNEL_THREAD_ID" + assert _home_thread_env_var("matrix") == "MATRIX_HOME_ROOM_THREAD_ID" + assert _home_thread_env_var("email") == "EMAIL_HOME_ADDRESS_THREAD_ID" diff --git a/tests/gateway/test_restart_notification.py b/tests/gateway/test_restart_notification.py index 254917897f..e97216072a 100644 --- a/tests/gateway/test_restart_notification.py +++ b/tests/gateway/test_restart_notification.py @@ -8,8 +8,8 @@ from unittest.mock import AsyncMock, 
MagicMock import pytest import gateway.run as gateway_run -from gateway.config import Platform -from gateway.platforms.base import MessageEvent, MessageType +from gateway.config import HomeChannel, Platform +from gateway.platforms.base import MessageEvent, MessageType, SendResult from gateway.session import build_session_key from tests.gateway.restart_test_helpers import ( make_restart_runner, @@ -17,6 +17,22 @@ from tests.gateway.restart_test_helpers import ( ) +# ── restart marker helpers ─────────────────────────────────────────────── + + +def test_restart_notification_pending_false_without_marker(tmp_path, monkeypatch): + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + assert gateway_run._restart_notification_pending() is False + + +def test_restart_notification_pending_true_with_marker(tmp_path, monkeypatch): + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + (tmp_path / ".restart_notify.json").write_text("{}") + + assert gateway_run._restart_notification_pending() is True + + # ── _handle_restart_command writes .restart_notify.json ────────────────── @@ -143,6 +159,184 @@ async def test_restart_command_uses_atomic_json_writes_for_marker_files(tmp_path assert calls[1][1]["platform"] == "telegram" +@pytest.mark.asyncio +async def test_sethome_updates_running_config_for_same_process_restart(tmp_path, monkeypatch): + """/sethome persists to env and updates in-memory config before restart.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + saved = {} + + def _fake_save_env_value(key, value): + saved[key] = value + + monkeypatch.setattr("hermes_cli.config.save_env_value", _fake_save_env_value) + + runner, _adapter = make_restart_runner() + source = make_restart_source(chat_id="home-42") + source.chat_name = "Ops Home" + event = MessageEvent( + text="/sethome", + message_type=MessageType.TEXT, + source=source, + message_id="m-home", + ) + + result = await runner._handle_set_home_command(event) + + home = 
runner.config.get_home_channel(Platform.TELEGRAM) + assert "Home channel set" in result + assert saved["TELEGRAM_HOME_CHANNEL"] == "home-42" + assert home is not None + assert home.chat_id == "home-42" + assert home.name == "Ops Home" + + +@pytest.mark.asyncio +async def test_sethome_preserves_thread_target_for_same_process_restart(tmp_path, monkeypatch): + """/sethome from a topic/thread stores the thread-aware home target.""" + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + saved = {} + + def _fake_save_env_value(key, value): + saved[key] = value + + monkeypatch.setattr("hermes_cli.config.save_env_value", _fake_save_env_value) + + runner, _adapter = make_restart_runner() + source = make_restart_source(chat_id="parent-42", thread_id="topic-7") + source.chat_name = "Ops Topic" + event = MessageEvent( + text="/sethome", + message_type=MessageType.TEXT, + source=source, + message_id="m-home-thread", + ) + + result = await runner._handle_set_home_command(event) + + home = runner.config.get_home_channel(Platform.TELEGRAM) + assert "Home channel set" in result + assert saved["TELEGRAM_HOME_CHANNEL"] == "parent-42" + assert saved["TELEGRAM_HOME_CHANNEL_THREAD_ID"] == "topic-7" + assert home is not None + assert home.chat_id == "parent-42" + assert home.thread_id == "topic-7" + + +# ── home-channel startup notifications ───────────────────────────────────── + + +@pytest.mark.asyncio +async def test_send_home_channel_startup_notification_to_configured_home(tmp_path, monkeypatch): + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + runner, adapter = make_restart_runner() + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="home-42", + name="Ops Home", + ) + adapter.send = AsyncMock() + + delivered = await runner._send_home_channel_startup_notifications() + + assert delivered == {("telegram", "home-42", None)} + adapter.send.assert_called_once_with( + "home-42", + "♻️ Gateway online — 
Hermes is back and ready.", + ) + + +@pytest.mark.asyncio +async def test_send_home_channel_startup_notification_preserves_thread_metadata( + tmp_path, monkeypatch +): + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + runner, adapter = make_restart_runner() + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="parent-42", + name="Ops Topic", + thread_id="topic-7", + ) + adapter.send = AsyncMock(return_value=SendResult(success=True, message_id="home")) + + delivered = await runner._send_home_channel_startup_notifications() + + assert delivered == {("telegram", "parent-42", "topic-7")} + adapter.send.assert_called_once_with( + "parent-42", + "♻️ Gateway online — Hermes is back and ready.", + metadata={"thread_id": "topic-7"}, + ) + + +@pytest.mark.asyncio +async def test_send_home_channel_startup_notification_skips_restart_target( + tmp_path, monkeypatch +): + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + runner, adapter = make_restart_runner() + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="42", + name="Ops Home", + ) + adapter.send = AsyncMock() + + delivered = await runner._send_home_channel_startup_notifications( + skip_targets={("telegram", "42", None)} + ) + + assert delivered == set() + adapter.send.assert_not_called() + + +@pytest.mark.asyncio +async def test_send_home_channel_startup_notification_does_not_skip_different_thread( + tmp_path, monkeypatch +): + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + runner, adapter = make_restart_runner() + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="42", + name="Ops Home", + ) + adapter.send = AsyncMock(return_value=SendResult(success=True, message_id="home")) + + delivered = await runner._send_home_channel_startup_notifications( + skip_targets={("telegram", "42", "topic-7")} + ) + 
+ assert delivered == {("telegram", "42", None)} + adapter.send.assert_called_once() + + +@pytest.mark.asyncio +async def test_send_home_channel_startup_notification_ignores_false_send_result( + tmp_path, monkeypatch +): + monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path) + + runner, adapter = make_restart_runner() + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="home-42", + name="Ops Home", + ) + adapter.send = AsyncMock(return_value=SendResult(success=False, error="network down")) + + delivered = await runner._send_home_channel_startup_notifications() + + assert delivered == set() + adapter.send.assert_called_once() + + # ── _send_restart_notification ─────────────────────────────────────────── @@ -160,8 +354,9 @@ async def test_send_restart_notification_delivers_and_cleans_up(tmp_path, monkey runner, adapter = make_restart_runner() adapter.send = AsyncMock() - await runner._send_restart_notification() + delivered_target = await runner._send_restart_notification() + assert delivered_target == ("telegram", "42", None) adapter.send.assert_called_once() call_args = adapter.send.call_args assert call_args[0][0] == "42" # chat_id @@ -185,8 +380,9 @@ async def test_send_restart_notification_with_thread(tmp_path, monkeypatch): runner, adapter = make_restart_runner() adapter.send = AsyncMock() - await runner._send_restart_notification() + delivered_target = await runner._send_restart_notification() + assert delivered_target == ("telegram", "99", "topic_7") call_args = adapter.send.call_args assert call_args[1]["metadata"] == {"thread_id": "topic_7"} assert not notify_path.exists() @@ -240,9 +436,10 @@ async def test_send_restart_notification_cleans_up_on_send_failure( runner, adapter = make_restart_runner() adapter.send = AsyncMock(side_effect=RuntimeError("network down")) - await runner._send_restart_notification() + delivered_target = await runner._send_restart_notification() # File cleaned up 
even though send raised. + assert delivered_target is None assert not notify_path.exists() @@ -274,7 +471,7 @@ async def test_send_restart_notification_logs_warning_on_sendresult_failure( ) with caplog.at_level("DEBUG", logger="gateway.run"): - await runner._send_restart_notification() + delivered_target = await runner._send_restart_notification() success_lines = [ r for r in caplog.records @@ -286,6 +483,7 @@ async def test_send_restart_notification_logs_warning_on_sendresult_failure( and "was not delivered" in r.getMessage() and "Chat not found" in r.getMessage() ] + assert delivered_target is None assert not success_lines, ( "Expected no INFO 'Sent restart notification' line when send failed, " f"got: {[r.getMessage() for r in success_lines]}" @@ -317,12 +515,13 @@ async def test_send_restart_notification_logs_info_on_sendresult_success( adapter.send = AsyncMock(return_value=SendResult(success=True, message_id="m-1")) with caplog.at_level("DEBUG", logger="gateway.run"): - await runner._send_restart_notification() + delivered_target = await runner._send_restart_notification() success_lines = [ r for r in caplog.records if r.levelname == "INFO" and "Sent restart notification" in r.getMessage() ] + assert delivered_target == ("telegram", "42", None) assert success_lines, ( "Expected INFO 'Sent restart notification' when send succeeded; " f"got records: {[(r.levelname, r.getMessage()) for r in caplog.records]}" diff --git a/tests/gateway/test_restart_resume_pending.py b/tests/gateway/test_restart_resume_pending.py index bda6c7a412..0b9e7c894d 100644 --- a/tests/gateway/test_restart_resume_pending.py +++ b/tests/gateway/test_restart_resume_pending.py @@ -32,7 +32,8 @@ from unittest.mock import AsyncMock, MagicMock, patch import pytest -from gateway.config import GatewayConfig, Platform, PlatformConfig +from gateway.config import GatewayConfig, HomeChannel, Platform, PlatformConfig +from gateway.platforms.base import SendResult from gateway.run import ( 
_auto_continue_freshness_window, _coerce_gateway_timestamp, @@ -931,6 +932,84 @@ async def test_restart_banner_uses_try_to_resume_wording(): assert "try to resume" in msg +@pytest.mark.asyncio +async def test_restart_notifies_home_channel_even_without_active_sessions(): + runner, adapter = make_restart_runner() + runner._restart_requested = True + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="home-42", + name="Ops Home", + ) + + await runner._notify_active_sessions_of_shutdown() + + assert adapter.sent == [ + "⚠️ Gateway restarting — Your current task will be interrupted. " + "Send any message after restart and I'll try to resume where you left off." + ] + + +@pytest.mark.asyncio +async def test_restart_home_channel_notification_dedupes_active_chat(): + runner, adapter = make_restart_runner() + runner._restart_requested = True + runner._running_agents["agent:main:telegram:dm:999"] = MagicMock() + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="999", + name="Ops Home", + ) + + await runner._notify_active_sessions_of_shutdown() + + assert len(adapter.sent) == 1 + + +@pytest.mark.asyncio +async def test_restart_home_channel_notification_not_deduped_across_threads(): + runner, adapter = make_restart_runner() + runner._restart_requested = True + session_key = "agent:main:telegram:group:999" + runner.session_store._entries[session_key] = MagicMock( + origin=SessionSource( + platform=Platform.TELEGRAM, + chat_id="999", + chat_type="group", + user_id="u1", + thread_id="topic-7", + ) + ) + runner._running_agents[session_key] = MagicMock() + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="999", + name="Ops Home", + ) + + await runner._notify_active_sessions_of_shutdown() + + assert len(adapter.sent) == 2 + assert adapter.sent_calls[0][2] == {"thread_id": "topic-7"} + assert 
adapter.sent_calls[1][2] is None + + +@pytest.mark.asyncio +async def test_restart_home_channel_notification_ignores_false_send_result(): + runner, adapter = make_restart_runner() + runner._restart_requested = True + runner.config.platforms[Platform.TELEGRAM].home_channel = HomeChannel( + platform=Platform.TELEGRAM, + chat_id="home-42", + name="Ops Home", + ) + adapter.send = AsyncMock(return_value=SendResult(success=False, error="network down")) + + await runner._notify_active_sessions_of_shutdown() + + adapter.send.assert_called_once() + + # --------------------------------------------------------------------------- # Stuck-loop escalation integration # --------------------------------------------------------------------------- From 3c59566cc5129d85281f518bc446c06fab6ab767 Mon Sep 17 00:00:00 2001 From: teknium1 <127238744+teknium1@users.noreply.github.com> Date: Sun, 3 May 2026 08:16:17 -0700 Subject: [PATCH 55/61] chore(release): map leprincep35700 email for PR #18440 salvage --- scripts/release.py | 1 + 1 file changed, 1 insertion(+) diff --git a/scripts/release.py b/scripts/release.py index 7c84fb9c03..a752ffb98e 100755 --- a/scripts/release.py +++ b/scripts/release.py @@ -68,6 +68,7 @@ AUTHOR_MAP = { "274096618+hermes-agent-dhabibi@users.noreply.github.com": "dhabibi", "dejie.guo@gmail.com": "JayGwod", "maxence@groine.fr": "MaxyMoos", + "61830395+leprincep35700@users.noreply.github.com": "leprincep35700", # OpenViking viking_read salvage (April 2026) "hitesh@gmail.com": "htsh", "pty819@outlook.com": "pty819", From 69dd0f7cf1f4df03e8b8e80aecc906dbd2b22d12 Mon Sep 17 00:00:00 2001 From: JasonOA888 Date: Sun, 3 May 2026 21:49:15 +0800 Subject: [PATCH 56/61] fix(approval): extend sensitive write target to cover shell RC and credential files MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Terminal commands can write to shell RC files (~/.bashrc, ~/.zshrc, ~/.profile) and credential files (~/.netrc, ~/.pgpass, ~/.npmrc, 
~/.pypirc) via redirection or tee without triggering approval, even though
write_file already blocks these paths in file_safety.py.

This creates an inconsistency: write_file protects these paths but terminal
shell redirections bypass the same protection. An agent prompted via indirect
injection could install persistent backdoors (e.g. PATH manipulation, alias
overrides) or write credential entries without user approval.

Extend _SENSITIVE_WRITE_TARGET with two new regex groups matching the same
paths that file_safety.py's WRITE_DENIED_PATHS already covers:

  _SHELL_RC_FILES — ~/.bashrc, ~/.zshrc, ~/.profile, ~/.bash_profile, ~/.zprofile
  _CREDENTIAL_FILES — ~/.netrc, ~/.pgpass, ~/.npmrc, ~/.pypirc

All 130 existing tests pass.
---
 tools/approval.py | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/approval.py b/tools/approval.py
index e13c019c0a..4ece3e5be4 100644
--- a/tools/approval.py
+++ b/tools/approval.py
@@ -94,10 +94,20 @@ _HERMES_ENV_PATH = (
 )
 _PROJECT_ENV_PATH = r'(?:(?:/|\.{1,2}/)?(?:[^\s/"\'`]+/)*\.env(?:\.[^/\s"\'`]+)*)'
 _PROJECT_CONFIG_PATH = r'(?:(?:/|\.{1,2}/)?(?:[^\s/"\'`]+/)*config\.yaml)'
+_SHELL_RC_FILES = (
+    r'(?:~|\$home|\$\{home\})/\.'
+    r'(?:bashrc|zshrc|profile|bash_profile|zprofile)\b'
+)
+_CREDENTIAL_FILES = (
+    r'(?:~|\$home|\$\{home\})/\.'
+    r'(?:netrc|pgpass|npmrc|pypirc)\b'
+)
 _SENSITIVE_WRITE_TARGET = (
     r'(?:/etc/|/dev/sd|'
     rf'{_SSH_SENSITIVE_PATH}|'
-    rf'{_HERMES_ENV_PATH})'
+    rf'{_HERMES_ENV_PATH}|'
+    rf'{_SHELL_RC_FILES}|'
+    rf'{_CREDENTIAL_FILES})'
 )
 _PROJECT_SENSITIVE_WRITE_TARGET = rf'(?:{_PROJECT_ENV_PATH}|{_PROJECT_CONFIG_PATH})'
 _COMMAND_TAIL = r'(?:\s*(?:&&|\|\||;).*)?$'

From 6b4fb9f8789717e5dad8d920cbd6cd02e53c5175 Mon Sep 17 00:00:00 2001
From: Tranquil-Flow
Date: Sun, 3 May 2026 09:00:34 +1000
Subject: [PATCH 57/61] fix(cron): treat non-dict origin as missing instead of crashing tick
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

``_resolve_origin`` called ``origin.get('platform')`` on whatever
``job.get('origin')`` returned. The leading ``if not origin: return None``
short-circuited the falsy cases (None, empty dict, "") but a non-empty
string passed that guard and then crashed with
``AttributeError: 'str' object has no attribute 'get'`` on every fire
attempt.

Observed in the wild after a migration script tagged jobs with free-form
provenance strings (e.g. ``"combined-digest-replaces-x-and-y-20260503"``).
``mark_job_run`` did record
``last_status: error, last_error: "'str' object has no attribute 'get'"``
once, but the next tick re-loaded the same poisoned origin and crashed
identically. The job stayed enabled, fired every tick, and accumulated
cascading errors in the log until ``origin`` was patched manually.

Replace the falsy guard with ``isinstance(origin, dict)``. Non-dict
origins (string, int, list, tuple, float — anything that survived a
hand-edit, JSON-script write, or migration) are now treated the same as a
missing origin: the job continues with ``deliver`` falling back through
its normal home-channel path instead of crashing the scheduler loop.

Test parametrises the non-dict shapes that can appear in jobs.json through
external writers and asserts ``_resolve_origin`` returns None for each.

Note: this fix scope is the non-dict-``origin`` crash only. The
``next_run_at: null`` recurring-job recovery (the second sub-bug in #18722)
is independently addressed by the in-flight #18825, which extends the
never-silently-disable defense from #16265 to ``get_due_jobs()`` — that
approach is well-aligned with the existing recovery pattern and ships fine
without a competing change here.

Fixes #18722 (non-dict origin crash; recurring-job recovery covered by #18825)
---
 cron/scheduler.py            | 14 ++++++++++++--
 tests/cron/test_scheduler.py | 23 +++++++++++++++++++++++
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/cron/scheduler.py b/cron/scheduler.py
index fafcbfab95..2cb1547ad3 100644
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -123,9 +123,19 @@ _LOCK_FILE = _LOCK_DIR / ".tick.lock"
 
 
 def _resolve_origin(job: dict) -> Optional[dict]:
-    """Extract origin info from a job, preserving any extra routing metadata."""
+    """Extract origin info from a job, preserving any extra routing metadata.
+
+    Treats non-dict origins (free-form provenance strings, ints, lists from
+    migration scripts or hand-edited jobs.json) as missing instead of
+    crashing with ``AttributeError`` on ``origin.get(...)``. Without this
+    guard, a job tagged with e.g. ``"combined-digest-replaces-x-and-y"``
+    crashed every fire attempt with
+    ``'str' object has no attribute 'get'`` — ``mark_job_run`` recorded the
+    failure, but the next tick re-loaded the same poisoned origin and
+    crashed identically until the field was patched manually (#18722).
+    """
     origin = job.get("origin")
-    if not origin:
+    if not isinstance(origin, dict):
         return None
     platform = origin.get("platform")
     chat_id = origin.get("chat_id")
diff --git a/tests/cron/test_scheduler.py b/tests/cron/test_scheduler.py
index 8c204d9a51..b12bb578a3 100644
--- a/tests/cron/test_scheduler.py
+++ b/tests/cron/test_scheduler.py
@@ -46,6 +46,29 @@ class TestResolveOrigin:
         job = {"origin": {}}
         assert _resolve_origin(job) is None
 
+    @pytest.mark.parametrize(
+        "non_dict_origin",
+        [
+            "combined-digest-replaces-x-and-y-20260503",
+            123,
+            ["telegram", "12345"],
+            ("platform", "chat_id"),
+            42.0,
+        ],
+    )
+    def test_non_dict_origin_returns_none_instead_of_crashing(self, non_dict_origin):
+        """Non-dict origins (provenance strings from hand-edited or migrated
+        jobs.json) must be treated as missing instead of crashing the
+        scheduler tick on ``origin.get('platform')`` with
+        ``'str' object has no attribute 'get'`` (#18722).
+
+        Before this guard a job in this state crashed every fire attempt
+        forever; ``mark_job_run`` recorded the error but the next tick
+        re-loaded the poisoned origin and crashed identically.
+        """
+        job = {"origin": non_dict_origin}
+        assert _resolve_origin(job) is None
+
 
 class TestResolveDeliveryTarget:
     def test_origin_delivery_preserves_thread_id(self):

From e527240b2700cc44f467a00e538f55c99d98eb23 Mon Sep 17 00:00:00 2001
From: Bartok9
Date: Sun, 3 May 2026 03:32:32 -0400
Subject: [PATCH 58/61] fix(tools): write_file handler now rejects missing 'content'/'path' args instead of silently writing zero-byte files (#19096)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Under context pressure, frontier models sometimes emit tool calls with
required fields dropped. Previously _handle_write_file() used
args.get('content', '') which substituted an empty string for the missing
key, returned success with bytes_written=0, and created a zero-byte file on
disk. The model had no way to detect the failure.

Changes:
- Reject calls where 'path' is absent or not a non-empty string
- Reject calls where 'content' key is entirely absent (key-presence check,
  not truthiness) — distinguishing a legitimately empty file from a dropped
  arg
- Reject calls where 'content' is a non-string type
- All error messages include guidance to re-emit the tool call or switch to
  execute_code with hermes_tools.write_file() for large payloads
- Explicit empty string content (file truncation) continues to work

Regression tests added for all four cases: missing path, missing content,
explicit-empty content, and wrong content type.

Fixes #19096
---
 tests/tools/test_file_tools.py | 38 ++++++++++++++++++++++++++++++++++
 tools/file_tools.py            | 20 +++++++++++++++++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/tests/tools/test_file_tools.py b/tests/tools/test_file_tools.py
index 5a215df14a..0ee0270fdf 100644
--- a/tests/tools/test_file_tools.py
+++ b/tests/tools/test_file_tools.py
@@ -104,6 +104,44 @@ class TestWriteFileHandler:
         assert result["error"] == "boom"
         assert any("write_file error" in r.getMessage() for r in caplog.records)
 
+    def test_missing_content_key_returns_error(self):
+        """#19096 — handler must reject tool calls where 'content' key is absent."""
+        from tools.file_tools import _handle_write_file
+
+        result = json.loads(_handle_write_file({"path": "/tmp/oops.md"}))
+        assert "error" in result
+        assert "content" in result["error"]
+        assert "missing" in result["error"].lower()
+
+    def test_missing_path_key_returns_error(self):
+        """#19096 — handler must reject tool calls where 'path' key is absent."""
+        from tools.file_tools import _handle_write_file
+
+        result = json.loads(_handle_write_file({"content": "hello"}))
+        assert "error" in result
+
+    def test_explicit_empty_content_is_allowed(self):
+        """#19096 — explicit empty string content (file truncation) must still work."""
+        from tools.file_tools import _handle_write_file
+
+        with patch("tools.file_tools._get_file_ops") as mock_get:
+            mock_ops = MagicMock()
+            result_obj = MagicMock()
+            result_obj.to_dict.return_value = {"status": "ok", "path": "/tmp/empty.txt", "bytes": 0}
+            mock_ops.write_file.return_value = result_obj
+            mock_get.return_value = mock_ops
+
+            result = json.loads(_handle_write_file({"path": "/tmp/empty.txt", "content": ""}))
+        assert result["status"] == "ok"
+
+    def test_non_string_content_returns_error(self):
+        """#19096 — content must be a string, not a dict or list."""
+        from tools.file_tools import _handle_write_file
+
+        result = json.loads(_handle_write_file({"path": "/tmp/x.txt", "content": {"nested": "dict"}}))
+        assert "error" in result
+        assert "string" in result["error"].lower() or "content" in result["error"].lower()
+
 
 class TestPatchHandler:
     @patch("tools.file_tools._get_file_ops")
diff --git a/tools/file_tools.py b/tools/file_tools.py
index 7a7f092954..a4187b6aa9 100644
--- a/tools/file_tools.py
+++ b/tools/file_tools.py
@@ -1097,7 +1097,25 @@ def _handle_read_file(args, **kw):
 
 def _handle_write_file(args, **kw):
     tid = kw.get("task_id") or "default"
-    return write_file_tool(path=args.get("path", ""), content=args.get("content", ""), task_id=tid)
+    if not args.get("path") or not isinstance(args.get("path"), str):
+        return tool_error(
+            "write_file: missing required field 'path'. Re-emit the tool call with "
+            "both 'path' and 'content' set."
+        )
+    if "content" not in args:
+        return tool_error(
+            "write_file: missing required field 'content'. The tool call included a "
+            "path but no content argument — this is almost always a dropped-arg bug "
+            "under context pressure. Re-emit the tool call with the full content "
+            "payload, or use execute_code with hermes_tools.write_file() for very "
+            "large files."
+        )
+    if not isinstance(args["content"], str):
+        return tool_error(
+            f"write_file: 'content' must be a string, got "
+            f"{type(args['content']).__name__}."
+        )
+    return write_file_tool(path=args["path"], content=args["content"], task_id=tid)
 
 
 def _handle_patch(args, **kw):

From 279b656adc3c64db7529fa85bbd744f1aa28cfbe Mon Sep 17 00:00:00 2001
From: Brooklyn Nicholson
Date: Sun, 3 May 2026 12:11:24 -0500
Subject: [PATCH 59/61] fix(tui): clear Apple Terminal resize artifacts

Use a deeper alt-screen clear for Apple Terminal resize repaints so host
reflow artifacts do not survive the recovery frame.
---
 ui-tui/packages/hermes-ink/src/ink/ink.tsx  | 30 +++++++++++++------
 .../hermes-ink/src/ink/terminal.test.ts     | 15 ++++++++++
 .../packages/hermes-ink/src/ink/terminal.ts |  4 +++
 3 files changed, 40 insertions(+), 9 deletions(-)
 create mode 100644 ui-tui/packages/hermes-ink/src/ink/terminal.test.ts

diff --git a/ui-tui/packages/hermes-ink/src/ink/ink.tsx b/ui-tui/packages/hermes-ink/src/ink/ink.tsx
index fec8b8ad04..c4669847e6 100644
--- a/ui-tui/packages/hermes-ink/src/ink/ink.tsx
+++ b/ui-tui/packages/hermes-ink/src/ink/ink.tsx
@@ -73,7 +73,13 @@ import {
   startSelection,
   updateSelection
 } from './selection.js'
-import { supportsExtendedKeys, SYNC_OUTPUT_SUPPORTED, type Terminal, writeDiffToTerminal } from './terminal.js'
+import {
+  needsAltScreenResizeScrollbackClear,
+  supportsExtendedKeys,
+  SYNC_OUTPUT_SUPPORTED,
+  type Terminal,
+  writeDiffToTerminal
+} from './terminal.js'
 import {
   CURSOR_HOME,
   cursorMove,
@@ -82,7 +88,8 @@ import {
   DISABLE_MODIFY_OTHER_KEYS,
   ENABLE_KITTY_KEYBOARD,
   ENABLE_MODIFY_OTHER_KEYS,
-  ERASE_SCREEN
+  ERASE_SCREEN,
+  ERASE_SCROLLBACK
 } from './termio/csi.js'
 import {
   DBP,
@@ -121,6 +128,11 @@ const ERASE_THEN_HOME_PATCH = Object.freeze({
   content: ERASE_SCREEN + CURSOR_HOME
 })
 
+const DEEP_ERASE_THEN_HOME_PATCH = Object.freeze({
+  type: 'stdout' as const,
+  content: ERASE_SCREEN + ERASE_SCROLLBACK + CURSOR_HOME
+})
+
 // Cached per-Ink-instance, invalidated on resize. frame.cursor.y for
 // alt-screen is always terminalRows - 1 (renderer.ts).
 function makeAltScreenParkPatch(terminalRows: number) {
@@ -863,17 +875,17 @@ export default class Ink {
     // position independently. Parking at bottom (not 0,0) keeps the guide
     // where the user's attention is.
     //
-    // After resize, prepend ERASE_SCREEN too. The diff only writes cells
+    // After resize, prepend a clear too. The diff only writes cells
     // that changed; cells where new=blank and prev-buffer=blank get skipped
     // — but the physical terminal still has stale content there (shorter
-    // lines at new width leave old-width text tails visible). ERASE inside
-    // BSU/ESU is atomic: old content stays visible until the whole
-    // erase+paint lands, then swaps in one go. Writing ERASE_SCREEN
-    // synchronously in handleResize would blank the screen for the ~80ms
-    // render() takes.
+    // lines at new width leave old-width text tails visible). Apple Terminal
+    // can also preserve alt-screen reflow artifacts in scrollback during
+    // resize, so it gets CSI 3J in this one recovery path. When BSU/ESU is
+    // supported, the clear+paint lands atomically; otherwise the final state
+    // is still healed even if the repaint is visible.
     if (this.needsEraseBeforePaint) {
       this.needsEraseBeforePaint = false
-      optimized.unshift(ERASE_THEN_HOME_PATCH)
+      optimized.unshift(needsAltScreenResizeScrollbackClear() ? DEEP_ERASE_THEN_HOME_PATCH : ERASE_THEN_HOME_PATCH)
     } else {
       optimized.unshift(CURSOR_HOME_PATCH)
     }
diff --git a/ui-tui/packages/hermes-ink/src/ink/terminal.test.ts b/ui-tui/packages/hermes-ink/src/ink/terminal.test.ts
new file mode 100644
index 0000000000..6c4f117f92
--- /dev/null
+++ b/ui-tui/packages/hermes-ink/src/ink/terminal.test.ts
@@ -0,0 +1,15 @@
+import { describe, expect, it } from 'vitest'
+
+import { needsAltScreenResizeScrollbackClear } from './terminal.js'
+
+describe('terminal resize quirks', () => {
+  it('uses a deeper alt-screen resize clear for Apple Terminal', () => {
+    expect(needsAltScreenResizeScrollbackClear({ TERM_PROGRAM: 'Apple_Terminal' })).toBe(true)
+    expect(needsAltScreenResizeScrollbackClear({ TERM_PROGRAM: ' Apple_Terminal ' })).toBe(true)
+  })
+
+  it('keeps the normal resize repaint path for modern terminals', () => {
+    expect(needsAltScreenResizeScrollbackClear({ TERM_PROGRAM: 'vscode' })).toBe(false)
+    expect(needsAltScreenResizeScrollbackClear({ TERM_PROGRAM: 'iTerm.app' })).toBe(false)
+  })
+})
diff --git a/ui-tui/packages/hermes-ink/src/ink/terminal.ts b/ui-tui/packages/hermes-ink/src/ink/terminal.ts
index a0aaa0beac..16e30e5e35 100644
--- a/ui-tui/packages/hermes-ink/src/ink/terminal.ts
+++ b/ui-tui/packages/hermes-ink/src/ink/terminal.ts
@@ -168,6 +168,10 @@ export function isXtermJs(): boolean {
   return xtversionName?.startsWith('xterm.js') ?? false
 }
 
+export function needsAltScreenResizeScrollbackClear(env: NodeJS.ProcessEnv = process.env): boolean {
+  return (env.TERM_PROGRAM ?? '').trim() === 'Apple_Terminal'
+}
+
 // Terminals known to correctly implement the Kitty keyboard protocol
 // (CSI >1u) and/or xterm modifyOtherKeys (CSI >4;2m) for ctrl+shift+
 // disambiguation. We previously enabled unconditionally (#23350), assuming

From 511add724987eeb03c10a69ae17d0b0e93765d1f Mon Sep 17 00:00:00 2001
From: SHL0MS
Date: Sun, 3 May 2026 11:40:34 -0400
Subject: [PATCH 60/61] feat(skill): add video-orchestrator optional creative skill
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Meta-pipeline that wraps any video request — narrative film, product /
marketing, music video, explainer, ASCII, generative, comic, 3D,
real-time/installation — in a Hermes Kanban pipeline. Performs adaptive
discovery, designs an appropriate team for the requested style, generates
the setup script that creates Hermes profiles + initial kanban task, and
helps monitor execution.

Routes scenes to whichever existing Hermes skill fits each beat
(`ascii-video`, `manim-video`, `p5js`, `comfyui`, `touchdesigner-mcp`,
`blender-mcp`, `pixel-art`, `baoyu-comic`, `claude-design`, `excalidraw`,
`songsee`, `heartmula`, …) plus external APIs for TTS, image-gen, and
image-to-video. Kanban orchestration uses the `kanban-orchestrator` and
`kanban-worker` skills.

The single-project workspace layout, profile-config patching pattern,
SOUL.md-per-profile model, and `--workspace dir:` discipline are adapted
from alt-glitch's original kanban-video-pipeline at
https://github.com/NousResearch/kanban-video-pipeline. This skill
generalizes those patterns across video styles and replaces the original
string-replacement config patcher with a PyYAML-based one that touches only
`toolsets` and `skills.always_load` (preserving security-sensitive fields
like `approvals.mode`).
Includes: - SKILL.md — workflow + critical rules - references/ — intake, role archetypes, tool matrix, kanban setup, monitoring, six worked examples - assets/ — brief / setup.sh / soul.md templates - scripts/ — bootstrap_pipeline.py (plan.json -> setup.sh) and monitor.py (poll + issue detection) Co-authored-by: alt-glitch --- .../creative/video-orchestrator/SKILL.md | 206 +++++++ .../video-orchestrator/assets/brief.md.tmpl | 79 +++ .../video-orchestrator/assets/setup.sh.tmpl | 185 +++++++ .../video-orchestrator/assets/soul.md.tmpl | 38 ++ .../video-orchestrator/references/examples.md | 227 ++++++++ .../video-orchestrator/references/intake.md | 166 ++++++ .../references/kanban-setup.md | 276 ++++++++++ .../references/monitoring.md | 180 +++++++ .../references/role-archetypes.md | 298 +++++++++++ .../references/tool-matrix.md | 305 +++++++++++ .../scripts/bootstrap_pipeline.py | 501 ++++++++++++++++++ .../video-orchestrator/scripts/monitor.py | 195 +++++++ 12 files changed, 2656 insertions(+) create mode 100644 optional-skills/creative/video-orchestrator/SKILL.md create mode 100644 optional-skills/creative/video-orchestrator/assets/brief.md.tmpl create mode 100644 optional-skills/creative/video-orchestrator/assets/setup.sh.tmpl create mode 100644 optional-skills/creative/video-orchestrator/assets/soul.md.tmpl create mode 100644 optional-skills/creative/video-orchestrator/references/examples.md create mode 100644 optional-skills/creative/video-orchestrator/references/intake.md create mode 100644 optional-skills/creative/video-orchestrator/references/kanban-setup.md create mode 100644 optional-skills/creative/video-orchestrator/references/monitoring.md create mode 100644 optional-skills/creative/video-orchestrator/references/role-archetypes.md create mode 100644 optional-skills/creative/video-orchestrator/references/tool-matrix.md create mode 100755 optional-skills/creative/video-orchestrator/scripts/bootstrap_pipeline.py create mode 100755 
optional-skills/creative/video-orchestrator/scripts/monitor.py diff --git a/optional-skills/creative/video-orchestrator/SKILL.md b/optional-skills/creative/video-orchestrator/SKILL.md new file mode 100644 index 0000000000..53b75cda4a --- /dev/null +++ b/optional-skills/creative/video-orchestrator/SKILL.md @@ -0,0 +1,206 @@ +--- +name: video-orchestrator +description: Plan, set up, and monitor a multi-agent video production pipeline backed by Hermes Kanban. Use when the user wants to make ANY video — narrative film, product/marketing, music video, explainer, ASCII/terminal art, abstract/generative loop, comic, 3D, real-time/installation — and the work warrants decomposition into specialized profiles (writer, designer, animator, renderer, voice, editor, etc.) coordinated through a kanban board. Performs adaptive discovery to scope the brief, designs an appropriate team for the requested style, generates the setup script that creates Hermes profiles + initial kanban task, then helps monitor execution and intervene when tasks stall or fail. Routes scenes to whichever Hermes rendering / audio / design skill fits each beat (`ascii-video`, `manim-video`, `p5js`, `comfyui`, `touchdesigner-mcp`, `blender-mcp`, `pixel-art`, `baoyu-comic`, `claude-design`, `excalidraw`, `songsee`, `heartmula`, …) plus external APIs for TTS, image-gen, and image-to-video as needed. 
+version: 1.0.0 +author: [SHL0MS, alt-glitch] +license: MIT +metadata: + hermes: + tags: [video, kanban, multi-agent, orchestration, production-pipeline] + related_skills: [kanban-orchestrator, kanban-worker, ascii-video, manim-video, p5js, comfyui, touchdesigner-mcp, blender-mcp, pixel-art, ascii-art, songwriting-and-ai-music, heartmula, songsee, spotify, youtube-content, claude-design, excalidraw, architecture-diagram, concept-diagrams, baoyu-comic, baoyu-infographic, humanizer, gif-search, meme-generation] + credits: | + The single-project workspace layout, profile-config patching pattern, + SOUL.md-per-profile model, TEAM.md task-graph convention, and + `--workspace dir:` discipline are adapted from alt-glitch's + original multi-agent video pipeline at + https://github.com/NousResearch/kanban-video-pipeline. +--- + +# Video Orchestrator + +Wrap any video request — from a 15-second product teaser to a 5-minute narrative +short to a music video to an ASCII loop — in a Hermes Kanban pipeline that +decomposes the work to specialized agent profiles. + +This skill does **not** render anything itself. It is a meta-pipeline that: + +1. **Scopes** the request through targeted discovery +2. **Designs** an appropriate team (which roles, which tools per role) based on the style +3. **Generates** a setup script that creates Hermes profiles, project workspace, and the initial kanban task +4. **Hands off** to the director profile, which decomposes via the kanban +5. **Monitors** execution, helps intervene when tasks stall or fail + +The actual rendering happens inside the kanban once it's running, via whichever +existing skills + tools fit the scenes — `ascii-video`, `manim-video`, `p5js`, +`comfyui`, `touchdesigner-mcp`, `blender-mcp`, `songwriting-and-ai-music`, +`heartmula`, external APIs, or plain Python with PIL + ffmpeg. + +## When NOT to use this skill + +- The video is one continuous procedural project that needs no specialists. Just write the code directly. 
+- The user wants a quick one-shot conversion (e.g. "convert this mp4 to a GIF") — use ffmpeg directly. +- The output is a static image, GIF, or audio-only artifact — use the matching specific skill (`ascii-art`, `gifs`, `meme-generation`, `songwriting-and-ai-music`). +- The work fits a single existing skill cleanly (e.g. a pure ASCII video — just use `ascii-video`). + +## Workflow + +``` +DISCOVER → BRIEF → TEAM DESIGN → SETUP → EXECUTE → MONITOR +``` + +### Step 1 — Discover (ask the right questions) + +The discovery process is **adaptive**: ask only what is actually needed. Always +start with three questions to identify the broad shape: + +- **What is the video?** (one-sentence brief) +- **How long?** (5-30s teaser / 30-90s short / 90s-3min explainer / 3-10min film / longer) +- **What aspect ratio + target platform?** (1:1 / 9:16 / 16:9; X, IG, YouTube, internal, etc.) + +From the answer, classify the style category. The style determines which +follow-up questions to ask. **Do not ask all questions at once.** Ask 2-4 at a +time, listen, then proceed. Make reasonable assumptions whenever the user +implies an answer. + +For complete intake patterns and per-style question banks, see +**[references/intake.md](references/intake.md)**. + +### Step 2 — Brief + +Once enough is known, produce a structured `brief.md` using the template in +`assets/brief.md.tmpl`. Stages: + +1. **Concept** — the one-sentence pitch + emotional north star +2. **Scope** — duration, aspect, platform, deadline +3. **Style** — visual references, brand constraints, tone +4. **Scenes** — beat-by-beat breakdown (durations, content, target tool) +5. **Audio** — narration / music / SFX / silent (per scene if needed) +6. **Deliverables** — file format, resolution, optional alternates (vertical cut, GIF, etc.) + +Show the brief to the user for confirmation before designing the team. **The +brief is the contract** — every downstream task references it. 
+ +### Step 3 — Team design + +Pick role archetypes from the library that fit this video. **Compose, don't +clone.** Most videos need 4-7 profiles. The director is always present; the +rest are picked by what the brief actually requires. + +For the role library and per-style team compositions, see +**[references/role-archetypes.md](references/role-archetypes.md)**. + +For mapping role → which Hermes skills + toolsets it loads, see +**[references/tool-matrix.md](references/tool-matrix.md)**. + +### Step 4 — Setup + +Generate a setup script (`setup.sh`) and run it. The script: + +1. Creates the project workspace (`~/projects/video-pipeline//`) +2. Copies any provided assets into `taste/`, `audio/`, `assets/` +3. Creates each Hermes profile via `hermes profile create --clone` +4. Writes per-profile `SOUL.md` (personality + role definition) +5. Configures profile YAML (toolsets, always_load skills) +6. Writes `brief.md`, `TEAM.md`, and `taste/` content +7. Fires the initial `hermes kanban create` task assigned to the director + +Use `scripts/bootstrap_pipeline.py` to generate setup.sh from a brief + +team-design JSON. See **[references/kanban-setup.md](references/kanban-setup.md)** +for the setup script structure, profile config patterns, and the critical +"shared workspace" rule. + +### Step 5 — Execute + +Run `setup.sh`. Then provide the user with monitoring commands: + +```bash +hermes kanban watch --tenant # live events +hermes kanban list --tenant # board snapshot +hermes dashboard # visual board UI +``` + +The director profile takes over from here, decomposing the work and routing +tasks to specialist profiles via the kanban toolset. + +### Step 6 — Monitor and intervene + +Stay engaged — the kanban runs autonomously but a stuck task or bad output +needs human (or AI) judgment. + +Monitoring patterns: poll `kanban list` periodically, inspect any RUNNING task +that exceeds its expected duration with `kanban show `, and check +heartbeats.
When a worker's output fails review, the standard interventions are: + +1. Comment on the worker's task with specific feedback (`kanban_comment`) +2. Create a re-run task with the original as parent +3. Adjust the brief's scope and let the director re-decompose + +For diagnostic patterns, intervention recipes, and the "task is stuck" +playbook, see **[references/monitoring.md](references/monitoring.md)**. + +## Reference: worked examples + +Six concrete pipelines covering very different video styles — narrative film, +product/marketing, music video, math/algorithm explainer, ASCII video, real-time +installation — showing how the same workflow yields very different teams and +task graphs. See **[references/examples.md](references/examples.md)**. + +## Critical rules + +1. **Discovery before action.** Never start generating a brief or team without + asking at least the three baseline questions. A bad brief cascades through + the entire pipeline. + +2. **Match the team to the video.** Don't reuse the same 4-profile setup for + every job. A music video that doesn't have a beat-analysis profile will + misfire. A narrative film that doesn't have a writer profile will produce + incoherent scenes. See `references/role-archetypes.md`. + +3. **One workspace per project.** All profiles for a given video share the same + `dir:` workspace. Tasks pass artifacts via shared filesystem and structured + handoffs. **Every** `kanban_create` call passes + `workspace_kind="dir"` + `workspace_path=""`. + +4. **Tenant every project.** Use a project-specific tenant + (`--tenant `). Keeps the dashboard scoped and prevents + cross-pollination with other ongoing kanbans. + +5. **Respect existing skills.** When a scene fits an existing skill, the + relevant renderer should load that skill via `--skill ` on its task + or `always_load` in its profile. Do not re-derive what a skill already + provides. + +6. 
**The director never executes.** Even with the full `kanban + terminal + + file` toolset, the director's `SOUL.md` rules forbid it from executing + work itself. It decomposes and routes only — every concrete task becomes + a `hermes kanban create` call to a specialist profile. The + `kanban-orchestrator` skill spells this out further. + +7. **Don't over-decompose.** A 30-second product video does NOT need 20 tasks. + Aim for the smallest task graph that still parallelizes well and exposes the + right human-review gates. + +8. **Verify API keys BEFORE firing.** External APIs (TTS, image-gen, + image-to-video) need keys in `~/.hermes/.env` or the user's secret store. + A worker that hits a missing-key error wastes a task slot. The setup + script's `check_key` helper aborts cleanly if a required key is missing. + +## File map + +``` +SKILL.md ← this file (workflow + rules) +references/ + intake.md ← discovery question banks per style + role-archetypes.md ← role library (writer, designer, animator, …) + tool-matrix.md ← skill + toolset mapping per role + kanban-setup.md ← setup script structure & profile config + monitoring.md ← watch + intervene patterns + examples.md ← six worked pipelines +assets/ + brief.md.tmpl ← brief skeleton + setup.sh.tmpl ← setup script skeleton + soul.md.tmpl ← profile personality skeleton +scripts/ + bootstrap_pipeline.py ← generate setup.sh from brief + team JSON + monitor.py ← polling + intervention helpers +``` diff --git a/optional-skills/creative/video-orchestrator/assets/brief.md.tmpl b/optional-skills/creative/video-orchestrator/assets/brief.md.tmpl new file mode 100644 index 0000000000..fbe8d8cbfb --- /dev/null +++ b/optional-skills/creative/video-orchestrator/assets/brief.md.tmpl @@ -0,0 +1,79 @@ +# Video Brief — {{TITLE}} + +> Slug: `{{SLUG}}` · Tenant: `{{TENANT}}` · Project workspace: `{{WORKSPACE}}` + +## 1. 
Concept + +**One-line pitch.** {{ONE_LINE_PITCH}} + +**Emotional north star.** {{EMOTIONAL_NORTH_STAR}} +*(What should the viewer feel walking away?)* + +## 2. Scope + +| | | +|---|---| +| Duration | {{DURATION_S}} seconds | +| Aspect ratio | {{ASPECT}} | +| Resolution | {{RESOLUTION}} | +| Frame rate | {{FPS}} fps | +| Target platforms | {{PLATFORMS}} | +| Deadline | {{DEADLINE}} | +| Quality bar | {{QUALITY_BAR}} *(rough draft / polished / archival)* | + +## 3. Style + +**Visual references.** {{VISUAL_REFS}} + +**Tone.** {{TONE}} + +**Brand constraints.** {{BRAND_CONSTRAINTS}} +*(colors, typography, motion language; or "n/a")* + +**Aesthetic rules.** +{{AESTHETIC_RULES}} + +## 4. Scenes + +Beat-by-beat breakdown. Each scene gets a row. + +| # | Time | Content | Target tool / skill | Audio | Notes | +|---|------|---------|---------------------|-------|-------| +| 1 | 0:00–0:0X | {{SCENE_1_CONTENT}} | {{SCENE_1_TOOL}} | {{SCENE_1_AUDIO}} | {{SCENE_1_NOTES}} | +| 2 | 0:0X–0:0Y | ... | ... | ... | ... | + +## 5. Audio + +**Approach.** {{AUDIO_APPROACH}} +*(narration / music-only / synced to track / silent / mixed)* + +**Voiceover.** {{VO_DETAILS}} +*(provider, voice, language, script source — "n/a" if no VO)* + +**Music.** {{MUSIC_DETAILS}} +*(provided track path / commission via Suno / commission via heartmula / +license-free / "n/a")* + +**SFX.** {{SFX_DETAILS}} +*(generated, library, or "n/a")* + +## 6. Deliverables + +| Format | Resolution | Notes | +|--------|-----------|-------| +| {{PRIMARY_FORMAT}} | {{PRIMARY_RES}} | The main output | +| {{ALT_FORMAT_1}} | {{ALT_RES_1}} | {{ALT_NOTES_1}} | + +**Final filename.** `output/final.mp4` +*(plus optional `output/final-9x16.mp4`, `output/captions.srt`, etc.)* + +## 7. Constraints + +- API keys required: {{API_KEYS_REQUIRED}} +- External dependencies: {{EXT_DEPS}} +- Source assets to incorporate: {{SOURCE_ASSETS}} + +--- + +**This brief is the contract. The director and every downstream profile read +it. 
If the brief changes, the kanban must be re-fired — don't edit live.** diff --git a/optional-skills/creative/video-orchestrator/assets/setup.sh.tmpl b/optional-skills/creative/video-orchestrator/assets/setup.sh.tmpl new file mode 100644 index 0000000000..bab87ea972 --- /dev/null +++ b/optional-skills/creative/video-orchestrator/assets/setup.sh.tmpl @@ -0,0 +1,185 @@ +#!/usr/bin/env bash +# ═══════════════════════════════════════════════════════════════════════ +# Video Pipeline Setup — {{TITLE}} +# +# Generated by video-orchestrator skill. +# +# Slug: {{SLUG}} +# Workspace: {{WORKSPACE}} +# Tenant: {{TENANT}} +# ═══════════════════════════════════════════════════════════════════════ +set -euo pipefail + +PROJECT_SLUG="{{SLUG}}" +WORKSPACE="$HOME/projects/video-pipeline/${PROJECT_SLUG}" +TENANT="{{TENANT}}" + +# ───────────────────────────────────────────────────────────────────── +# 1. Verify required API keys +# ───────────────────────────────────────────────────────────────────── +echo "═══ Checking required API keys ═══" + +check_key() { + local var="$1" + local kc_account="${2:-hermes}" + local kc_service="${3:-$1}" + if grep -q "^${var}=" "$HOME/.hermes/.env" 2>/dev/null && \ + [ -n "$(grep "^${var}=" "$HOME/.hermes/.env" | cut -d= -f2-)" ]; then + echo " ✓ ${var} (env)" + return 0 + fi + if command -v security >/dev/null 2>&1 && \ + security find-generic-password -a "${kc_account}" -s "${kc_service}" -w >/dev/null 2>&1; then + echo " ✓ ${var} (Keychain ${kc_account}/${kc_service})" + return 0 + fi + echo " ✗ ${var} not set in ~/.hermes/.env or Keychain (${kc_account}/${kc_service})" + return 1 +} + +# Customize this list per project — only check keys actually used: +{{KEY_CHECKS}} + +# ───────────────────────────────────────────────────────────────────── +# 2. 
Create project workspace +# ───────────────────────────────────────────────────────────────────── +echo "═══ Creating project workspace ═══" +mkdir -p "$WORKSPACE"/{taste,audio/{voiceover,sfx},assets,scenes,checkpoints,tools,output} +{{SCENE_DIRS}} +echo " ✓ $WORKSPACE" + +# ───────────────────────────────────────────────────────────────────── +# 3. Create Hermes profiles +# ───────────────────────────────────────────────────────────────────── +echo "═══ Creating Hermes profiles ═══" + +{{PROFILE_CREATE_COMMANDS}} + +# ───────────────────────────────────────────────────────────────────── +# 4. Configure profiles (toolsets, skills, cwd) +# ───────────────────────────────────────────────────────────────────── +echo "═══ Configuring profiles ═══" + +configure_profile() { + local profile="$1" + local toolsets_json="$2" # JSON array string, e.g. '["kanban","terminal","file"]' + local skills_json="$3" # JSON array string, e.g. '["kanban-worker","ascii-video"]' + python3 - "$profile" "$toolsets_json" "$skills_json" "$WORKSPACE" <<'PY' +"""Patch a Hermes profile config.yaml using PyYAML so we don't depend on the +exact default-config string format. Validates the patch took effect and exits +non-zero if anything's off.""" +import json +import os +import sys + +try: + import yaml +except ImportError: + print("ERROR: PyYAML required. pip install pyyaml", file=sys.stderr) + sys.exit(1) + +profile, toolsets_json, skills_json, workspace = sys.argv[1:5] +toolsets = json.loads(toolsets_json) +skills = json.loads(skills_json) + +p = os.path.expanduser(f"~/.hermes/profiles/{profile}/config.yaml") +if not os.path.exists(p): + print(f" ✗ profile config not found: {p}", file=sys.stderr) + sys.exit(1) + +with open(p) as f: + cfg = yaml.safe_load(f) or {} + +# Apply our changes — only the keys we actually want to set. 
+cfg["toolsets"] = toolsets +cfg.setdefault("skills", {}) +cfg["skills"]["always_load"] = skills + +# Note: we do NOT touch cfg["approvals"] — that's a security-sensitive +# setting (manual confirmation of tool calls). Workspace cwd is overridden +# per-task by `--workspace dir:` on `hermes kanban create`, so we +# don't need to mutate cfg["terminal"]["cwd"] either. + +with open(p, "w") as f: + yaml.safe_dump(cfg, f, sort_keys=False) + +# Validate +with open(p) as f: + after = yaml.safe_load(f) +errors = [] +if after.get("toolsets") != toolsets: + errors.append(f"toolsets mismatch: {after.get('toolsets')!r}") +if after.get("skills", {}).get("always_load") != skills: + errors.append(f"skills.always_load mismatch: {after.get('skills', {}).get('always_load')!r}") +if errors: + print(f" ✗ {profile}: " + "; ".join(errors), file=sys.stderr) + sys.exit(1) +PY + if [ $? -ne 0 ]; then + echo " ✗ failed to configure ${profile}" >&2 + exit 1 + fi + echo " ✓ ${profile}" +} + +{{PROFILE_CONFIG_COMMANDS}} + +# ───────────────────────────────────────────────────────────────────── +# 5. Write SOUL.md per profile +# ───────────────────────────────────────────────────────────────────── +echo "═══ Writing profile personalities ═══" + +{{SOUL_WRITES}} + +# ───────────────────────────────────────────────────────────────────── +# 6. Copy brief, TEAM.md, and any provided assets +# ───────────────────────────────────────────────────────────────────── +echo "═══ Writing brief + taste ═══" + +cat > "$WORKSPACE/brief.md" <<'BRIEF_EOF' +{{BRIEF_CONTENTS}} +BRIEF_EOF + +cat > "$WORKSPACE/TEAM.md" <<'TEAM_EOF' +{{TEAM_CONTENTS}} +TEAM_EOF + +{{TASTE_WRITES}} + +{{ASSET_COPIES}} + +# ───────────────────────────────────────────────────────────────────── +# 7. 
Fire the initial kanban task +# ───────────────────────────────────────────────────────────────────── +echo "═══ Firing initial kanban task ═══" + +hermes kanban create "Direct production of {{TITLE}}" \ + --assignee director \ + --workspace dir:"$WORKSPACE" \ + --tenant "$TENANT" \ + --priority 2 \ + --max-runtime 4h \ + --body "$(cat < **Credit:** the single-project-workspace layout, profile-config patching +> approach, SOUL.md-per-profile convention, and `--workspace dir:` rule +> are adapted from alt-glitch's original multi-agent video pipeline: +> [NousResearch/kanban-video-pipeline](https://github.com/NousResearch/kanban-video-pipeline). +> This skill generalizes those patterns across video styles and replaces the +> string-replacement config patcher with a PyYAML-based one. + +## Project workspace structure + +Every video project gets one workspace under `~/projects/video-pipeline//`: + +``` +~/projects/video-pipeline// +├── brief.md ← the contract; all tasks reference +├── TEAM.md ← team composition + task graph (director reads this) +├── taste/ +│ ├── brand-guide.md ← color, typography, motion rules +│ ├── emotional-dna.md ← what the piece should FEEL like +│ └── style-frames/ ← optional: visual references +├── audio/ +│ ├── track.mp3 ← provided music (if any) +│ ├── voiceover/ ← per-line TTS clips +│ └── sfx/ ← sound effects +├── assets/ +│ ├── logos/ +│ ├── fonts/ +│ └── existing-footage/ ← reusable provided clips +├── scenes/ +│ ├── scene-01/ +│ │ ├── VISUAL_SPEC.md ← cinematographer's per-scene spec +│ │ ├── render.py ← renderer's code (or sketch.html, etc.) +│ │ ├── checkpoints/ ← preview frames for QA +│ │ └── clip.mp4 ← the deliverable for this scene +│ ├── scene-02/... +│ └── ... 
+├── checkpoints/ ← global review frames +├── tools/ ← optional project-local helpers +└── output/ + ├── final.mp4 ← stitched + audio + ├── final-noaudio.mp4 + ├── final-9x16.mp4 ← optional: vertical alternate + └── captions.srt ← optional: subtitle file +``` + +**The slug** is derived from the brief title: lowercase, hyphen-separated. +Example: `q3-product-teaser`, `ascii-mood-loop`, `interview-cut-2026-q1`. + +## The setup.sh script + +The setup script does six things in order: + +1. **Create workspace tree** — all directories above +2. **Create profiles** — `hermes profile create --clone` +3. **Configure profiles** — patch each profile's + `~/.hermes/profiles//config.yaml` to set toolsets and + always_load skills +4. **Write SOUL.md per profile** — the personality + role definition +5. **Copy any provided assets + write `brief.md`, `TEAM.md`, and `taste/`** +6. **Fire the initial kanban task** — `hermes kanban create` assigned to the director + +See `assets/setup.sh.tmpl` for the skeleton. + +### Profile creation pattern + +```bash +hermes profile create director --clone 2>/dev/null || true +``` + +The `--clone` flag clones from the active profile (preserving model, base +config). The `|| true` makes the script idempotent — re-running won't error if +the profile already exists. + +### Profile config patching + +Each profile has a YAML config at `~/.hermes/profiles//config.yaml`. The +setup script edits exactly two keys: + +1. `toolsets:` — replace the default with the role's required toolsets +2. `skills.always_load:` — list the role's must-load skills (may be empty) + +**Do NOT** modify `approvals.mode` (controls user-confirmation of tool calls +— a security setting that must stay as the user configured it). **Do NOT** +modify `terminal.cwd` — the kanban dispatcher overrides cwd per-task via +`--workspace dir:`, so the profile's cwd is irrelevant to the kanban +work and changing it could break the user's interactive use of the profile.
+ +Use **PyYAML**, not string replacement, so the patch is robust against +default-config schema drift: + +```bash +configure_profile() { + local profile="$1" + local toolsets_json="$2" # JSON array, e.g. '["kanban","terminal","file"]' + local skills_json="$3" # JSON array, e.g. '["kanban-worker","ascii-video"]' + python3 - "$profile" "$toolsets_json" "$skills_json" <<'PY' +import json, os, sys, yaml +profile, ts_json, sk_json = sys.argv[1:4] +p = os.path.expanduser(f"~/.hermes/profiles/{profile}/config.yaml") +with open(p) as f: + cfg = yaml.safe_load(f) or {} +cfg["toolsets"] = json.loads(ts_json) +cfg.setdefault("skills", {})["always_load"] = json.loads(sk_json) +with open(p, "w") as f: + yaml.safe_dump(cfg, f, sort_keys=False) +PY +} +``` + +PyYAML must be installed in the user's Python (it ships with most Hermes +installs). If absent: `pip install pyyaml`. + +The setup script should also **validate** the patch by re-reading the file +and comparing — see `assets/setup.sh.tmpl` for the validation pattern. + +### SOUL.md per profile + +Each profile gets a `SOUL.md` at `~/.hermes/profiles//SOUL.md` that +defines its role, voice, and rules. See `assets/soul.md.tmpl` for the +template. Customize per role and per project. + +The director's SOUL.md should be the most opinionated — its voice flavors +the entire production. **Critical content for the director's SOUL.md:** + +- **Anti-temptation rules:** "Do not execute the work yourself. For every + concrete task, create a kanban task and assign it. Decompose, route, comment, + approve — that's the whole job." (The `kanban-orchestrator` skill provides + the deeper playbook; load it.) +- **Decomposition steps:** Read `brief.md`, `TEAM.md`, `taste/`. Use the team + graph in `TEAM.md` to fan out tasks. +- **The workspace_path rule** (see below). + +Other profiles' SOUL.md is briefer; mostly mechanical: who you are, what you +read, what you produce, what skills/tools to use, where to write outputs. 
+Most non-director profiles should `always_load: kanban-worker` for the +deeper-than-baseline kanban guidance. + +### Initial kanban task + +The final action of setup.sh is firing the kanban: + +```bash +hermes kanban create "Direct production of