hermes-agent/tests/agent/test_curator_classification.py
Teknium 97acd66b4c
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete

Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to the 'no-evidence
fallback' prune path, and #18253's cron rewriter drops the cron ref
entirely instead of mapping it to the umbrella. Same observable symptom
as pre-#18253:
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.

The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
  - absorbed_into='<umbrella>'  -> consolidated, target must exist on disk
  - absorbed_into=''            -> explicit prune, no forwarding target
  - missing                     -> legacy path, falls through to heuristic/YAML
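
As a sketch of that contract (illustrative Python only; classify_delete and
its arguments are assumed names, not the real skill_manager_tool code):

```python
def classify_delete(name, existing, absorbed_into=None):
    """Route a skill delete per the absorbed_into contract above.

    existing: set of skill names currently on disk.
    """
    if absorbed_into is None:
        return "legacy"  # missing -> falls through to heuristic/YAML
    target = absorbed_into.strip()
    if not target:
        return "pruned"  # absorbed_into='' -> explicit prune
    if target == name:
        raise ValueError("absorbed_into cannot be the skill being deleted")
    if target not in existing:
        raise ValueError("absorbed_into target %r does not exist on disk" % target)
    return "consolidated"
```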

The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
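
The precedence order, roughly (an assumed helper for illustration, not the
reconciler's actual code):

```python
def classification_source(declared, in_yaml, in_heuristic):
    """Pick the winning evidence source for one removed skill."""
    if declared:       # skill_manage(delete, absorbed_into=...) present
        return "absorbed_into declaration"
    if in_yaml:        # model's structured YAML summary block
        return "model"
    if in_heuristic:   # substring audit of other tool calls
        return "tool-call audit"
    return "no-evidence fallback"
```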

Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
  + _delete_skill. Validate target exists when non-empty. Reject
  absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
  tool calls for skill_manage(delete) with the arg. _reconcile_classification
  accepts absorbed_declarations= and treats them as authoritative. Curator
  prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
  target, empty string, nonexistent target, self-reference, whitespace,
  backward compat, dispatcher plumbing). 11 new curator tests covering
  the extractor + authoritative reconciler path + mixed-legacy-and-
  declared runs.

Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
  all 3. Model emits NO YAML block. Heuristic misses (patch prose
  doesn't name old slugs). Delete calls carry absorbed_into. Result:
  both PR skills correctly classified 'consolidated' + cron rewritten
  ['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
  ['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
  -> routed via existing 'model' source, cron still rewritten correctly.

* feat(curator): capture + restore cron skill links across snapshot/rollback

Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.

Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.

Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.

Reconciliation rules:
- Job in backup AND live, skills differ  → skills restored.
- Job in backup AND live, skills match   → no-op.
- Job in backup, NOT in live             → skipped (user deleted it
                                              after snapshot; their choice
                                              is later than the snapshot).
- Job in live, NOT in backup             → untouched (user created it
                                              after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
                                              reports 'not captured'
                                              (older pre-feature snapshots
                                              keep working).
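
A minimal sketch of those rules (job shapes and the function name are
assumptions; the real code additionally writes through cron.jobs.save_jobs
under the scheduler's lock):

```python
def restore_skill_links(backup_jobs, live_jobs):
    """Restore only the skill-link fields onto live jobs matched by id."""
    backup_by_id = {j["id"]: j for j in backup_jobs}
    report = {"restored": 0, "unchanged": 0, "live_only": 0}
    for job in live_jobs:
        snap = backup_by_id.get(job["id"])
        if snap is None:
            report["live_only"] += 1  # created after snapshot -> untouched
            continue
        changed = False
        for field in ("skills", "skill"):  # everything else stays live state
            if field in snap and job.get(field) != snap[field]:
                job[field] = snap[field]
                changed = True
        report["restored" if changed else "unchanged"] += 1
    # jobs present only in the backup are skipped: we never iterate them
    return report
```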

Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().

Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
  'cron jobs: N (will be restored for skill-link fields only)' when the
  snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
  summarizing the reconciliation outcome.

Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
  captured-as-raw, full rollback-restores-skills-and-cron, rollback
  touches only skill fields, rollback skips user-deleted jobs, rollback
  leaves user-created jobs untouched, rollback still works with
  pre-feature snapshot that has no cron-jobs.json, standalone unit test
  on _restore_cron_skill_links exercising the full report shape.

Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
  ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
  After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
  'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
  name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00


"""Tests for the curator consolidated-vs-pruned classifier.
The classifier splits skills that disappeared between the before/after
snapshots into two buckets:
- "consolidated" — absorbed into an umbrella; content still lives
under another skill's files
- "pruned" — archived for staleness; content not preserved elsewhere
Without the split the report lumped everything under "Skills archived",
which misled users into thinking consolidated skills had been pruned.
"""
from __future__ import annotations

import json
from datetime import datetime, timezone
from pathlib import Path

import pytest


@pytest.fixture
def curator_env(tmp_path, monkeypatch):
    home = tmp_path / ".hermes"
    home.mkdir()
    (home / "skills").mkdir()
    (home / "logs").mkdir()
    monkeypatch.setenv("HERMES_HOME", str(home))
    monkeypatch.setattr(Path, "home", lambda: tmp_path)
    import importlib
    import hermes_constants
    importlib.reload(hermes_constants)
    from agent import curator
    importlib.reload(curator)
    yield curator


def test_classify_consolidated_via_write_file_evidence(curator_env):
    """skill_manage write_file on umbrella references/<removed>.md = consolidated."""
    result = curator_env._classify_removed_skills(
        removed=["axolotl-training"],
        added=[],
        after_names={"training-platforms", "keeper"},
        tool_calls=[
            {
                "name": "skill_manage",
                "arguments": json.dumps({
                    "action": "write_file",
                    "name": "training-platforms",
                    "file_path": "references/axolotl-training.md",
                    "file_content": "# Axolotl\n...",
                }),
            },
        ],
    )
    assert len(result["consolidated"]) == 1
    assert result["consolidated"][0]["name"] == "axolotl-training"
    assert result["consolidated"][0]["into"] == "training-platforms"
    assert result["pruned"] == []


def test_classify_pruned_when_no_destination_reference(curator_env):
    """Removed skill with no referencing tool call = pruned."""
    result = curator_env._classify_removed_skills(
        removed=["old-stale-thing"],
        added=[],
        after_names={"keeper"},
        tool_calls=[
            {"name": "skills_list", "arguments": "{}"},
            {"name": "skill_manage", "arguments": json.dumps({
                "action": "patch", "name": "keeper",
                "old_string": "foo", "new_string": "bar",
            })},
        ],
    )
    assert result["consolidated"] == []
    assert len(result["pruned"]) == 1
    assert result["pruned"][0]["name"] == "old-stale-thing"


def test_classify_consolidated_into_newly_created_umbrella(curator_env):
    """Removed skill absorbed into a skill that was created THIS run."""
    result = curator_env._classify_removed_skills(
        removed=["anthropic-api"],
        added=["llm-providers"],  # new umbrella
        after_names={"llm-providers"},
        tool_calls=[
            {
                "name": "skill_manage",
                "arguments": json.dumps({
                    "action": "create",
                    "name": "llm-providers",
                    "content": "# LLM Providers\n\n## anthropic-api\nMerged from the old anthropic-api skill.\n",
                }),
            },
        ],
    )
    assert len(result["consolidated"]) == 1
    assert result["consolidated"][0]["name"] == "anthropic-api"
    assert result["consolidated"][0]["into"] == "llm-providers"


def test_classify_handles_underscore_hyphen_variants(curator_env):
    """Names with hyphens match underscore forms in paths/content and vice versa."""
    result = curator_env._classify_removed_skills(
        removed=["open-webui-setup"],
        added=[],
        after_names={"webui"},
        tool_calls=[
            {
                "name": "skill_manage",
                "arguments": json.dumps({
                    "action": "write_file",
                    "name": "webui",
                    "file_path": "references/open_webui_setup.md",
                    "file_content": "...",
                }),
            },
        ],
    )
    assert len(result["consolidated"]) == 1
    assert result["consolidated"][0]["into"] == "webui"


def test_classify_self_reference_does_not_count(curator_env):
    """A tool call that targets the removed skill itself is NOT consolidation."""
    # e.g. the curator patched the skill once and later archived it
    result = curator_env._classify_removed_skills(
        removed=["doomed"],
        added=[],
        after_names={"keeper"},
        tool_calls=[
            {
                "name": "skill_manage",
                "arguments": json.dumps({
                    "action": "patch",
                    "name": "doomed",  # same as removed
                    "old_string": "x",
                    "new_string": "y",
                }),
            },
        ],
    )
    assert result["consolidated"] == []
    assert result["pruned"][0]["name"] == "doomed"


def test_classify_destination_must_exist_after_run(curator_env):
    """A reference to a skill that doesn't exist after the run can't be the umbrella."""
    result = curator_env._classify_removed_skills(
        removed=["thing"],
        added=[],
        after_names={"keeper"},  # "ghost" not in here
        tool_calls=[
            {
                "name": "skill_manage",
                "arguments": json.dumps({
                    "action": "write_file",
                    "name": "ghost",  # not in after_names
                    "file_path": "references/thing.md",
                    "file_content": "...",
                }),
            },
        ],
    )
    assert result["consolidated"] == []
    assert result["pruned"][0]["name"] == "thing"


def test_classify_mixed_run_produces_both_buckets(curator_env):
    """A realistic run: one skill consolidated, one skill pruned."""
    result = curator_env._classify_removed_skills(
        removed=["absorbed-skill", "dead-skill"],
        added=["umbrella"],
        after_names={"umbrella", "keeper"},
        tool_calls=[
            {
                "name": "skill_manage",
                "arguments": json.dumps({
                    "action": "write_file",
                    "name": "umbrella",
                    "file_path": "references/absorbed-skill.md",
                    "file_content": "...",
                }),
            },
        ],
    )
    assert len(result["consolidated"]) == 1
    assert result["consolidated"][0]["name"] == "absorbed-skill"
    assert result["consolidated"][0]["into"] == "umbrella"
    assert len(result["pruned"]) == 1
    assert result["pruned"][0]["name"] == "dead-skill"


def test_classify_handles_malformed_arguments_string(curator_env):
    """Truncated/malformed JSON in arguments falls back to substring match."""
    # Arguments truncated to 400 chars may not parse as JSON.
    truncated_raw = (
        '{"action":"write_file","name":"umbrella","file_path":"references/'
        'absorbed-skill.md","file_content":"long content that was cut off mid'
    )
    result = curator_env._classify_removed_skills(
        removed=["absorbed-skill"],
        added=[],
        after_names={"umbrella"},
        tool_calls=[
            {"name": "skill_manage", "arguments": truncated_raw},
        ],
    )
    # Fallback substring match finds "absorbed-skill" in the raw truncated string
    # even though json.loads fails — but it can't identify target="umbrella"
    # because _raw is the only haystack and there's no dict access. The
    # classifier only promotes to "consolidated" if it can identify a target
    # skill from args.get("name"). Ensure we fail safe: no false positive.
    # (This is a correctness floor — better to prune-label than hallucinate
    # an umbrella that wasn't really used.)
    assert result["consolidated"] == []
    assert len(result["pruned"]) == 1


def test_report_md_splits_consolidated_and_pruned_sections(curator_env):
    """End-to-end: REPORT.md shows both sections distinctly."""
    curator = curator_env
    start = datetime.now(timezone.utc)
    before = [
        {"name": "absorbed-skill", "state": "active", "pinned": False},
        {"name": "dead-skill", "state": "stale", "pinned": False},
        {"name": "keeper", "state": "active", "pinned": False},
    ]
    after = [
        {"name": "keeper", "state": "active", "pinned": False},
        {"name": "umbrella", "state": "active", "pinned": False},
    ]
    run_dir = curator._write_run_report(
        started_at=start,
        elapsed_seconds=60.0,
        auto_counts={"checked": 3, "marked_stale": 0, "archived": 0, "reactivated": 0},
        auto_summary="no auto changes",
        before_report=before,
        before_names={r["name"] for r in before},
        after_report=after,
        llm_meta={
            "final": "Consolidated absorbed-skill into umbrella. Pruned dead-skill.",
            "summary": "1 consolidated, 1 pruned",
            "model": "m",
            "provider": "p",
            "error": None,
            "tool_calls": [
                {
                    "name": "skill_manage",
                    "arguments": json.dumps({
                        "action": "create",
                        "name": "umbrella",
                        "content": "# umbrella\n\nAbsorbed absorbed-skill.",
                    }),
                },
            ],
        },
    )
    payload = json.loads((run_dir / "run.json").read_text())
    # Both lists exist and are disjoint
    consolidated_names = {e["name"] for e in payload["consolidated"]}
    assert consolidated_names == {"absorbed-skill"}
    # `pruned` holds full dicts {name, source, reason}; `pruned_names` is the
    # flat list for quick scans / legacy compat.
    pruned_names = payload["pruned_names"]
    assert pruned_names == ["dead-skill"]
    assert all(isinstance(e, dict) and "name" in e for e in payload["pruned"])
    # The union still matches the legacy "archived" field for backward compat
    assert set(payload["archived"]) == consolidated_names | set(pruned_names)
    # counts exposed
    assert payload["counts"]["consolidated_this_run"] == 1
    assert payload["counts"]["pruned_this_run"] == 1
    md = (run_dir / "REPORT.md").read_text()
    # Two separate sections, not a single "Skills archived" lump
    assert "Consolidated into umbrella skills" in md
    assert "Pruned — archived for staleness" in md
    assert "`absorbed-skill` → merged into `umbrella`" in md
    assert "`dead-skill`" in md
    # The old single-lump section should not appear
    assert "### Skills archived" not in md


# ---------------------------------------------------------------------------
# _parse_structured_summary — extracting the model's required YAML block
# ---------------------------------------------------------------------------


def test_parse_structured_summary_happy_path(curator_env):
    text = (
        "Long human summary here. I processed clusters X, Y, Z.\n\n"
        "## Structured summary (required)\n"
        "```yaml\n"
        "consolidations:\n"
        "  - from: anthropic-api\n"
        "    into: llm-providers\n"
        "    reason: duplicate of the generic llm-providers skill\n"
        "  - from: openai-api\n"
        "    into: llm-providers\n"
        "    reason: same — merged with sibling\n"
        "prunings:\n"
        "  - name: random-old-notes\n"
        "    reason: pre-curator garbage, no overlap\n"
        "```\n"
    )
    out = curator_env._parse_structured_summary(text)
    assert len(out["consolidations"]) == 2
    assert out["consolidations"][0] == {
        "from": "anthropic-api",
        "into": "llm-providers",
        "reason": "duplicate of the generic llm-providers skill",
    }
    assert len(out["prunings"]) == 1
    assert out["prunings"][0]["reason"] == "pre-curator garbage, no overlap"


def test_parse_structured_summary_missing_block(curator_env):
    out = curator_env._parse_structured_summary("No block in this text.")
    assert out == {"consolidations": [], "prunings": []}


def test_parse_structured_summary_malformed_yaml(curator_env):
    text = "```yaml\nthis: is\n not: [valid yaml\n```"
    out = curator_env._parse_structured_summary(text)
    assert out == {"consolidations": [], "prunings": []}


def test_parse_structured_summary_empty_lists(curator_env):
    text = "```yaml\nconsolidations: []\nprunings: []\n```"
    out = curator_env._parse_structured_summary(text)
    assert out == {"consolidations": [], "prunings": []}


def test_parse_structured_summary_ignores_bare_strings(curator_env):
    """Entries that aren't dicts (e.g. a model wrote bare names) are skipped."""
    text = (
        "```yaml\n"
        "consolidations:\n"
        "  - just-a-bare-string\n"
        "  - from: real-entry\n"
        "    into: umbrella\n"
        "    reason: valid\n"
        "prunings: []\n"
        "```"
    )
    out = curator_env._parse_structured_summary(text)
    assert len(out["consolidations"]) == 1
    assert out["consolidations"][0]["from"] == "real-entry"


def test_parse_structured_summary_missing_required_fields(curator_env):
    """Consolidation entries without from+into are skipped."""
    text = (
        "```yaml\n"
        "consolidations:\n"
        "  - from: only-from\n"
        "    reason: no into\n"
        "  - into: only-into\n"
        "  - from: good\n"
        "    into: umbrella\n"
        "prunings: []\n"
        "```"
    )
    out = curator_env._parse_structured_summary(text)
    assert len(out["consolidations"]) == 1
    assert out["consolidations"][0]["from"] == "good"


# ---------------------------------------------------------------------------
# _reconcile_classification — merging model block with heuristic
# ---------------------------------------------------------------------------


def test_reconcile_model_wins_when_umbrella_exists(curator_env):
    """Model claim + umbrella in destinations → model authority (with reason)."""
    out = curator_env._reconcile_classification(
        removed=["anthropic-api"],
        heuristic={"consolidated": [], "pruned": [{"name": "anthropic-api"}]},
        model_block={
            "consolidations": [{
                "from": "anthropic-api",
                "into": "llm-providers",
                "reason": "duplicate",
            }],
            "prunings": [],
        },
        destinations={"llm-providers"},
    )
    assert len(out["consolidated"]) == 1
    e = out["consolidated"][0]
    assert e["name"] == "anthropic-api"
    assert e["into"] == "llm-providers"
    assert e["reason"] == "duplicate"
    assert e["source"] == "model"
    assert out["pruned"] == []


def test_reconcile_model_hallucinates_umbrella(curator_env):
    """Model names a non-existent umbrella — downgrade, prefer heuristic if any."""
    out = curator_env._reconcile_classification(
        removed=["thing"],
        heuristic={
            "consolidated": [{"name": "thing", "into": "real-umbrella", "evidence": "..."}],
            "pruned": [],
        },
        model_block={
            "consolidations": [{
                "from": "thing",
                "into": "nonexistent-umbrella",
                "reason": "confused",
            }],
            "prunings": [],
        },
        destinations={"real-umbrella"},
    )
    assert len(out["consolidated"]) == 1
    e = out["consolidated"][0]
    assert e["into"] == "real-umbrella"
    assert "tool-call audit" in e["source"]
    assert e["model_claimed_into"] == "nonexistent-umbrella"


def test_reconcile_model_hallucinates_with_no_heuristic_evidence(curator_env):
    """Model names a non-existent umbrella AND no tool-call evidence → prune."""
    out = curator_env._reconcile_classification(
        removed=["ghost"],
        heuristic={"consolidated": [], "pruned": [{"name": "ghost"}]},
        model_block={
            "consolidations": [{
                "from": "ghost",
                "into": "nonexistent",
                "reason": "wrong",
            }],
            "prunings": [],
        },
        destinations={"real-umbrella"},
    )
    assert out["consolidated"] == []
    assert len(out["pruned"]) == 1
    assert "fallback" in out["pruned"][0]["source"]


def test_reconcile_heuristic_catches_model_omission(curator_env):
    """Model forgot to list a consolidation, heuristic found it."""
    out = curator_env._reconcile_classification(
        removed=["forgotten"],
        heuristic={
            "consolidated": [{
                "name": "forgotten",
                "into": "umbrella",
                "evidence": "write_file on umbrella referenced forgotten.md",
            }],
            "pruned": [],
        },
        model_block={"consolidations": [], "prunings": []},
        destinations={"umbrella"},
    )
    assert len(out["consolidated"]) == 1
    e = out["consolidated"][0]
    assert e["into"] == "umbrella"
    assert "model omitted" in e["source"]


def test_reconcile_model_prunes_with_reason(curator_env):
    """Model says pruned, heuristic agrees, we surface the reason."""
    out = curator_env._reconcile_classification(
        removed=["stale-skill"],
        heuristic={"consolidated": [], "pruned": [{"name": "stale-skill"}]},
        model_block={
            "consolidations": [],
            "prunings": [{"name": "stale-skill", "reason": "superseded by bundled skill"}],
        },
        destinations=set(),
    )
    assert len(out["pruned"]) == 1
    e = out["pruned"][0]
    assert e["reason"] == "superseded by bundled skill"
    assert e["source"] == "model"


def test_reconcile_model_block_visible_in_full_report(curator_env):
    """End-to-end: LLM final response with the YAML block → reasons in REPORT.md."""
    import json as _json
    from datetime import datetime as _dt, timezone as _tz
    start = _dt.now(_tz.utc)
    before = [
        {"name": "anthropic-api", "state": "active", "pinned": False},
        {"name": "stale-thing", "state": "stale", "pinned": False},
    ]
    after = [{"name": "llm-providers", "state": "active", "pinned": False}]
    llm_final_text = (
        "Processed 3 clusters. Absorbed anthropic-api into llm-providers.\n\n"
        "## Structured summary (required)\n"
        "```yaml\n"
        "consolidations:\n"
        "  - from: anthropic-api\n"
        "    into: llm-providers\n"
        "    reason: duplicate content, now a subsection\n"
        "prunings:\n"
        "  - name: stale-thing\n"
        "    reason: pre-curator junk, no overlap with anything\n"
        "```\n"
    )
    run_dir = curator_env._write_run_report(
        started_at=start,
        elapsed_seconds=30.0,
        auto_counts={"checked": 2, "marked_stale": 0, "archived": 0, "reactivated": 0},
        auto_summary="none",
        before_report=before,
        before_names={r["name"] for r in before},
        after_report=after,
        llm_meta={
            "final": llm_final_text,
            "summary": "1 consolidated, 1 pruned",
            "model": "m",
            "provider": "p",
            "error": None,
            "tool_calls": [
                {"name": "skill_manage", "arguments": _json.dumps({
                    "action": "create",
                    "name": "llm-providers",
                    "content": "# llm-providers\nIncludes anthropic-api",
                })},
            ],
        },
    )
    payload = _json.loads((run_dir / "run.json").read_text())
    cons = payload["consolidated"][0]
    assert cons["name"] == "anthropic-api"
    assert cons["into"] == "llm-providers"
    assert cons["reason"] == "duplicate content, now a subsection"
    assert cons["source"] == "model+audit"  # model AND heuristic both had it
    pruned = payload["pruned"][0]
    assert pruned["name"] == "stale-thing"
    assert pruned["reason"] == "pre-curator junk, no overlap with anything"
    md = (run_dir / "REPORT.md").read_text()
    assert "duplicate content, now a subsection" in md
    assert "pre-curator junk" in md


# ---------------------------------------------------------------------------
# _extract_absorbed_into_declarations — authoritative signal from delete calls
# ---------------------------------------------------------------------------


def test_extract_absorbed_into_picks_up_consolidation(curator_env):
    """Delete call with absorbed_into=<umbrella> yields a declaration."""
    declarations = curator_env._extract_absorbed_into_declarations([
        {
            "name": "skill_manage",
            "arguments": json.dumps({
                "action": "delete",
                "name": "narrow-skill",
                "absorbed_into": "umbrella",
            }),
        },
    ])
    assert declarations == {
        "narrow-skill": {"into": "umbrella", "declared": True},
    }


def test_extract_absorbed_into_empty_string_is_explicit_prune(curator_env):
    """absorbed_into='' is recorded as an explicit prune declaration."""
    declarations = curator_env._extract_absorbed_into_declarations([
        {
            "name": "skill_manage",
            "arguments": json.dumps({
                "action": "delete",
                "name": "stale",
                "absorbed_into": "",
            }),
        },
    ])
    assert declarations == {"stale": {"into": "", "declared": True}}


def test_extract_absorbed_into_missing_arg_ignored(curator_env):
    """Delete call without absorbed_into is skipped — fallback to heuristic."""
    declarations = curator_env._extract_absorbed_into_declarations([
        {
            "name": "skill_manage",
            "arguments": json.dumps({
                "action": "delete",
                "name": "legacy-skill",
            }),
        },
    ])
    assert declarations == {}


def test_extract_absorbed_into_ignores_non_delete_actions(curator_env):
    """Patch, create, write_file etc. must not leak into declarations."""
    declarations = curator_env._extract_absorbed_into_declarations([
        {
            "name": "skill_manage",
            "arguments": json.dumps({
                "action": "patch",
                "name": "umbrella",
                "old_string": "...",
                "new_string": "...",
                "absorbed_into": "something",  # bogus on non-delete, must be ignored
            }),
        },
    ])
    assert declarations == {}


def test_extract_absorbed_into_accepts_dict_arguments(curator_env):
    """arguments can arrive as a dict (defensive path) — still works."""
    declarations = curator_env._extract_absorbed_into_declarations([
        {
            "name": "skill_manage",
            "arguments": {
                "action": "delete",
                "name": "narrow",
                "absorbed_into": "umbrella",
            },
        },
    ])
    assert declarations == {"narrow": {"into": "umbrella", "declared": True}}


def test_extract_absorbed_into_strips_whitespace(curator_env):
    declarations = curator_env._extract_absorbed_into_declarations([
        {
            "name": "skill_manage",
            "arguments": json.dumps({
                "action": "delete",
                "name": " narrow ",
                "absorbed_into": " umbrella ",
            }),
        },
    ])
    assert declarations == {"narrow": {"into": "umbrella", "declared": True}}


def test_extract_absorbed_into_ignores_non_skill_manage_calls(curator_env):
    declarations = curator_env._extract_absorbed_into_declarations([
        {"name": "terminal", "arguments": json.dumps({"command": "ls"})},
        {"name": "read_file", "arguments": json.dumps({"path": "/tmp/x"})},
    ])
    assert declarations == {}


def test_extract_absorbed_into_handles_malformed_arguments(curator_env):
    """Garbage JSON in arguments must not crash the extractor."""
    declarations = curator_env._extract_absorbed_into_declarations([
        {"name": "skill_manage", "arguments": "{not json"},
        {"name": "skill_manage", "arguments": None},
        {"name": "skill_manage"},  # no arguments key at all
    ])
    assert declarations == {}


# ---------------------------------------------------------------------------
# _reconcile_classification with absorbed_into declarations (authoritative)
# ---------------------------------------------------------------------------


def test_reconcile_absorbed_into_beats_everything_else(curator_env):
    """Model declared absorbed_into at delete; YAML/heuristic disagree — declaration wins.

    This is the exact #18671 regression: the model forgets to emit the YAML
    summary block, and the heuristic's substring match misses because the
    umbrella's patch content doesn't literally contain the old skill's
    slug. Previously this fell through to the 'no-evidence fallback' prune,
    which dropped the cron ref instead of rewriting it. With absorbed_into
    declared, the model tells us directly.
    """
    out = curator_env._reconcile_classification(
        removed=["pr-review-format"],
        heuristic={"consolidated": [], "pruned": [{"name": "pr-review-format"}]},
        model_block={"consolidations": [], "prunings": []},  # model forgot YAML block
        destinations={"hermes-agent-dev"},
        absorbed_declarations={
            "pr-review-format": {"into": "hermes-agent-dev", "declared": True},
        },
    )
    assert len(out["consolidated"]) == 1
    assert out["pruned"] == []
    e = out["consolidated"][0]
    assert e["name"] == "pr-review-format"
    assert e["into"] == "hermes-agent-dev"
    assert "absorbed_into" in e["source"]


def test_reconcile_absorbed_into_empty_is_explicit_prune(curator_env):
    """absorbed_into='' takes precedence and routes to pruned, not fallback."""
    out = curator_env._reconcile_classification(
        removed=["stale"],
        heuristic={"consolidated": [], "pruned": [{"name": "stale"}]},
        model_block={"consolidations": [], "prunings": []},
        destinations=set(),
        absorbed_declarations={
            "stale": {"into": "", "declared": True},
        },
    )
    assert out["consolidated"] == []
    assert len(out["pruned"]) == 1
    assert "model-declared prune" in out["pruned"][0]["source"]


def test_reconcile_absorbed_into_nonexistent_target_falls_through(curator_env):
    """If the declared umbrella doesn't exist in destinations, fall through to
    heuristic/YAML logic. Shouldn't happen in practice (the tool validates at
    delete time) but the reconciler is defensive."""
    out = curator_env._reconcile_classification(
        removed=["thing"],
        heuristic={
            "consolidated": [{"name": "thing", "into": "real-umbrella", "evidence": "..."}],
            "pruned": [],
        },
        model_block={"consolidations": [], "prunings": []},
        destinations={"real-umbrella"},
        absorbed_declarations={
            "thing": {"into": "ghost-umbrella", "declared": True},
        },
    )
    assert len(out["consolidated"]) == 1
    assert out["consolidated"][0]["into"] == "real-umbrella"
    assert "tool-call audit" in out["consolidated"][0]["source"]


def test_reconcile_declaration_preserves_yaml_reason(curator_env):
    """When the model both declared absorbed_into AND emitted YAML with a reason,
    the reason carries through so REPORT.md still has it."""
    out = curator_env._reconcile_classification(
        removed=["narrow"],
        heuristic={"consolidated": [], "pruned": []},
        model_block={
            "consolidations": [{
                "from": "narrow",
                "into": "umbrella",
                "reason": "duplicate of umbrella's main content",
            }],
            "prunings": [],
        },
        destinations={"umbrella"},
        absorbed_declarations={
            "narrow": {"into": "umbrella", "declared": True},
        },
    )
    assert len(out["consolidated"]) == 1
    e = out["consolidated"][0]
    assert e["into"] == "umbrella"
    assert "absorbed_into" in e["source"]
    assert e["reason"] == "duplicate of umbrella's main content"


def test_reconcile_without_declarations_preserves_legacy_behavior(curator_env):
    """Backward compat: no absorbed_declarations arg → all existing logic intact."""
    out = curator_env._reconcile_classification(
        removed=["thing"],
        heuristic={
            "consolidated": [{"name": "thing", "into": "umbrella", "evidence": "..."}],
            "pruned": [],
        },
        model_block={"consolidations": [], "prunings": []},
        destinations={"umbrella"},
        # no absorbed_declarations — defaults to None → behaves identically to pre-change
    )
    assert len(out["consolidated"]) == 1
    assert out["consolidated"][0]["into"] == "umbrella"


def test_reconcile_mixed_declarations_and_legacy_calls(curator_env):
    """Real-world run: some deletes declared absorbed_into, some didn't.

    Declared ones use the authoritative path; others fall through to YAML/heuristic.
    """
    out = curator_env._reconcile_classification(
        removed=["declared-cons", "declared-prune", "legacy-cons", "legacy-prune"],
        heuristic={
            "consolidated": [
                {"name": "legacy-cons", "into": "umbrella-a", "evidence": "..."},
            ],
            "pruned": [{"name": "legacy-prune"}],
        },
        model_block={"consolidations": [], "prunings": []},
        destinations={"umbrella-a", "umbrella-b"},
        absorbed_declarations={
            "declared-cons": {"into": "umbrella-b", "declared": True},
            "declared-prune": {"into": "", "declared": True},
        },
    )
    cons_by_name = {e["name"]: e for e in out["consolidated"]}
    pruned_by_name = {e["name"]: e for e in out["pruned"]}
    assert "declared-cons" in cons_by_name
    assert cons_by_name["declared-cons"]["into"] == "umbrella-b"
    assert "absorbed_into" in cons_by_name["declared-cons"]["source"]
    assert "legacy-cons" in cons_by_name
    assert cons_by_name["legacy-cons"]["into"] == "umbrella-a"
    assert "tool-call audit" in cons_by_name["legacy-cons"]["source"]
    assert "declared-prune" in pruned_by_name
    assert "model-declared prune" in pruned_by_name["declared-prune"]["source"]
    assert "legacy-prune" in pruned_by_name
    assert "no-evidence fallback" in pruned_by_name["legacy-prune"]["source"]