hermes-agent

mirrors/hermes-agent

Fork 0

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

Commit graph

Author	SHA1	Message	Date
Teknium	e5af1dd633	fix(review): tell background reviewer not to capture transient env failures as skills (#23004 ) Closes #6051. Reported failure mode: agent migrated to WSL2, browser launch failed because Playwright wasn't installed yet. Background reviewer captured the failure as a durable skill (`browser-tool-launch-issue`) and the agent kept refusing the browser tool for weeks after Playwright was installed and verified working. Negative claims also propagated into unrelated skills ("browser tools do not work", "cannot use Y from execute_code"). Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both lean hard on "be active, save things, a pass that does nothing is a missed learning opportunity." Neither distinguished durable knowledge from transient environment state. The reviewer was doing what it was told. Fix at the write site — both prompts now carry a "Do NOT capture" section calling out: • Environment-dependent failures (missing binaries, fresh-install errors, post-migration path mismatches, 'command not found', unconfigured credentials, uninstalled packages) • Negative claims about tools or features ("X does not work") that harden into self-cited refusals • Session-specific transient errors that resolved before the conversation ended • One-off task narratives ("summarize today's market", "analyze this PR") — also addresses the #12812 / #4538 family Plus a positive-reframing line: when a tool fails because of setup state, capture the FIX (install command, config step, env var) under an existing setup/troubleshooting skill — never "this tool doesn't work" as a standalone constraint. Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py (2 new + all existing review-prompt assertions). Substring-based checks so future prompt edits don't false-fail.	2026-05-09 22:51:25 -07:00
Teknium	1d4218be56	feat(review): active-update bias, loaded-skill-first, support-file variants (#17213 ) The background skill-review prompts (_SKILL_REVIEW_PROMPT and the Skills half of _COMBINED_REVIEW_PROMPT) steered the reviewer toward passive behavior — most passes concluded 'Nothing to save.' even when the session produced real lessons. User-preference corrections (style, format, legibility, verbosity) were especially lost: they were read as memory signals only, so skills never carried the fix. This rewrite changes the stance: - Active-update bias. The reviewer now treats inaction as a missed learning opportunity. 'Nothing to save.' remains an explicit escape but is no longer framed as the most-common outcome. - User-preference corrections are first-class skill signals. Style, tone, format, legibility, verbosity complaints — and the actual phrasings users use ('stop doing X', 'this is too verbose', 'I hate when you Y', 'remember this') — now warrant patching the skill that governs the task, not just writing to memory. - Loaded-skill-first preference order. When a skill was loaded via /skill-name or skill_view during the session, the reviewer patches THAT one first. It was in play; it's the right place. - Four-step ladder: patch-loaded → patch-umbrella → support-file → create. Support files are explicitly enumerated as three kinds: * references/<topic>.md — session-specific detail OR condensed knowledge banks (quoted research, API docs excerpts, domain notes) * templates/<name>.<ext> — starter files to copy and modify * scripts/<name>.<ext> — statically re-runnable actions - Name-veto for CREATE. New skill names MUST be class-level — no PR numbers, error strings, codenames, library-alone names, or session artifacts ('fix-X / debug-Y / audit-Z-today'). If the proposed name only fits today's task, fall back to one of the patch/support-file options. - Memory scope clarified. 'who the user is and what the current situation and state of your operations are' — MEMORY.md is situational/state, USER.md is identity/preferences. - Curator handoff. Reviewer flags overlap; the background curator handles consolidation at scale. Single-session reviewer doesn't attempt umbrella-rebalancing. Tests: tests/run_agent/test_review_prompt_class_first.py upgraded to assert the new behavioral contracts (active bias, user-correction signals, loaded-skill-first, support-file kinds, name-veto, memory framing, curator handoff). 17 tests, all pass. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 21:11:48 -07:00
Teknium	76042f5867	feat(review): class-first skill review prompt (#16026 ) The background skill-review prompt (spawned after N user turns) now instructs the reviewer to SURVEY existing skills first, identify the CLASS of task, and PREFER updating/generalizing an existing skill over creating a new narrow one. This reduces near-duplicate skill accumulation at the source. Catches the common failure mode where repeated tasks of the same class each spawn their own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead of a single class-level skill ("desktop-app-build-troubleshooting"). Applied to both _SKILL_REVIEW_PROMPT and the Skills half of _COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged. Groundwork for the Curator feature (issue #7816) — the creation-side fix. Curator handles the retirement/consolidation side in a follow-up PR. Tests assert the behavioral instructions are present (survey, class, update- over-create, overlap-flagging, opt-out clause) rather than snapshotting the full prompt text.	2026-04-26 05:17:10 -07:00

Author

SHA1

Message

Date

Teknium

e5af1dd633

fix(review): tell background reviewer not to capture transient env failures as skills (#23004 )

Closes #6051.

Reported failure mode: agent migrated to WSL2, browser launch failed
because Playwright wasn't installed yet. Background reviewer captured
the failure as a durable skill (`browser-tool-launch-issue`) and the
agent kept refusing the browser tool for weeks after Playwright was
installed and verified working. Negative claims also propagated into
unrelated skills ("browser tools do not work", "cannot use Y from
execute_code").

Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both
lean hard on "be active, save things, a pass that does nothing is a
missed learning opportunity." Neither distinguished durable knowledge
from transient environment state. The reviewer was doing what it was
told.

Fix at the write site — both prompts now carry a "Do NOT capture"
section calling out:
  • Environment-dependent failures (missing binaries, fresh-install
    errors, post-migration path mismatches, 'command not found',
    unconfigured credentials, uninstalled packages)
  • Negative claims about tools or features ("X does not work")
    that harden into self-cited refusals
  • Session-specific transient errors that resolved before the
    conversation ended
  • One-off task narratives ("summarize today's market", "analyze
    this PR") — also addresses the #12812 / #4538 family

Plus a positive-reframing line: when a tool fails because of setup
state, capture the FIX (install command, config step, env var)
under an existing setup/troubleshooting skill — never "this tool
doesn't work" as a standalone constraint.

Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py
(2 new + all existing review-prompt assertions). Substring-based
checks so future prompt edits don't false-fail.

2026-05-09 22:51:25 -07:00

Teknium

1d4218be56

feat(review): active-update bias, loaded-skill-first, support-file variants (#17213 )

The background skill-review prompts (_SKILL_REVIEW_PROMPT and the **Skills**
half of _COMBINED_REVIEW_PROMPT) steered the reviewer toward passive
behavior — most passes concluded 'Nothing to save.' even when the session
produced real lessons. User-preference corrections (style, format,
legibility, verbosity) were especially lost: they were read as memory
signals only, so skills never carried the fix.

This rewrite changes the stance:

- **Active-update bias.** The reviewer now treats inaction as a missed
  learning opportunity. 'Nothing to save.' remains an explicit escape
  but is no longer framed as the most-common outcome.

- **User-preference corrections are first-class skill signals.** Style,
  tone, format, legibility, verbosity complaints — and the actual
  phrasings users use ('stop doing X', 'this is too verbose', 'I hate
  when you Y', 'remember this') — now warrant patching the skill that
  governs the task, not just writing to memory.

- **Loaded-skill-first preference order.** When a skill was loaded via
  /skill-name or skill_view during the session, the reviewer patches
  THAT one first. It was in play; it's the right place.

- **Four-step ladder: patch-loaded → patch-umbrella → support-file →
  create.** Support files are explicitly enumerated as three kinds:
    * references/<topic>.md — session-specific detail OR condensed
      knowledge banks (quoted research, API docs excerpts, domain notes)
    * templates/<name>.<ext> — starter files to copy and modify
    * scripts/<name>.<ext>  — statically re-runnable actions

- **Name-veto for CREATE.** New skill names MUST be class-level — no PR
  numbers, error strings, codenames, library-alone names, or session
  artifacts ('fix-X / debug-Y / audit-Z-today'). If the proposed name
  only fits today's task, fall back to one of the patch/support-file
  options.

- **Memory scope clarified.** 'who the user is and what the current
  situation and state of your operations are' — MEMORY.md is
  situational/state, USER.md is identity/preferences.

- **Curator handoff.** Reviewer flags overlap; the background curator
  handles consolidation at scale. Single-session reviewer doesn't
  attempt umbrella-rebalancing.

Tests: tests/run_agent/test_review_prompt_class_first.py upgraded to
assert the new behavioral contracts (active bias, user-correction
signals, loaded-skill-first, support-file kinds, name-veto, memory
framing, curator handoff). 17 tests, all pass.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>

2026-04-28 21:11:48 -07:00

Teknium

76042f5867

feat(review): class-first skill review prompt (#16026 )

The background skill-review prompt (spawned after N user turns) now instructs
the reviewer to SURVEY existing skills first, identify the CLASS of task, and
PREFER updating/generalizing an existing skill over creating a new narrow one.

This reduces near-duplicate skill accumulation at the source. Catches the
common failure mode where repeated tasks of the same class each spawn their
own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead
of a single class-level skill ("desktop-app-build-troubleshooting").

Applied to both _SKILL_REVIEW_PROMPT and the **Skills** half of
_COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged.

Groundwork for the Curator feature (issue #7816) — the creation-side fix.
Curator handles the retirement/consolidation side in a follow-up PR.

Tests assert the behavioral instructions are present (survey, class, update-
over-create, overlap-flagging, opt-out clause) rather than snapshotting the
full prompt text.

2026-04-26 05:17:10 -07:00

3 commits