feat(agent): per-turn file-mutation verifier footer (#24498)

Detect when write_file / patch calls fail during a turn and are never
superseded by a successful write to the same path.  When the final
text response is delivered, append an advisory footer listing the
files that did NOT change — so models that over-claim 'patched 5 files'
after 4 silent failures can't hide the lie.

Catches the failure mode reported in Ben Eng's llm-wiki session:
grok-4.1-fast issued batches of parallel patches, half failed with
'Could not find old_string', and the agent summarised the turn
claiming every file was edited.  The user had to manually run
'git status' each turn to catch it.

The verifier is a pure post-hoc check on tool results — no new LLM
calls, no synthetic messages injected into history (prompt cache
preserved), no changes to tool argument dispatch.  Per-turn state is
keyed by path; a later successful write to the same path clears the
failure entry so single-file retry recovery is not flagged.

Wired into both _execute_tool_calls_concurrent and
_execute_tool_calls_sequential, so batched parallel patches and one-at-
a-time edits are both covered.  Footer emission happens after the
agent loop exits, before transform_llm_output / post_llm_call plugin
hooks run, so plugins still see (and can modify) the augmented text.

Config: display.file_mutation_verifier (bool, default true) +
HERMES_FILE_MUTATION_VERIFIER env override.

31 unit tests in tests/run_agent/test_file_mutation_verifier.py cover
target extraction (write_file, patch-replace, patch-v4a single and
multi-file), error-preview extraction (JSON .error field and plain
string), per-turn state transitions (first-error-wins on repeated
failure, success supersedes failure), footer rendering (truncation
at 10 entries, user-actionable hint), and env/config precedence.

Companion docs updated: user-guide/configuration.md +
reference/environment-variables.md.
This commit is contained in:
Teknium 2026-05-12 11:54:13 -07:00 committed by GitHub
parent dd0923bb89
commit c594a23047
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
5 changed files with 552 additions and 0 deletions

View file

@ -1204,9 +1204,25 @@ display:
runtime_footer: # Gateway: append a runtime-context footer to final replies
enabled: false
fields: ["model", "context_pct", "cwd"]
file_mutation_verifier: true # Append an advisory footer when write_file/patch calls failed this turn
language: en # UI language for static messages (approval prompts, some gateway replies). en | zh | ja | de | es | fr | tr | uk
```
### File-mutation verifier
When `display.file_mutation_verifier` is `true` (default), Hermes appends a one-line advisory to the assistant's final response whenever a `write_file` or `patch` call failed during the turn and was never superseded by a successful write to the same path. This catches the "batch of parallel patches, half silently fail, model summarises success" class of over-claim without requiring you to manually run `git status` after every edit.
Example footer:
```
⚠️ File-mutation verifier: 3 file(s) were NOT modified this turn despite any wording above that may suggest otherwise. Run `git status` or `read_file` to confirm.
• concepts/automatic-organization.md — [patch] Could not find match for old_string
• concepts/lora.md — [patch] Could not find match for old_string
• concepts/rag-pipeline.md — [patch] Could not find match for old_string
```
Set `file_mutation_verifier: false` (or `HERMES_FILE_MUTATION_VERIFIER=0`) to suppress the footer. The verifier only fires when real failures are outstanding at turn end — a model that retries a failed patch and succeeds within the same turn will not trigger it for that file.
### UI language for static messages
The `display.language` setting translates a small set of static user-facing messages — the CLI approval prompt, a handful of gateway slash-command replies (e.g. restart-drain notices, "approval expired", "goal cleared"). It does **not** translate agent responses, log lines, tool output, error tracebacks, or slash-command descriptions — those stay in English. If you want the agent itself to reply in another language, just tell it in your prompt or system message.