read_file's gutter used a fixed-width zero/space-padded prefix
(" 1|content"). The padding is pure token overhead: measured with
cl100k on real Hermes source, the padded gutter costs ~48% more tokens
than bare content and ~16% more than a compact "<n>|content" gutter,
because the leading spaces tokenize into extra tokens on every line.
Switched the default to the compact "<n>|content" form. An A/B
(Sonnet 4.6 via OpenRouter, 2 passes, 4-task battery, every claim
verified against ground truth) showed:
- padded : 4/4 PASS both passes
- compact : 4/4 PASS both passes ← keeps line-referencing + patch
- none : 3/4 PASS both passes ← dropping numbers entirely made
the model hand-count lines and answer off-by-one (33 vs 34)
So we keep the line numbers (the model genuinely uses them to reference
lines) but drop the wasteful padding — capturing ~14% of the read-token
cost with zero measured accuracy change. Dropping numbers entirely
(the larger 33% saving) is rejected: it regresses line-referencing.
patch/fuzzy_match never consumed the gutter (they match old_string text
and compute char offsets internally), so editing is unaffected. No
downstream parser keys on the fixed-width columns. HERMES_READ_GUTTER=
padded restores the legacy format for anyone relying on alignment.
Tests: updated the 3 format assertions to the compact gutter; added an
env-override test for the legacy padded format. 209 file-tool tests green.
Closes the gap where write_file skipped the post-edit syntax check that
patch already ran, so silent file corruption (bad quote escaping,
truncated writes, etc.) would persist on disk until a later read.
## Changes
tools/file_operations.py:
- Add in-process linters for .py, .json, .yaml, .toml (LINTERS_INPROC).
Python uses ast.parse, JSON/YAML/TOML use stdlib/PyYAML parsers.
Zero subprocess overhead; preferred over shell linters when both apply.
- _check_lint() now accepts optional content and routes to in-process
linter first. Shell linter (py_compile, node --check, tsc, go vet,
rustfmt) remains the fallback for languages without an in-process
equivalent.
- New _check_lint_delta() implements the post-first/pre-lazy pattern
borrowed from Cline and OpenCode: lint post-write state first; only
if errors are found AND pre-content was captured does it lint the
pre-state and diff. If the pre-existing file had the SAME errors the
edit didn't introduce anything new, so the file is reported as 'still
broken, pre-existing' with success=False but a message explaining the
errors were pre-existing. If the edit introduced genuinely new errors,
those are surfaced and pre-existing ones are filtered out.
- WriteResult gains a lint field.
- write_file() captures pre-content for in-process-lintable extensions
and calls _check_lint_delta after a successful write.
- patch_replace() switches from _check_lint to _check_lint_delta,
reusing the pre-edit content it already has in scope.
tools/file_tools.py:
- Update write_file schema description to mention the post-write lint.
tests/tools/test_file_operations_edge_cases.py:
- Update existing brace-path tests to use .js (shell linter) now that
.py is in-process.
- Add TestCheckLintInproc (9 tests) covering Python/JSON/YAML/TOML
in-process linters.
- Add TestCheckLintDelta (5 tests) covering the post-first/pre-lazy
short-circuit, new-file path, and the single-error-parser caveat.
## Performance
In-process linters are microseconds per call (ast.parse, json.loads).
The hot path (clean write) runs exactly one lint — matches main's cost
for patch. Pre-state capture is skipped when the file has no applicable
linter. Measured 4.89ms/write average over 100 .py writes including lint.
## Inspiration
- Cline's DiffViewProvider.getNewDiagnosticProblems() — filters pre-write
diagnostics from post-write diagnostics (src/integrations/editor/DiffViewProvider.ts).
- OpenCode's WriteTool — runs lsp.diagnostics() after write and appends
errors to tool output (packages/opencode/src/tool/write.ts).
- Claude Code's DiagnosticTrackingService — captures baseline via
beforeFileEdited() and returns new-diagnostics-only from
getNewDiagnostics() (src/services/diagnosticTracking.ts).
## Validation
- tests/tools/test_file_operations.py + test_file_operations_edge_cases.py
+ test_file_tools.py + test_file_tools_live.py + test_file_write_safety.py
+ test_write_deny.py + test_patch_parser.py + test_file_ops_cwd_tracking.py:
228 passed locally.
- Live E2E reproduction of the tips.py corruption incident: broken
content written; lint field surfaces 'SyntaxError: invalid syntax.
Perhaps you forgot a comma? (line 6, column 5)' — the exact error
that would have self-corrected the bug on the next turn.
- Remove unreachable `if not content_sample` branch inside the truthy
`if content_sample` block in `_is_likely_binary()` (dead code that
could never execute).
- Replace `linter_cmd.format(file=...)` with `linter_cmd.replace("{file}", ...)`
in `_check_lint()` so file paths containing curly braces (e.g.
`src/{test}.py`) no longer raise KeyError/ValueError.
- Add 16 unit tests covering both fixes and edge cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>