mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168)
* feat(lsp): semantic diagnostics from real language servers in write_file/patch
Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.
LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.
The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.
New module agent/lsp/ split into:
- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}
Wired into tools/file_operations.py:
- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged
Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).
Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.
Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.
Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.
* feat(lsp): structured logging, backend gate, defensive walk caps
Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.
agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.
agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.
tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.
agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.
agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.
Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
silence (clean writes stay DEBUG), state-transition INFO-once
semantics (active for, no project root), action-required
WARNING-once (server unavailable), per-call WARNING (timeouts,
spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
_lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
the LSP layer for non-local backends and route correctly for
LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
exercising the 8 KiB cap.
Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
... reportReturnType (Pyright)" through the full path, then patch
fix removes it on the next call.
* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field
Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:
1. atexit cleanup of spawned language servers
----------------------------------------------------------------
``agent/lsp/__init__.get_service`` now registers an ``atexit``
handler on first creation that tears down the LSPService on
Python exit. Without this, every ``hermes chat`` exit was
leaking pyright/gopls/etc. processes for a few seconds while
their stdout buffers drained -- they got reaped by the kernel
eventually but a watchful ``ps aux`` would catch them.
The handler runs once per process (gated by
``_atexit_registered``); idempotent ``shutdown_service``
ensures double-fire is a no-op. Errors during shutdown are
swallowed at debug level since by the time atexit fires the
user has already seen the agent's final response.
2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
----------------------------------------------------------------
Previously the LSP layer folded its diagnostic block into the
``lint.output`` string, conflating the syntax-check tier with
the semantic tier. The agent (and any downstream parsers) now
read syntax errors and semantic errors as independent signals:
{
"bytes_written": 42,
"lint": {"status": "ok", "output": ""},
"lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
}
``_check_lint_delta`` returns to its original two-tier shape
(syntax check + delta filter); ``write_file`` and
``patch_replace`` independently fetch LSP diagnostics via
``_maybe_lsp_diagnostics`` and pass them into the new field.
``patch_replace`` propagates the inner write_file's
``lsp_diagnostics`` so the outer PatchResult carries the patch's
delta correctly.
Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
fires once and only once across N get_service calls; the
registered callable is our internal shutdown wrapper;
shutdown_service is idempotent and safe when never started;
exceptions during shutdown are swallowed; inactive service is
cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
/ PatchResult dataclass shape, to_dict include/omit semantics,
channel separation (lint and lsp_diagnostics carry independent
signals), write_file populates the field via
_maybe_lsp_diagnostics only when the syntax tier is clean,
patch_replace propagates the field forward from its internal
write_file.
Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
error introduced -> lint clean (parses), lsp_diagnostics carries
the pyright reportReturnType block; patch fix -> both fields
clean again.
* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write
Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.
The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.
Fix:
- New ``_mark_broken_for_file`` helper at the service layer marks
the (server_id, workspace_root) pair broken from the OUTSIDE when
the outer timeout fires. Called from the except branches in
``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
+ generic Exception). Also kills any orphan client process that
survived the cancelled future, fire-and-forget with a 1s ceiling.
- ``enabled_for`` now consults the broken-set BEFORE returning True.
Files in already-broken (server_id, root) pairs short-circuit to
False, so the file_operations layer skips the LSP path entirely
with no spawn cost. Until the service is restarted (``hermes lsp
restart``) or the process exits.
- A single eventlog WARNING is emitted on first mark-broken so the
user knows which server gave up. Subsequent edits in the same
project stay silent.
Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.
Validation:
- Live retest of the wedged-binary scenario: 5 sequential writes,
first 8.88s (the one snapshot timeout), subsequent four ~0.84s
(no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
patch fix all behave correctly with the new broken-set logic.
Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
This commit is contained in:
parent
d89553c2d6
commit
83b93898c2
28 changed files with 6144 additions and 17 deletions
|
|
@ -120,6 +120,13 @@ class WriteResult:
|
|||
bytes_written: int = 0
|
||||
dirs_created: bool = False
|
||||
lint: Optional[Dict[str, Any]] = None
|
||||
# Semantic diagnostics from the LSP layer, when applicable. Kept in
|
||||
# its own field (not folded into ``lint``) so the model and any
|
||||
# downstream parsers can read syntax errors and semantic errors as
|
||||
# separate signals. ``None`` when LSP is disabled, when the file
|
||||
# isn't in a git workspace, or when no diagnostics were introduced
|
||||
# by this edit.
|
||||
lsp_diagnostics: Optional[str] = None
|
||||
error: Optional[str] = None
|
||||
warning: Optional[str] = None
|
||||
|
||||
|
|
@ -136,6 +143,8 @@ class PatchResult:
|
|||
files_created: List[str] = field(default_factory=list)
|
||||
files_deleted: List[str] = field(default_factory=list)
|
||||
lint: Optional[Dict[str, Any]] = None
|
||||
# See :class:`WriteResult.lsp_diagnostics`.
|
||||
lsp_diagnostics: Optional[str] = None
|
||||
error: Optional[str] = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
|
|
@ -150,6 +159,8 @@ class PatchResult:
|
|||
result["files_deleted"] = self.files_deleted
|
||||
if self.lint:
|
||||
result["lint"] = self.lint
|
||||
if self.lsp_diagnostics:
|
||||
result["lsp_diagnostics"] = self.lsp_diagnostics
|
||||
if self.error:
|
||||
result["error"] = self.error
|
||||
return result
|
||||
|
|
@ -867,6 +878,13 @@ class ShellFileOperations(FileOperations):
|
|||
if read_result.exit_code == 0 and read_result.stdout:
|
||||
pre_content = read_result.stdout
|
||||
|
||||
# Snapshot LSP diagnostics for this file (best-effort) so the
|
||||
# post-write LSP layer can return only diagnostics introduced
|
||||
# by this specific edit. Mirrors claude-code's
|
||||
# ``beforeFileEdited`` pattern but wired to the local LSP
|
||||
# rather than an external IDE.
|
||||
self._snapshot_lsp_baseline(path)
|
||||
|
||||
# Create parent directories
|
||||
parent = os.path.dirname(path)
|
||||
dirs_created = False
|
||||
|
|
@ -897,10 +915,21 @@ class ShellFileOperations(FileOperations):
|
|||
# Post-write lint with delta refinement.
|
||||
lint_result = self._check_lint_delta(path, pre_content=pre_content, post_content=content)
|
||||
|
||||
# Semantic diagnostics from the LSP layer — separate channel.
|
||||
# Only fired when the syntax tier reported clean (no point asking
|
||||
# an LSP for a file that won't even parse). Best-effort:
|
||||
# ``""`` is returned for any failure path.
|
||||
lsp_diagnostics: Optional[str] = None
|
||||
if lint_result.success or lint_result.skipped:
|
||||
block = self._maybe_lsp_diagnostics(path)
|
||||
if block:
|
||||
lsp_diagnostics = block
|
||||
|
||||
return WriteResult(
|
||||
bytes_written=bytes_written,
|
||||
dirs_created=dirs_created,
|
||||
lint=lint_result.to_dict() if lint_result else None,
|
||||
lsp_diagnostics=lsp_diagnostics,
|
||||
)
|
||||
|
||||
# =========================================================================
|
||||
|
|
@ -996,7 +1025,14 @@ class ShellFileOperations(FileOperations):
|
|||
success=True,
|
||||
diff=diff,
|
||||
files_modified=[path],
|
||||
lint=lint_result.to_dict() if lint_result else None
|
||||
lint=lint_result.to_dict() if lint_result else None,
|
||||
# Propagate the LSP diagnostics already captured by the
|
||||
# internal ``write_file`` call. Its baseline was the
|
||||
# pre-patch content (taken at the start of write_file via
|
||||
# ``_snapshot_lsp_baseline``) so the delta is correct for
|
||||
# the patch as a whole. Keep the field separate from the
|
||||
# syntax-check ``lint`` so the agent can read both signals.
|
||||
lsp_diagnostics=write_result.lsp_diagnostics,
|
||||
)
|
||||
|
||||
def patch_v4a(self, patch_content: str) -> PatchResult:
|
||||
|
|
@ -1089,21 +1125,25 @@ class ShellFileOperations(FileOperations):
|
|||
def _check_lint_delta(self, path: str, pre_content: Optional[str],
|
||||
post_content: Optional[str] = None) -> LintResult:
|
||||
"""
|
||||
Run post-write lint with pre-write baseline comparison.
|
||||
Run post-write syntax lint with pre-write baseline comparison.
|
||||
|
||||
Strategy (post-first, pre-lazy):
|
||||
1. Lint the post-write state. If clean → return clean immediately.
|
||||
This is the hot path and matches _check_lint() in cost.
|
||||
2. If post-lint found errors AND we have pre-write content, lint
|
||||
that too. If the pre-write file was already broken, return only
|
||||
the *new* errors introduced by this edit — errors that existed
|
||||
before aren't the agent's problem to chase right now.
|
||||
3. If pre_content is None (new file or unavailable), skip the delta
|
||||
step and return all post-write errors.
|
||||
Two-tier strategy:
|
||||
|
||||
This mirrors Cline's and OpenCode's post-edit LSP pattern: surface
|
||||
only the errors this specific edit introduced, so the agent doesn't
|
||||
get distracted by pre-existing problems.
|
||||
1. **Syntax check** (in-process or shell-based, microseconds).
|
||||
Catches the bug class that motivated this layer: corrupt
|
||||
writes, mashed quotes, truncated output. Hot path.
|
||||
|
||||
2. **Delta refinement against pre-write content** when the
|
||||
syntax tier reports errors. Filter out errors that already
|
||||
existed pre-edit so the agent isn't distracted by inherited
|
||||
state.
|
||||
|
||||
Semantic diagnostics from the LSP layer are fetched separately
|
||||
via :meth:`_maybe_lsp_diagnostics` and surfaced in the
|
||||
``lsp_diagnostics`` field on :class:`WriteResult` /
|
||||
:class:`PatchResult`. Keeping the two channels separate lets
|
||||
the agent (and any downstream parsers) read syntax errors and
|
||||
semantic errors as independent signals.
|
||||
|
||||
Args:
|
||||
path: File path (for linter selection).
|
||||
|
|
@ -1122,12 +1162,12 @@ class ShellFileOperations(FileOperations):
|
|||
"""
|
||||
post = self._check_lint(path, content=post_content)
|
||||
|
||||
# Hot path: clean post-write, no pre-lint needed.
|
||||
# Hot path: clean post-write syntactically.
|
||||
if post.success or post.skipped:
|
||||
return post
|
||||
|
||||
# Post-write has errors. If we have pre-content, run the delta
|
||||
# refinement to filter out pre-existing errors.
|
||||
# Post-write has syntax errors. If we have pre-content, run the
|
||||
# delta refinement to filter out pre-existing errors.
|
||||
if pre_content is None:
|
||||
return post
|
||||
|
||||
|
|
@ -1166,6 +1206,91 @@ class ShellFileOperations(FileOperations):
|
|||
"(pre-existing errors filtered out):\n" + "\n".join(post_lines)
|
||||
)
|
||||
)
|
||||
|
||||
def _lsp_local_only(self) -> bool:
|
||||
"""Return True iff this FileOperations is wired to a local backend.
|
||||
|
||||
LSP servers run on the host process — they need access to the
|
||||
files they're linting. Remote/sandboxed backends (Docker,
|
||||
Modal, SSH, Daytona) keep files inside the sandbox where the
|
||||
host-side LSP server can't reach them, so we skip the LSP
|
||||
path for those entirely.
|
||||
"""
|
||||
env = getattr(self, "env", None)
|
||||
if env is None:
|
||||
# Defensive: some tests construct ShellFileOperations via
|
||||
# ``__new__`` without going through ``__init__``, so
|
||||
# ``self.env`` may be missing. No env = no LSP path.
|
||||
return False
|
||||
try:
|
||||
from tools.environments.local import LocalEnvironment
|
||||
except Exception: # noqa: BLE001
|
||||
return False
|
||||
return isinstance(env, LocalEnvironment)
|
||||
|
||||
def _snapshot_lsp_baseline(self, path: str) -> None:
|
||||
"""Capture pre-edit LSP diagnostics so the post-write delta is correct.
|
||||
|
||||
Best-effort. Silent on every failure path — LSP is an
|
||||
enrichment layer and must never break a write.
|
||||
|
||||
Skipped entirely on non-local backends (Docker, Modal, SSH,
|
||||
etc.) — the server can't see files inside the sandbox.
|
||||
"""
|
||||
if not self._lsp_local_only():
|
||||
return
|
||||
try:
|
||||
from agent.lsp import get_service
|
||||
svc = get_service()
|
||||
except Exception: # noqa: BLE001
|
||||
return
|
||||
if svc is None:
|
||||
return
|
||||
try:
|
||||
svc.snapshot_baseline(path)
|
||||
except Exception: # noqa: BLE001
|
||||
pass
|
||||
|
||||
def _maybe_lsp_diagnostics(self, path: str) -> str:
|
||||
"""Best-effort LSP semantic diagnostics for ``path``.
|
||||
|
||||
Returns a formatted ``<diagnostics>`` block, or empty string
|
||||
when LSP is unavailable / disabled / produced no errors.
|
||||
|
||||
Wraps everything in a try/except so a misbehaving LSP server
|
||||
can't break a write. This intentionally swallows all errors
|
||||
— the calling tier already returned a clean syntax result, so
|
||||
``""`` here just means "no extra info to add".
|
||||
|
||||
Skipped entirely on non-local backends (Docker, Modal, SSH,
|
||||
etc.) — same reasoning as ``_snapshot_lsp_baseline``.
|
||||
"""
|
||||
if not self._lsp_local_only():
|
||||
return ""
|
||||
try:
|
||||
from agent.lsp import get_service
|
||||
except Exception: # noqa: BLE001
|
||||
return ""
|
||||
try:
|
||||
svc = get_service()
|
||||
except Exception: # noqa: BLE001
|
||||
return ""
|
||||
if svc is None or not svc.enabled_for(path):
|
||||
return ""
|
||||
try:
|
||||
diagnostics = svc.get_diagnostics_sync(path, delta=True)
|
||||
except Exception: # noqa: BLE001
|
||||
return ""
|
||||
if not diagnostics:
|
||||
return ""
|
||||
try:
|
||||
from agent.lsp.reporter import report_for_file, truncate
|
||||
block = report_for_file(path, diagnostics)
|
||||
if not block:
|
||||
return ""
|
||||
return truncate("LSP diagnostics introduced by this edit:\n" + block)
|
||||
except Exception: # noqa: BLE001
|
||||
return ""
|
||||
|
||||
# =========================================================================
|
||||
# SEARCH Implementation
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue