feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168)

* feat(lsp): semantic diagnostics from real language servers in write_file/patch Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server, clangd, bash-language-server, ...) into the post-write lint check used by write_file and patch. The model now sees type errors, undefined names, missing imports, and project-wide semantic issues introduced by its edits, not just syntax errors. LSP is gated on git workspace detection: when the agent's cwd or the file being edited is inside a git worktree, LSP runs against that workspace; otherwise the existing in-process syntax checks are the only tier. This keeps users on user-home cwds (Telegram/Discord gateway chats) from spawning daemons. The post-write check is layered: in-process syntax check first (microseconds), then LSP semantic diagnostics second when syntax is clean. Diagnostics are delta-filtered against a baseline captured at write start, so the agent only sees errors its edit introduced. A flaky/missing language server can never break a write -- every LSP failure path falls back silently to the syntax-only result. New module agent/lsp/ split into: - protocol.py: Content-Length JSON-RPC framer + envelope helpers - client.py: async LSPClient (spawn, initialize, didOpen/didChange, ContentModified retry, push/pull diagnostic stores) - workspace.py: git worktree walk-up + per-server NearestRoot resolver - servers.py: registry of 26 language servers (extension match, root resolver, spawn builder per language) - install.py: auto-install dispatch (npm install --prefix, go install with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/ - manager.py: LSPService (per-(server_id, root) client registry, lazy spawn, broken-set, in-flight dedupe, sync facade for tools layer) - reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file) - cli.py: hermes lsp {status,list,install,install-all,restart,which} Wired into tools/file_operations.py: - write_file/patch_replace now call _snapshot_lsp_baseline before write - _check_lint_delta gains a third tier: LSP semantic diagnostics when syntax is clean - All LSP code paths swallow exceptions; write_file's contract unchanged Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true), wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server overrides (disabled, command, env, initialization_options). Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and read_message round-trip, EOF/truncation/missing Content-Length), workspace gate (git walk-up, exclude markers, fallback to file location), reporter (severity filter, max-per-file cap, truncation), service-level delta filter, and an in-process mock LSP server that exercises the full client lifecycle including didChange version bumps, dedup, crash recovery, and idempotent teardown. Live E2E verified end-to-end through ShellFileOperations: pyright auto-installed via npm into HERMES_HOME, baseline captured, type error introduced, single delta diagnostic surfaced with correct line/column/code/ source, then patch fix removes the diagnostic from the output. Docs: new website/docs/user-guide/features/lsp.md page covering supported languages, configuration knobs, performance characteristics, and troubleshooting; cli-commands.md updated with the 'hermes lsp' reference; sidebar updated. * feat(lsp): structured logging, backend gate, defensive walk caps Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR. agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``. agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write. tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open. agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``. A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead. agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. Defensive against pathological inputs (symlink loops, encoding errors, permission failures mid-walk) — the lint hook is hot-path code and must never raise. Tests: - tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract. - tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment. - tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap. Validation: - 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap) - 198/198 pass when run alongside existing file_operations tests - Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call. * feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field Two improvements salvaged from #24414's plugin-form alternative, keeping our core-integrated design: 1. atexit cleanup of spawned language servers ---------------------------------------------------------------- ``agent/lsp/__init__.get_service`` now registers an ``atexit`` handler on first creation that tears down the LSPService on Python exit. Without this, every ``hermes chat`` exit was leaking pyright/gopls/etc. processes for a few seconds while their stdout buffers drained -- they got reaped by the kernel eventually but a watchful ``ps aux`` would catch them. The handler runs once per process (gated by ``_atexit_registered``); idempotent ``shutdown_service`` ensures double-fire is a no-op. Errors during shutdown are swallowed at debug level since by the time atexit fires the user has already seen the agent's final response. 2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult ---------------------------------------------------------------- Previously the LSP layer folded its diagnostic block into the ``lint.output`` string, conflating the syntax-check tier with the semantic tier. The agent (and any downstream parsers) now read syntax errors and semantic errors as independent signals: { "bytes_written": 42, "lint": {"status": "ok", "output": ""}, "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..." } ``_check_lint_delta`` returns to its original two-tier shape (syntax check + delta filter); ``write_file`` and ``patch_replace`` independently fetch LSP diagnostics via ``_maybe_lsp_diagnostics`` and pass them into the new field. ``patch_replace`` propagates the inner write_file's ``lsp_diagnostics`` so the outer PatchResult carries the patch's delta correctly. Tests: 19 new - tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration fires once and only once across N get_service calls; the registered callable is our internal shutdown wrapper; shutdown_service is idempotent and safe when never started; exceptions during shutdown are swallowed; inactive service is cached so we don't rebuild on every check. - tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult / PatchResult dataclass shape, to_dict include/omit semantics, channel separation (lint and lsp_diagnostics carry independent signals), write_file populates the field via _maybe_lsp_diagnostics only when the syntax tier is clean, patch_replace propagates the field forward from its internal write_file. Validation: - 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field) - 217/217 pass with file_operations + LSP combined - Live E2E reverified: clean writes -> both fields empty/none; type error introduced -> lint clean (parses), lsp_diagnostics carries the pyright reportReturnType block; patch fix -> both fields clean again. * fix(lsp): broken-set short-circuit so a wedged server isn't paid every write Discovered while auditing failure paths: a language server binary that hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY subsequent write to re-pay the 8s snapshot_baseline timeout. Five writes = ~64s of dead time. The bug: ``_get_or_spawn`` adds the (server_id, root) pair to ``_broken`` inside its inner exception handler, but when the OUTER ``_loop.run`` timeout fires, it cancels the inner task before that handler runs. The pair never makes it to broken-set, so the next write re-enters the spawn path and re-pays the timeout. Fix: - New ``_mark_broken_for_file`` helper at the service layer marks the (server_id, workspace_root) pair broken from the OUTSIDE when the outer timeout fires. Called from the except branches in ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError + generic Exception). Also kills any orphan client process that survived the cancelled future, fire-and-forget with a 1s ceiling. - ``enabled_for`` now consults the broken-set BEFORE returning True. Files in already-broken (server_id, root) pairs short-circuit to False, so the file_operations layer skips the LSP path entirely with no spawn cost. Until the service is restarted (``hermes lsp restart``) or the process exits. - A single eventlog WARNING is emitted on first mark-broken so the user knows which server gave up. Subsequent edits in the same project stay silent. Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the key shape (server_id, per_server_root), enabled_for short-circuit, sibling-file skip in same project, project isolation (broken in A doesn't affect B), graceful no-op for missing-server / no-workspace, and an end-to-end test that snapshots after a failure and verifies the next ``enabled_for`` returns False. Validation: - Live retest of the wedged-binary scenario: 5 sequential writes, first 8.88s (the one snapshot timeout), subsequent four ~0.84s (no LSP cost). Down from 5x12.85s = 64s before this fix. - 99/99 LSP tests pass (92 prior + 7 broken-set) - 224/224 pass with file_operations + LSP combined - Happy path E2E reverified — clean write, type error introduced, patch fix all behave correctly with the new broken-set logic. Note: the FIRST write to a wedged binary still pays 8s (the snapshot_baseline timeout). We could shorten that, but pyright/ tsserver normally take 2-3s and slow CI rust-analyzer can need 5+ seconds, so 8s is the conservative ceiling. Subsequent writes are instant.
2026-05-18 04:41:56 +00:00 · 2026-05-12 16:31:54 -07:00 · 2026-05-12 16:31:54 -07:00 · 83b93898c2
commit 83b93898c2
parent d89553c2d6
28 changed files with 6144 additions and 17 deletions
--- a/tools/file_operations.py
+++ b/tools/file_operations.py
@ -120,6 +120,13 @@ class WriteResult:
    bytes_written: int = 0
    dirs_created: bool = False
    lint: Optional[Dict[str, Any]] = None
+    # Semantic diagnostics from the LSP layer, when applicable.  Kept in
+    # its own field (not folded into ``lint``) so the model and any
+    # downstream parsers can read syntax errors and semantic errors as
+    # separate signals.  ``None`` when LSP is disabled, when the file
+    # isn't in a git workspace, or when no diagnostics were introduced
+    # by this edit.
+    lsp_diagnostics: Optional[str] = None
    error: Optional[str] = None
    warning: Optional[str] = None

@ -136,6 +143,8 @@ class PatchResult:
    files_created: List[str] = field(default_factory=list)
    files_deleted: List[str] = field(default_factory=list)
    lint: Optional[Dict[str, Any]] = None
+    # See :class:`WriteResult.lsp_diagnostics`.
+    lsp_diagnostics: Optional[str] = None
    error: Optional[str] = None
    
    def to_dict(self) -> dict:
@ -150,6 +159,8 @@ class PatchResult:
            result["files_deleted"] = self.files_deleted
        if self.lint:
            result["lint"] = self.lint
+        if self.lsp_diagnostics:
+            result["lsp_diagnostics"] = self.lsp_diagnostics
        if self.error:
            result["error"] = self.error
        return result
@ -867,6 +878,13 @@ class ShellFileOperations(FileOperations):
            if read_result.exit_code == 0 and read_result.stdout:
                pre_content = read_result.stdout

+        # Snapshot LSP diagnostics for this file (best-effort) so the
+        # post-write LSP layer can return only diagnostics introduced
+        # by this specific edit.  Mirrors claude-code's
+        # ``beforeFileEdited`` pattern but wired to the local LSP
+        # rather than an external IDE.
+        self._snapshot_lsp_baseline(path)
+
        # Create parent directories
        parent = os.path.dirname(path)
        dirs_created = False
@ -897,10 +915,21 @@ class ShellFileOperations(FileOperations):
        # Post-write lint with delta refinement.
        lint_result = self._check_lint_delta(path, pre_content=pre_content, post_content=content)

+        # Semantic diagnostics from the LSP layer — separate channel.
+        # Only fired when the syntax tier reported clean (no point asking
+        # an LSP for a file that won't even parse).  Best-effort:
+        # ``""`` is returned for any failure path.
+        lsp_diagnostics: Optional[str] = None
+        if lint_result.success or lint_result.skipped:
+            block = self._maybe_lsp_diagnostics(path)
+            if block:
+                lsp_diagnostics = block
+
        return WriteResult(
            bytes_written=bytes_written,
            dirs_created=dirs_created,
            lint=lint_result.to_dict() if lint_result else None,
+            lsp_diagnostics=lsp_diagnostics,
        )
    
    # =========================================================================
@ -996,7 +1025,14 @@ class ShellFileOperations(FileOperations):
            success=True,
            diff=diff,
            files_modified=[path],
-            lint=lint_result.to_dict() if lint_result else None
+            lint=lint_result.to_dict() if lint_result else None,
+            # Propagate the LSP diagnostics already captured by the
+            # internal ``write_file`` call.  Its baseline was the
+            # pre-patch content (taken at the start of write_file via
+            # ``_snapshot_lsp_baseline``) so the delta is correct for
+            # the patch as a whole.  Keep the field separate from the
+            # syntax-check ``lint`` so the agent can read both signals.
+            lsp_diagnostics=write_result.lsp_diagnostics,
        )
    
    def patch_v4a(self, patch_content: str) -> PatchResult:
@ -1089,21 +1125,25 @@ class ShellFileOperations(FileOperations):
    def _check_lint_delta(self, path: str, pre_content: Optional[str],
                          post_content: Optional[str] = None) -> LintResult:
        """
-        Run post-write lint with pre-write baseline comparison.
+        Run post-write syntax lint with pre-write baseline comparison.

-        Strategy (post-first, pre-lazy):
-        1. Lint the post-write state.  If clean → return clean immediately.
-           This is the hot path and matches _check_lint() in cost.
-        2. If post-lint found errors AND we have pre-write content, lint
-           that too.  If the pre-write file was already broken, return only
-           the *new* errors introduced by this edit — errors that existed
-           before aren't the agent's problem to chase right now.
-        3. If pre_content is None (new file or unavailable), skip the delta
-           step and return all post-write errors.
+        Two-tier strategy:

-        This mirrors Cline's and OpenCode's post-edit LSP pattern: surface
-        only the errors this specific edit introduced, so the agent doesn't
-        get distracted by pre-existing problems.
+        1. **Syntax check** (in-process or shell-based, microseconds).
+           Catches the bug class that motivated this layer: corrupt
+           writes, mashed quotes, truncated output.  Hot path.
+
+        2. **Delta refinement against pre-write content** when the
+           syntax tier reports errors.  Filter out errors that already
+           existed pre-edit so the agent isn't distracted by inherited
+           state.
+
+        Semantic diagnostics from the LSP layer are fetched separately
+        via :meth:`_maybe_lsp_diagnostics` and surfaced in the
+        ``lsp_diagnostics`` field on :class:`WriteResult` /
+        :class:`PatchResult`.  Keeping the two channels separate lets
+        the agent (and any downstream parsers) read syntax errors and
+        semantic errors as independent signals.

        Args:
            path: File path (for linter selection).
@ -1122,12 +1162,12 @@ class ShellFileOperations(FileOperations):
        """
        post = self._check_lint(path, content=post_content)

-        # Hot path: clean post-write, no pre-lint needed.
+        # Hot path: clean post-write syntactically.
        if post.success or post.skipped:
            return post

-        # Post-write has errors.  If we have pre-content, run the delta
-        # refinement to filter out pre-existing errors.
+        # Post-write has syntax errors.  If we have pre-content, run the
+        # delta refinement to filter out pre-existing errors.
        if pre_content is None:
            return post

@ -1166,6 +1206,91 @@ class ShellFileOperations(FileOperations):
                "(pre-existing errors filtered out):\n" + "\n".join(post_lines)
            )
        )
+
+    def _lsp_local_only(self) -> bool:
+        """Return True iff this FileOperations is wired to a local backend.
+
+        LSP servers run on the host process — they need access to the
+        files they're linting.  Remote/sandboxed backends (Docker,
+        Modal, SSH, Daytona) keep files inside the sandbox where the
+        host-side LSP server can't reach them, so we skip the LSP
+        path for those entirely.
+        """
+        env = getattr(self, "env", None)
+        if env is None:
+            # Defensive: some tests construct ShellFileOperations via
+            # ``__new__`` without going through ``__init__``, so
+            # ``self.env`` may be missing.  No env = no LSP path.
+            return False
+        try:
+            from tools.environments.local import LocalEnvironment
+        except Exception:  # noqa: BLE001
+            return False
+        return isinstance(env, LocalEnvironment)
+
+    def _snapshot_lsp_baseline(self, path: str) -> None:
+        """Capture pre-edit LSP diagnostics so the post-write delta is correct.
+
+        Best-effort.  Silent on every failure path — LSP is an
+        enrichment layer and must never break a write.
+
+        Skipped entirely on non-local backends (Docker, Modal, SSH,
+        etc.) — the server can't see files inside the sandbox.
+        """
+        if not self._lsp_local_only():
+            return
+        try:
+            from agent.lsp import get_service
+            svc = get_service()
+        except Exception:  # noqa: BLE001
+            return
+        if svc is None:
+            return
+        try:
+            svc.snapshot_baseline(path)
+        except Exception:  # noqa: BLE001
+            pass
+
+    def _maybe_lsp_diagnostics(self, path: str) -> str:
+        """Best-effort LSP semantic diagnostics for ``path``.
+
+        Returns a formatted ``<diagnostics>`` block, or empty string
+        when LSP is unavailable / disabled / produced no errors.
+
+        Wraps everything in a try/except so a misbehaving LSP server
+        can't break a write.  This intentionally swallows all errors
+        — the calling tier already returned a clean syntax result, so
+        ``""`` here just means "no extra info to add".
+
+        Skipped entirely on non-local backends (Docker, Modal, SSH,
+        etc.) — same reasoning as ``_snapshot_lsp_baseline``.
+        """
+        if not self._lsp_local_only():
+            return ""
+        try:
+            from agent.lsp import get_service
+        except Exception:  # noqa: BLE001
+            return ""
+        try:
+            svc = get_service()
+        except Exception:  # noqa: BLE001
+            return ""
+        if svc is None or not svc.enabled_for(path):
+            return ""
+        try:
+            diagnostics = svc.get_diagnostics_sync(path, delta=True)
+        except Exception:  # noqa: BLE001
+            return ""
+        if not diagnostics:
+            return ""
+        try:
+            from agent.lsp.reporter import report_for_file, truncate
+            block = report_for_file(path, diagnostics)
+            if not block:
+                return ""
+            return truncate("LSP diagnostics introduced by this edit:\n" + block)
+        except Exception:  # noqa: BLE001
+            return ""
    
    # =========================================================================
    # SEARCH Implementation