feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168)

* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server, clangd, bash-language-server, ...) into the post-write lint check used by write_file and patch. The model now sees type errors, undefined names, missing imports, and project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being edited is inside a git worktree, LSP runs against that workspace; otherwise the existing in-process syntax checks are the only tier. This keeps users on user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds), then LSP semantic diagnostics second when syntax is clean. Diagnostics are delta-filtered against a baseline captured at write start, so the agent only sees errors its edit introduced. A flaky/missing language server can never break a write -- every LSP failure path falls back silently to the syntax-only result.
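
The layered check and baseline delta filter described above can be sketched in plain Python. This is a minimal, hypothetical illustration and not the agent/lsp API: `Diagnostic`, `delta`, `python_syntax_check` and `post_write_check` are names invented here. Keying the delta on (severity, code, message) and ignoring position is also an assumption of this sketch, chosen so that an edit which only shifts lines does not resurface pre-existing errors.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Diagnostic:
    # Hypothetical, simplified shape of an LSP diagnostic.
    line: int
    col: int
    severity: int  # 1 = Error in LSP's DiagnosticSeverity
    code: str
    message: str


def _key(d: Diagnostic) -> tuple:
    # Assumption: ignore position so line shifts don't look like new errors.
    return (d.severity, d.code, d.message)


def delta(baseline: List[Diagnostic], current: List[Diagnostic]) -> List[Diagnostic]:
    """Return the diagnostics in ``current`` that the edit introduced."""
    # Multiset subtraction: a second copy of a pre-existing error is still new.
    remaining = Counter(_key(d) for d in baseline)
    introduced = []
    for d in current:
        k = _key(d)
        if remaining[k] > 0:
            remaining[k] -= 1
        else:
            introduced.append(d)
    return introduced


def python_syntax_check(text: str) -> Optional[str]:
    """Tier 1: cheap in-process syntax check (Python-only in this sketch)."""
    try:
        compile(text, "<edit>", "exec")
    except SyntaxError as e:
        return f"line {e.lineno}: {e.msg}"
    return None


def post_write_check(
    text: str,
    baseline: List[Diagnostic],
    fetch_lsp: Callable[[], List[Diagnostic]],
    syntax_check: Callable[[str], Optional[str]] = python_syntax_check,
) -> Dict[str, object]:
    syntax_error = syntax_check(text)
    if syntax_error is not None:
        # Syntax is broken: report only that and skip the semantic tier.
        return {"lint": syntax_error, "lsp": []}
    try:
        current = fetch_lsp()  # Tier 2: semantic diagnostics
    except Exception:
        # A flaky or missing server must never break the write.
        return {"lint": None, "lsp": []}
    return {"lint": None, "lsp": delta(baseline, current)}
```

The syntax checker is a parameter because tier 1 is language-specific, while the LSP tier is the same for every language.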

New module agent/lsp/ split into:
- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange, ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match, root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:
- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true), wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and read_message round-trip, EOF/truncation/missing Content-Length), workspace gate (git walk-up, exclude markers, fallback to file location), reporter (severity filter, max-per-file cap, truncation), service-level delta filter, and an in-process mock LSP server that exercises the full client lifecycle including didChange version bumps, dedup, crash recovery, and idempotent teardown.
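
The Content-Length framing handled by protocol.py follows the LSP base protocol. Each message is an ASCII header block terminated by `\r\n\r\n`, followed by a JSON body whose size in bytes is given by `Content-Length`. The sketch below is hypothetical (`encode`, `read_message` and `FramingError` are illustrative names, not the module's API). It exercises the failure cases the tests list: clean EOF, truncation, and a missing Content-Length. It also includes the 8 KiB header cap that a later commit in this PR adds.

```python
import io
import json
from typing import Any, BinaryIO, Optional

MAX_HEADER_BYTES = 8 * 1024  # runaway-header guard, like the later 8 KiB cap


class FramingError(Exception):
    """Raised for malformed or truncated frames."""


def encode(payload: Any) -> bytes:
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    # Content-Length counts bytes of the encoded body, not characters.
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def read_message(stream: BinaryIO) -> Optional[Any]:
    """Read one frame; return None on a clean EOF between messages."""
    header = b""
    while not header.endswith(b"\r\n\r\n"):
        ch = stream.read(1)
        if not ch:
            if header:
                raise FramingError("EOF inside header block")
            return None
        header += ch
        if len(header) > MAX_HEADER_BYTES:
            raise FramingError("header block exceeds cap without CRLF-CRLF")
    length = None
    for line in header[:-4].split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise FramingError("missing Content-Length")
    body = stream.read(length)
    if len(body) < length:
        raise FramingError("truncated body")
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    buf = io.BytesIO(encode({"jsonrpc": "2.0", "id": 1, "result": "héllo"}))
    print(read_message(buf))  # the decoded dict
    print(read_message(buf))  # None: clean EOF
```

Without the cap, a server that never sends CRLF-CRLF would keep the header loop reading forever.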

Verified live, end-to-end through ShellFileOperations: pyright auto-installed via npm into HERMES_HOME, baseline captured, type error introduced, single delta diagnostic surfaced with correct line/column/code/source, then a patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported languages, configuration knobs, performance characteristics, and troubleshooting; cli-commands.md updated with the 'hermes lsp' reference; sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``.
A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. This guards against pathological inputs (symlink loops, encoding errors, permission failures mid-walk); the lint hook is hot-path code and must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from #24414's plugin-form alternative, keeping our core-integrated design:

1. atexit cleanup of spawned language servers
----------------------------------------------------------------
``agent/lsp/__init__.get_service`` now registers an ``atexit`` handler on first creation that tears down the LSPService on Python exit. Without this, every ``hermes chat`` exit was leaking pyright/gopls/etc.
processes for a few seconds while their stdout buffers drained -- they got reaped by the kernel eventually but a watchful ``ps aux`` would catch them. The handler runs once per process (gated by ``_atexit_registered``); idempotent ``shutdown_service`` ensures double-fire is a no-op. Errors during shutdown are swallowed at debug level since by the time atexit fires the user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
----------------------------------------------------------------
Previously the LSP layer folded its diagnostic block into the ``lint.output`` string, conflating the syntax-check tier with the semantic tier. The agent (and any downstream parsers) now read syntax errors and semantic errors as independent signals:

    {
      "bytes_written": 42,
      "lint": {"status": "ok", "output": ""},
      "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
    }

``_check_lint_delta`` returns to its original two-tier shape (syntax check + delta filter); ``write_file`` and ``patch_replace`` independently fetch LSP diagnostics via ``_maybe_lsp_diagnostics`` and pass them into the new field. ``patch_replace`` propagates the inner write_file's ``lsp_diagnostics`` so the outer PatchResult carries the patch's delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration fires once and only once across N get_service calls; the registered callable is our internal shutdown wrapper; shutdown_service is idempotent and safe when never started; exceptions during shutdown are swallowed; inactive service is cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult / PatchResult dataclass shape, to_dict include/omit semantics, channel separation (lint and lsp_diagnostics carry independent signals), write_file populates the field via _maybe_lsp_diagnostics only when the syntax tier is clean, patch_replace propagates the field forward from its internal write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type error introduced -> lint clean (parses), lsp_diagnostics carries the pyright reportReturnType block; patch fix -> both fields clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY subsequent write to re-pay the 8s snapshot_baseline timeout. Five writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to ``_broken`` inside its inner exception handler, but when the OUTER ``_loop.run`` timeout fires, it cancels the inner task before that handler runs. The pair never makes it to the broken-set, so the next write re-enters the spawn path and re-pays the timeout.

Fix:
- New ``_mark_broken_for_file`` helper at the service layer marks the (server_id, workspace_root) pair broken from the OUTSIDE when the outer timeout fires. Called from the except branches in ``snapshot_baseline`` and ``get_diagnostics_sync`` (asyncio.TimeoutError + generic Exception). Also kills any orphan client process that survived the cancelled future, fire-and-forget with a 1s ceiling.
- ``enabled_for`` now consults the broken-set BEFORE returning True. Files in already-broken (server_id, root) pairs short-circuit to False, so the file_operations layer skips the LSP path entirely with no spawn cost.
  This holds until the service is restarted (``hermes lsp restart``) or the process exits.
- A single eventlog WARNING is emitted on first mark-broken so the user knows which server gave up. Subsequent edits in the same project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py, covering the key shape (server_id, per_server_root), enabled_for short-circuit, sibling-file skip in same project, project isolation (broken in A doesn't affect B), graceful no-op for missing-server / no-workspace, and an end-to-end test that snapshots after a failure and verifies the next ``enabled_for`` returns False.

Validation:
- Live retest of the wedged-binary scenario: 5 sequential writes, the first 8.88s (the one snapshot timeout), the subsequent four ~0.84s each (no LSP cost). Down from 5 x 12.85s = ~64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified: clean write, type error introduced, and patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the snapshot_baseline timeout). We could shorten that, but pyright/tsserver normally take 2-3s and slow CI rust-analyzer can need 5+ seconds, so 8s is the conservative ceiling. Subsequent writes are instant.
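
The cancellation bug above is easy to reproduce with `asyncio.wait_for`. When the outer timeout fires, the inner task is cancelled with `CancelledError`. Since Python 3.8 that exception derives from `BaseException`, so it slips past an inner `except Exception`. The minimal sketch below assumes a hypothetical `BrokenSetService` keyed on `(server_id, workspace_root)`; it is not the real LSPService. It shows why the pair has to be marked broken from the outside.

```python
import asyncio
from typing import Awaitable, Callable, List, Set, Tuple

Key = Tuple[str, str]  # (server_id, workspace_root)
Starter = Callable[[], Awaitable[List[str]]]


class BrokenSetService:
    """Hypothetical stand-in for the service layer's broken-set logic."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.broken: Set[Key] = set()

    def enabled_for(self, key: Key) -> bool:
        # Consulted BEFORE any spawn, so a known-wedged pair costs nothing.
        return key not in self.broken

    async def _spawn(self, key: Key, start: Starter) -> List[str]:
        try:
            return await start()
        except Exception:
            # Inner handler: skipped on outer timeout, because the task is
            # cancelled with CancelledError, which is not an Exception.
            self.broken.add(key)
            raise

    async def snapshot(self, key: Key, start: Starter) -> List[str]:
        if not self.enabled_for(key):
            return []
        try:
            return await asyncio.wait_for(self._spawn(key, start), self.timeout)
        except Exception:  # includes asyncio.TimeoutError
            # Outer handler: mark the pair broken from the OUTSIDE.
            self.broken.add(key)
            return []

    def restart(self) -> None:
        # Analogue of ``hermes lsp restart``: forget all broken pairs.
        self.broken.clear()


async def wedged_server() -> List[str]:
    await asyncio.sleep(3600)  # never answers
    return []


async def healthy_server() -> List[str]:
    return ["ERROR [2:12] example"]
```

After the first timeout, further snapshots for the same pair return immediately. Other projects are unaffected because the key includes the workspace root.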
2026-05-18 04:41:56 +00:00 · 2026-05-12 16:31:54 -07:00 · 2026-05-12 16:31:54 -07:00 · 83b93898c2
commit 83b93898c2
parent d89553c2d6
28 changed files with 6144 additions and 17 deletions
--- a/agent/lsp/__init__.py
+++ b/agent/lsp/__init__.py
@@ -0,0 +1,106 @@
+"""Language Server Protocol (LSP) integration for Hermes Agent.
+
+Hermes runs full language servers (pyright, gopls, rust-analyzer,
+typescript-language-server, etc.) as subprocesses, delta-filters their
+``textDocument/publishDiagnostics`` output against a pre-write
+baseline, and surfaces the result in the ``lsp_diagnostics`` field
+returned by ``write_file`` and ``patch``.
+
+LSP is **gated on git workspace detection**: if the agent's cwd or the
+file being edited is inside a git worktree, LSP runs against that
+workspace; otherwise the file_operations layer falls back to its
+existing in-process syntax checks.  This keeps users on user-home cwds
+(e.g. Telegram gateway chats) from spawning daemons they don't need.
+
+Public API:
+
+    from agent.lsp import get_service
+
+    svc = get_service()
+    if svc and svc.enabled_for(path):
+        await svc.touch_file(path)
+        diags = svc.diagnostics_for(path)
+
+The bulk of the wiring is internal: most callers only need the hooks
+in :mod:`tools.file_operations` (``_snapshot_lsp_baseline`` before a
+write, ``_maybe_lsp_diagnostics`` after it), which are already wired.
+
+Architecture is documented in ``website/docs/user-guide/features/lsp.md``.
+"""
+from __future__ import annotations
+
+import atexit
+import logging
+import threading
+from typing import Optional
+
+from agent.lsp.manager import LSPService
+
+logger = logging.getLogger("agent.lsp")
+
+_service: Optional[LSPService] = None
+_atexit_registered = False
+_service_lock = threading.Lock()
+
+
+def get_service() -> Optional[LSPService]:
+    """Return the process-wide LSP service singleton, or None when disabled.
+
+    The service is created lazily on first call.  ``None`` is returned
+    when LSP is disabled in config, when no workspace can be detected,
+    or when the platform doesn't support subprocess-based LSP servers.
+
+    On first creation, registers an :mod:`atexit` handler that tears
+    down spawned language servers on Python exit so a long-running
+    CLI or gateway session doesn't leak pyright/gopls/etc. processes
+    when it terminates.
+    """
+    global _service, _atexit_registered
+    if _service is not None:
+        return _service if _service.is_active() else None
+    with _service_lock:
+        if _service is not None:
+            return _service if _service.is_active() else None
+        _service = LSPService.create_from_config()
+        if not _atexit_registered:
+            # ``atexit`` handlers run in LIFO order on normal Python
+            # exit and on SystemExit, but NOT on os._exit() or
+            # uncaught signals.  Language servers are stateless
+            # subprocesses — losing them on SIGKILL is fine; they'll
+            # be reaped by the kernel along with their parent.  We
+            # care about clean exits where Python flushes stdio
+            # before terminating; without this hook every
+            # ``hermes chat`` exit would leak pyright processes that
+            # outlive the parent for a few seconds while their
+            # stdout buffers drain.
+            atexit.register(_atexit_shutdown)
+            _atexit_registered = True
+    return _service if (_service is not None and _service.is_active()) else None
+
+
+def shutdown_service() -> None:
+    """Tear down the LSP service if one was started.
+
+    Safe to call multiple times; safe to call when no service was created.
+    """
+    global _service
+    with _service_lock:
+        svc = _service
+        _service = None
+    if svc is not None:
+        try:
+            svc.shutdown()
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP shutdown error: %s", e)
+
+
+def _atexit_shutdown() -> None:
+    """atexit-registered wrapper.  Logs at debug because by the time
+    atexit fires the user has already seen the agent's final output —
+    a noisy shutdown line on top of that is just clutter."""
+    try:
+        shutdown_service()
+    except Exception as e:  # noqa: BLE001
+        logger.debug("atexit LSP shutdown failed: %s", e)
+
+
+__all__ = ["get_service", "shutdown_service", "LSPService"]
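
The module docstring above gates LSP on git workspace detection. The commit message describes a 64-step cap on the upward walk, plus containment around `Path(...).resolve()` and `.exists()`. A minimal sketch of such a walk follows. `walk_up_to_git` is a hypothetical name; this is not `find_git_worktree` itself.

```python
from pathlib import Path
from typing import Optional

MAX_WALK_STEPS = 64  # defensive cap on the upward walk


def walk_up_to_git(start: str) -> Optional[Path]:
    """Return the nearest ancestor of ``start`` that contains ``.git``.

    Never raises: resolution or stat failures (symlink loops, permission
    errors, embedded NUL bytes) simply mean "no workspace".
    """
    try:
        cur = Path(start).resolve()  # non-strict: the file may not exist yet
        if not cur.is_dir():
            cur = cur.parent
    except (OSError, RuntimeError, ValueError):
        return None
    for _ in range(MAX_WALK_STEPS):
        try:
            # ``.git`` is a directory in a normal clone but a file in a
            # linked worktree, so test existence rather than is_dir().
            if (cur / ".git").exists():
                return cur
        except (OSError, ValueError):
            return None
        if cur.parent == cur:  # reached the filesystem root
            return None
        cur = cur.parent
    return None
```

Returning `None` instead of raising lets the caller treat "no workspace" and "walk failed" the same way: fall back to the syntax-only tier.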
--- a/agent/lsp/cli.py
+++ b/agent/lsp/cli.py
@@ -0,0 +1,270 @@
+"""``hermes lsp`` CLI subcommand.
+
+Subcommands:
+
+- ``status`` — show service state, configured servers, install status.
+- ``install <server_id>`` — eagerly install one server's binary.
+- ``install-all`` — try to install every server with a known recipe.
+- ``restart`` — tear down running clients so the next edit re-spawns.
+- ``which <server_id>`` — print the resolved binary path for one server.
+- ``list`` — print the registry of supported servers.
+
+The handlers are kept here (rather than in
+``hermes_cli/main.py``) so the LSP module ships self-contained.
+"""
+from __future__ import annotations
+
+import argparse
+import sys
+from typing import Optional
+
+
+def register_subparser(subparsers: argparse._SubParsersAction) -> None:
+    """Wire the ``hermes lsp`` subcommand tree into the main argparse."""
+    parser = subparsers.add_parser(
+        "lsp",
+        help="Language Server Protocol management",
+        description=(
+            "Manage the LSP layer that powers post-write semantic "
+            "diagnostics in write_file/patch."
+        ),
+    )
+    sub = parser.add_subparsers(dest="lsp_command")
+
+    sub_status = sub.add_parser("status", help="Show LSP service status")
+    sub_status.add_argument(
+        "--json", action="store_true", help="Emit machine-readable JSON"
+    )
+
+    sub_list = sub.add_parser("list", help="List supported language servers")
+    sub_list.add_argument(
+        "--installed-only",
+        action="store_true",
+        help="Only show servers whose binary is currently available",
+    )
+
+    sub_install = sub.add_parser("install", help="Install a server binary")
+    sub_install.add_argument("server", help="Server id (e.g. pyright, gopls)")
+
+    sub_install_all = sub.add_parser(
+        "install-all",
+        help="Install every server with a known auto-install recipe",
+    )
+    sub_install_all.add_argument(
+        "--include-manual",
+        action="store_true",
+        help="Even attempt servers marked manual-install (best effort)",
+    )
+
+    sub_restart = sub.add_parser(
+        "restart",
+        help="Tear down running LSP clients (next edit re-spawns)",
+    )
+
+    sub_which = sub.add_parser("which", help="Print binary path for a server")
+    sub_which.add_argument("server", help="Server id")
+
+    parser.set_defaults(func=run_lsp_command)
+
+
+def run_lsp_command(args: argparse.Namespace) -> int:
+    """Top-level dispatcher for ``hermes lsp <subcommand>``."""
+    sub = getattr(args, "lsp_command", None) or "status"
+    try:
+        if sub == "status":
+            return _cmd_status(getattr(args, "json", False))
+        if sub == "list":
+            return _cmd_list(getattr(args, "installed_only", False))
+        if sub == "install":
+            return _cmd_install(args.server)
+        if sub == "install-all":
+            return _cmd_install_all(getattr(args, "include_manual", False))
+        if sub == "restart":
+            return _cmd_restart()
+        if sub == "which":
+            return _cmd_which(args.server)
+        sys.stderr.write(f"unknown lsp subcommand: {sub}\n")
+        return 2
+    except KeyboardInterrupt:
+        return 130
+
+
+def _cmd_status(emit_json: bool) -> int:
+    from agent.lsp import get_service
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import detect_status
+
+    svc = get_service()
+    service_active = svc is not None
+    info = svc.get_status() if svc is not None else {"enabled": False}
+
+    if emit_json:
+        import json
+        payload = {
+            "service": info,
+            "registry": [
+                {
+                    "server_id": s.server_id,
+                    "extensions": list(s.extensions),
+                    "description": s.description,
+                    "binary_status": detect_status(_recipe_pkg_for(s.server_id)),
+                }
+                for s in SERVERS
+            ],
+        }
+        sys.stdout.write(json.dumps(payload, indent=2) + "\n")
+        return 0
+
+    out = []
+    out.append("LSP Service")
+    out.append("===========")
+    out.append(f"  enabled:         {info.get('enabled', False)}")
+    if service_active:
+        out.append(f"  wait_mode:       {info.get('wait_mode')}")
+        out.append(f"  wait_timeout:    {info.get('wait_timeout')}s")
+        out.append(f"  install_strategy:{info.get('install_strategy')}")
+        clients = info.get("clients") or []
+        if clients:
+            out.append(f"  active clients:  {len(clients)}")
+            for c in clients:
+                out.append(
+                    f"    - {c['server_id']:20s} state={c['state']:10s} root={c['workspace_root']}"
+                )
+        else:
+            out.append("  active clients:  none")
+        broken = info.get("broken") or []
+        if broken:
+            out.append(f"  broken pairs:    {len(broken)}")
+            for b in broken:
+                out.append(f"    - {b}")
+        disabled = info.get("disabled_servers") or []
+        if disabled:
+            out.append(f"  disabled in cfg: {', '.join(disabled)}")
+    out.append("")
+    out.append("Registered Servers")
+    out.append("==================")
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        status = detect_status(pkg)
+        marker = {
+            "installed": "✓",
+            "missing": "·",
+            "manual-only": "?",
+        }.get(status, " ")
+        ext_summary = ", ".join(list(s.extensions)[:5])
+        if len(s.extensions) > 5:
+            ext_summary += f", … (+{len(s.extensions) - 5})"
+        out.append(
+            f"  {marker} {s.server_id:24s} [{status:11s}] {ext_summary}"
+        )
+        if s.description:
+            out.append(f"      {s.description}")
+    sys.stdout.write("\n".join(out) + "\n")
+    return 0
+
+
+def _cmd_list(installed_only: bool) -> int:
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import detect_status
+
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        status = detect_status(pkg)
+        if installed_only and status != "installed":
+            continue
+        sys.stdout.write(
+            f"{s.server_id:24s} [{status:11s}] {','.join(s.extensions)}\n"
+        )
+    return 0
+
+
+def _cmd_install(server_id: str) -> int:
+    from agent.lsp.install import try_install, INSTALL_RECIPES, detect_status
+    pkg = _recipe_pkg_for(server_id)
+    pre_status = detect_status(pkg)
+    if pre_status == "installed":
+        sys.stdout.write(f"{server_id} already installed\n")
+        return 0
+    sys.stdout.write(f"installing {server_id} (pkg={pkg}) ...\n")
+    sys.stdout.flush()
+    bin_path = try_install(pkg, "auto")
+    if bin_path is None:
+        recipe = INSTALL_RECIPES.get(pkg)
+        if recipe and recipe.get("strategy") == "manual":
+            sys.stderr.write(
+                f"{server_id}: this server requires a manual install. "
+                f"See documentation.\n"
+            )
+        else:
+            sys.stderr.write(f"{server_id}: install failed (see logs).\n")
+        return 1
+    sys.stdout.write(f"installed: {bin_path}\n")
+    return 0
+
+
+def _cmd_install_all(include_manual: bool) -> int:
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import try_install, INSTALL_RECIPES, detect_status
+
+    rc = 0
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        recipe = INSTALL_RECIPES.get(pkg)
+        if recipe is None:
+            continue
+        if recipe.get("strategy") == "manual" and not include_manual:
+            continue
+        if detect_status(pkg) == "installed":
+            sys.stdout.write(f"  {s.server_id:24s} already installed\n")
+            continue
+        sys.stdout.write(f"  installing {s.server_id} (pkg={pkg}) ... ")
+        sys.stdout.flush()
+        path = try_install(pkg, "auto")
+        if path:
+            sys.stdout.write(f"ok ({path})\n")
+        else:
+            sys.stdout.write("FAILED\n")
+            rc = 1
+    return rc
+
+
+def _cmd_restart() -> int:
+    from agent.lsp import shutdown_service
+
+    shutdown_service()
+    sys.stdout.write("LSP service shut down. Next edit will respawn clients.\n")
+    return 0
+
+
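+def _cmd_which(server_id: str) -> int:
+    from agent.lsp.install import INSTALL_RECIPES, hermes_lsp_bin_dir
+    import shutil as _shutil
+
+    # Resolve through the same alias table as the other subcommands, so
+    # e.g. ``vue-language-server`` finds its ``@vue/language-server`` recipe.
+    recipe = INSTALL_RECIPES.get(_recipe_pkg_for(server_id))
+    bin_name = (recipe or {}).get("bin", server_id)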
+    staged = hermes_lsp_bin_dir() / bin_name
+    if staged.exists():
+        sys.stdout.write(str(staged) + "\n")
+        return 0
+    on_path = _shutil.which(bin_name)
+    if on_path:
+        sys.stdout.write(on_path + "\n")
+        return 0
+    sys.stderr.write(f"{server_id}: not installed\n")
+    return 1
+
+
+def _recipe_pkg_for(server_id: str) -> str:
+    """Map a registry ``server_id`` to its install-recipe package key."""
+    # The mapping lives here (not in install.py) because it's a CLI
+    # convenience layer.  Most server_ids are also their own recipe
+    # key, but a few differ (e.g. ``vue-language-server`` →
+    # ``@vue/language-server``).
+    aliases = {
+        "vue-language-server": "@vue/language-server",
+        "astro-language-server": "@astrojs/language-server",
+        "dockerfile-ls": "dockerfile-language-server-nodejs",
+        "typescript": "typescript-language-server",
+    }
+    return aliases.get(server_id, server_id)
--- a/agent/lsp/client.py
+++ b/agent/lsp/client.py
@@ -0,0 +1,930 @@
+"""Async LSP client over stdin/stdout.
+
+One :class:`LSPClient` corresponds to one ``(language_server, workspace_root)``
+pair — exactly what OpenCode keys clients on, and the same shape Claude
+Code uses.  The client owns a child process, drives the JSON-RPC
+exchange, and exposes:
+
+- :meth:`open_file` / :meth:`change_file` — text document sync
+- :meth:`wait_for_diagnostics` — block until the server emits fresh
+  diagnostics for a specific file (or a timeout fires)
+- :meth:`diagnostics_for` — read the current per-file diagnostic store
+- :meth:`shutdown` — graceful close + SIGTERM/SIGKILL fallback
+
+The class is designed for async use from a single asyncio event loop.
+The :class:`agent.lsp.manager.LSPService` runs an event loop in a
+background thread so the synchronous file_operations layer can call
+into it via :func:`agent.lsp.manager.LSPService.touch_file`.
+
+Implementation notes:
+
+- Push diagnostics are stored per-URI in :attr:`_push_diagnostics` from
+  ``textDocument/publishDiagnostics`` notifications.  Pull diagnostics
+  go in :attr:`_pull_diagnostics`.  The merged view dedupes by content.
+
+- Whole-document sync.  Even when the server advertises incremental
+  sync, we send a single ``contentChanges`` entry replacing the
+  entire document.  Pretending to be incremental while sending a
+  full replacement is well-tolerated by every major server and saves
+  range bookkeeping.  See OpenCode's ``client.ts:584-659`` for the
+  same trick.
+
+- The "touch-file dance": every ``open_file`` call also fires a
+  ``workspace/didChangeWatchedFiles`` notification (CREATED on the
+  first open, CHANGED thereafter).  Some servers (clangd, eslint)
+  only re-scan when this notification fires, even though the LSP spec
+  doesn't strictly require it.
+
+- ``ContentModified`` (-32801) errors get retried with exponential
+  backoff up to 3 times.  This matches Claude Code's
+  ``LSPServerInstance.sendRequest``.
+"""
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+from pathlib import Path
+from typing import Any, Awaitable, Callable, Dict, List, Optional, Set
+from urllib.parse import quote, unquote
+
+from agent.lsp.protocol import (
+    ERROR_CONTENT_MODIFIED,
+    ERROR_METHOD_NOT_FOUND,
+    LSPProtocolError,
+    LSPRequestError,
+    classify_message,
+    encode_message,
+    make_error_response,
+    make_notification,
+    make_request,
+    make_response,
+    read_message,
+)
+
+logger = logging.getLogger("agent.lsp.client")
+
+# Timeouts (seconds) — mirror OpenCode's constants, scaled to seconds.
+INITIALIZE_TIMEOUT = 45.0
+DIAGNOSTICS_DOCUMENT_WAIT = 5.0
+DIAGNOSTICS_FULL_WAIT = 10.0
+DIAGNOSTICS_REQUEST_TIMEOUT = 3.0
+PUSH_DEBOUNCE = 0.15
+SHUTDOWN_GRACE = 1.0  # seconds between SIGTERM and SIGKILL
+
+# Retry policy for transient ContentModified errors.
+MAX_CONTENT_MODIFIED_RETRIES = 3
+RETRY_BASE_DELAY = 0.5  # 0.5, 1.0, 2.0 — exponential
+
+
+def file_uri(path: str) -> str:
+    """Return ``file://`` URI for an absolute filesystem path.
+
+    Mirrors Node's ``pathToFileURL`` — handles spaces, unicode, and
+    Windows drive letters (``C:\\foo`` → ``file:///C:/foo``).
+    """
+    abs_path = os.path.abspath(path)
+    if os.name == "nt":
+        # Windows: backslash → forward slash, prepend extra slash so
+        # the drive letter shows up as part of the path component.
+        abs_path = abs_path.replace("\\", "/")
+        if not abs_path.startswith("/"):
+            abs_path = "/" + abs_path
+    return "file://" + quote(abs_path, safe="/:")
+
+
+def uri_to_path(uri: str) -> str:
+    """Inverse of :func:`file_uri`."""
+    if not uri.startswith("file://"):
+        return uri
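+    # Decode first: some servers percent-encode the drive-letter colon
+    # (``file:///c%3A/foo``), which would otherwise hide it from the check below.
+    raw = unquote(uri[len("file://"):])
+    if os.name == "nt" and raw.startswith("/") and len(raw) > 2 and raw[2] == ":":
+        raw = raw[1:]  # strip leading slash before drive letter
+    return os.path.normpath(raw)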
+
+
+def _end_position(text: str) -> Dict[str, int]:
+    """Return the LSP Position at the end of ``text``.
+
+    Used to construct a single-range "replace whole document" change
+    for ``textDocument/didChange`` regardless of the server's declared
+    sync mode.
+    """
+    if not text:
+        return {"line": 0, "character": 0}
+    # LSP recognizes only \n, \r\n and \r as line terminators, whereas
+    # str.splitlines() also breaks on \v, \f, \x1c-\x1e, \x85, \u2028
+    # and \u2029, which would make our line count disagree with the
+    # server's.  Normalize to \n and split on that alone.
+    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
+    # A trailing terminator leaves an empty last element, which is
+    # exactly the (empty) line the end position sits on.  ``character``
+    # counts code points here; LSP's default position encoding is UTF-16.
+    return {"line": len(lines) - 1, "character": len(lines[-1])}
+
+
+class LSPClient:
+    """Async LSP client tied to one server process and one workspace root.
+
+    Lifecycle:
+
+        c = LSPClient(server_id=server_id, workspace_root=workspace_root,
+                      command=command, initialization_options=init_options)
+        await c.start()       # spawn + initialize
+        ver = await c.open_file("/path/to/foo.py")
+        await c.wait_for_diagnostics("/path/to/foo.py", ver)
+        diags = c.diagnostics_for("/path/to/foo.py")
+        await c.shutdown()
+    """
+
+    # ------------------------------------------------------------------
+    # construction + lifecycle
+    # ------------------------------------------------------------------
+
+    def __init__(
+        self,
+        *,
+        server_id: str,
+        workspace_root: str,
+        command: List[str],
+        env: Optional[Dict[str, str]] = None,
+        cwd: Optional[str] = None,
+        initialization_options: Optional[Dict[str, Any]] = None,
+        seed_diagnostics_on_first_push: bool = False,
+    ) -> None:
+        self.server_id = server_id
+        self.workspace_root = workspace_root
+        self._command = list(command)
+        self._env = env
+        self._cwd = cwd or workspace_root
+        self._init_options = initialization_options or {}
+        self._seed_first_push = seed_diagnostics_on_first_push
+
+        # Process + streams
+        self._proc: Optional[asyncio.subprocess.Process] = None
+        self._stderr_task: Optional[asyncio.Task] = None
+        self._reader_task: Optional[asyncio.Task] = None
+
+        # Request/response correlation
+        self._next_id: int = 0
+        self._pending: Dict[int, asyncio.Future] = {}
+
+        # Server-side request handlers (server → client requests).
+        # Kept small and explicit; everything else returns method-not-found.
+        self._request_handlers: Dict[str, Callable[[Any], Awaitable[Any]]] = {
+            "window/workDoneProgress/create": self._handle_work_done_create,
+            "workspace/configuration": self._handle_workspace_configuration,
+            "client/registerCapability": self._handle_register_capability,
+            "client/unregisterCapability": self._handle_unregister_capability,
+            "workspace/workspaceFolders": self._handle_workspace_folders,
+            "workspace/diagnostic/refresh": self._handle_diagnostic_refresh,
+        }
+        # Notifications (server → client) we care about.
+        self._notification_handlers: Dict[str, Callable[[Any], None]] = {
+            "textDocument/publishDiagnostics": self._handle_publish_diagnostics,
+            # Everything else (window/showMessage, $/progress, etc.)
+            # is silently dropped by default.
+        }
+
+        # Tracked file state — required for didChange version bumps.
+        self._files: Dict[str, Dict[str, Any]] = {}
+        # Diagnostic stores, keyed by file path (NOT URI).
+        self._push_diagnostics: Dict[str, List[Dict[str, Any]]] = {}
+        self._pull_diagnostics: Dict[str, List[Dict[str, Any]]] = {}
+        # Per-path "last published" time so wait-for-fresh logic works.
+        self._published: Dict[str, float] = {}
+        # Per-path version of the latest push (matches our didChange
+        # version when the server respects it).
+        self._published_version: Dict[str, int] = {}
+        # First-push seen flag, for typescript-style seed-on-first-push.
+        self._first_push_seen: Set[str] = set()
+        # Capability registrations — only diagnostic ones are tracked.
+        self._diagnostic_registrations: Dict[str, Dict[str, Any]] = {}
+
+        # State machine
+        self._state: str = "stopped"
+        self._initialize_result: Optional[Dict[str, Any]] = None
+        self._sync_kind: int = 1  # 1=Full, 2=Incremental
+        self._stopping: bool = False
+
+        # Push event for waiters.
+        self._push_event = asyncio.Event()
+        # Monotonic counter incremented on every publishDiagnostics push.
+        # Waiters snapshot it on entry and treat any increase as
+        # "something happened, recheck the predicate".  Avoids the
+        # asyncio.Event sticky-state trap.
+        self._push_counter = 0
+        # Registration change event so wait_for_diagnostics can re-loop
+        # when the server announces a new dynamic provider.
+        self._registration_event = asyncio.Event()
+
+    @property
+    def is_running(self) -> bool:
+        return self._state == "running" and self._proc is not None and self._proc.returncode is None
+
+    @property
+    def state(self) -> str:
+        return self._state
+
+    async def start(self) -> None:
+        """Spawn the server and complete the initialize handshake.
+
+        Raises any exception encountered during spawn/init.  On failure
+        the process is killed and the client is left in state
+        ``"error"`` — re-call ``start()`` to retry.
+        """
+        if self._state in ("running", "starting"):
+            return
+        self._state = "starting"
+        try:
+            await self._spawn()
+            await self._initialize()
+            self._state = "running"
+        except Exception:
+            self._state = "error"
+            await self._cleanup_process()
+            raise
+
+    async def _spawn(self) -> None:
+        env = dict(os.environ)
+        if self._env:
+            env.update(self._env)
+
+        try:
+            self._proc = await asyncio.create_subprocess_exec(
+                self._command[0],
+                *self._command[1:],
+                stdin=asyncio.subprocess.PIPE,
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.PIPE,
+                env=env,
+                cwd=self._cwd,
+            )
+        except FileNotFoundError as e:
+            raise LSPProtocolError(
+                f"LSP server binary not found: {self._command[0]} ({e})"
+            ) from e
+
+        # Drain stderr at debug level — if we don't, the pipe buffer
+        # fills and the server hangs.
+        self._stderr_task = asyncio.create_task(self._drain_stderr())
+        # Start the reader loop.
+        self._reader_task = asyncio.create_task(self._reader_loop())
+
+    async def _drain_stderr(self) -> None:
+        if self._proc is None or self._proc.stderr is None:
+            return
+        try:
+            while True:
+                line = await self._proc.stderr.readline()
+                if not line:
+                    break
+                text = line.decode("utf-8", errors="replace").rstrip()
+                if text:
+                    logger.debug("[%s] stderr: %s", self.server_id, text[:1000])
+        except (asyncio.CancelledError, OSError):
+            pass
+
+    async def _reader_loop(self) -> None:
+        if self._proc is None or self._proc.stdout is None:
+            return
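+        # The event loop keeps only weak references to tasks, so hold
+        # strong ones until each server-request handler finishes.
+        handler_tasks: Set[asyncio.Task] = set()
+        try:
+            while True:
+                msg = await read_message(self._proc.stdout)
+                if msg is None:
+                    logger.debug("[%s] server closed stdout cleanly", self.server_id)
+                    break
+                kind, key = classify_message(msg)
+                if kind == "response":
+                    self._dispatch_response(key, msg)
+                elif kind == "request":
+                    task = asyncio.create_task(self._dispatch_request(key, msg))
+                    handler_tasks.add(task)
+                    task.add_done_callback(handler_tasks.discard)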
+                elif kind == "notification":
+                    self._dispatch_notification(key, msg)
+                else:
+                    logger.warning("[%s] dropping invalid message: %r", self.server_id, msg)
+        except LSPProtocolError as e:
+            logger.warning("[%s] protocol error in reader loop: %s", self.server_id, e)
+        except (asyncio.CancelledError, OSError):
+            pass
+        finally:
+            # Wake up any pending requests so they can fail fast.
+            for fut in list(self._pending.values()):
+                if not fut.done():
+                    fut.set_exception(LSPProtocolError("server connection closed"))
+            self._pending.clear()
+
+    async def _initialize(self) -> None:
+        params = {
+            "rootUri": file_uri(self.workspace_root),
+            "rootPath": self.workspace_root,
+            "processId": os.getpid(),
+            "workspaceFolders": [
+                {"name": "workspace", "uri": file_uri(self.workspace_root)}
+            ],
+            "initializationOptions": self._init_options,
+            "capabilities": {
+                "window": {"workDoneProgress": True},
+                "workspace": {
+                    "configuration": True,
+                    "workspaceFolders": True,
+                    "didChangeWatchedFiles": {"dynamicRegistration": True},
+                    "diagnostics": {"refreshSupport": False},
+                },
+                "textDocument": {
+                    "synchronization": {
+                        "dynamicRegistration": False,
+                        "didOpen": True,
+                        "didChange": True,
+                        "didSave": True,
+                        "willSave": False,
+                        "willSaveWaitUntil": False,
+                    },
+                    "diagnostic": {
+                        "dynamicRegistration": True,
+                        "relatedDocumentSupport": True,
+                    },
+                    "publishDiagnostics": {
+                        "relatedInformation": True,
+                        "tagSupport": {"valueSet": [1, 2]},
+                        "versionSupport": True,
+                        "codeDescriptionSupport": True,
+                        "dataSupport": False,
+                    },
+                    "hover": {"contentFormat": ["markdown", "plaintext"]},
+                    "definition": {"linkSupport": True},
+                    "references": {},
+                    "documentSymbol": {"hierarchicalDocumentSymbolSupport": True},
+                },
+                "general": {"positionEncodings": ["utf-16"]},
+            },
+        }
+
+        result = await asyncio.wait_for(
+            self._send_request("initialize", params),
+            timeout=INITIALIZE_TIMEOUT,
+        )
+        self._initialize_result = result
+        self._sync_kind = self._extract_sync_kind(result.get("capabilities") or {})
+
+        await self._send_notification("initialized", {})
+        if self._init_options:
+            # Some servers (vtsls, eslint) want config pushed via
+            # didChangeConfiguration even if it was sent in
+            # initializationOptions.
+            await self._send_notification(
+                "workspace/didChangeConfiguration",
+                {"settings": self._init_options},
+            )
+
+    @staticmethod
+    def _extract_sync_kind(capabilities: dict) -> int:
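+        """Return the server's ``textDocumentSync`` change kind.
+
+        Servers advertise either a bare ``TextDocumentSyncKind`` integer
+        or a ``TextDocumentSyncOptions`` object with a ``change`` field;
+        anything else falls back to Full (1).
+
+        Example::
+
+            >>> LSPClient._extract_sync_kind({"textDocumentSync": 2})
+            2
+            >>> LSPClient._extract_sync_kind({"textDocumentSync": {"change": 1}})
+            1
+            >>> LSPClient._extract_sync_kind({})
+            1
+        """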
+        sync = capabilities.get("textDocumentSync")
+        if isinstance(sync, int):
+            return sync
+        if isinstance(sync, dict):
+            change = sync.get("change")
+            if isinstance(change, int):
+                return change
+        return 1  # default to Full
+
+    async def shutdown(self) -> None:
+        """Best-effort graceful shutdown.
+
+        Sends ``shutdown`` + ``exit``, then SIGTERMs/SIGKILLs the
+        process if it doesn't exit cleanly.  Idempotent.
+        """
+        if self._stopping:
+            return
+        self._stopping = True
+        try:
+            if self.is_running:
+                try:
+                    await asyncio.wait_for(self._send_request("shutdown", None), timeout=2.0)
+                except (asyncio.TimeoutError, LSPRequestError, LSPProtocolError):
+                    pass
+                try:
+                    await self._send_notification("exit", None)
+                except Exception:
+                    pass
+        finally:
+            self._state = "stopped"
+            await self._cleanup_process()
+
+    async def _cleanup_process(self) -> None:
+        if self._reader_task is not None and not self._reader_task.done():
+            self._reader_task.cancel()
+            try:
+                await self._reader_task
+            except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                pass
+        if self._stderr_task is not None and not self._stderr_task.done():
+            self._stderr_task.cancel()
+            try:
+                await self._stderr_task
+            except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                pass
+        proc = self._proc
+        self._proc = None
+        if proc is None:
+            return
+        if proc.returncode is None:
+            try:
+                proc.terminate()
+                try:
+                    await asyncio.wait_for(proc.wait(), timeout=SHUTDOWN_GRACE)
+                except asyncio.TimeoutError:
+                    try:
+                        proc.kill()
+                        await proc.wait()
+                    except ProcessLookupError:
+                        pass
+            except ProcessLookupError:
+                pass
+
+    # ------------------------------------------------------------------
+    # request / notification plumbing
+    # ------------------------------------------------------------------
+
+    async def _send_request(self, method: str, params: Any) -> Any:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            raise LSPProtocolError(f"cannot send {method!r}: stdin closed")
+        loop = asyncio.get_running_loop()
+        req_id = self._next_id
+        self._next_id += 1
+        fut: asyncio.Future = loop.create_future()
+        self._pending[req_id] = fut
+        try:
+            self._proc.stdin.write(encode_message(make_request(req_id, method, params)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError) as e:
+            self._pending.pop(req_id, None)
+            raise LSPProtocolError(f"send failed for {method!r}: {e}") from e
+        try:
+            return await fut
+        finally:
+            self._pending.pop(req_id, None)
+
+    async def _send_request_with_retry(self, method: str, params: Any, *, timeout: float) -> Any:
+        """Send a request, retrying on ``ContentModified`` (-32801).
+
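+        Other errors propagate.  The policy follows Claude Code's
+        ``LSPServerInstance.sendRequest``: after the first attempt, retry
+        up to ``MAX_CONTENT_MODIFIED_RETRIES`` times, sleeping
+        ``RETRY_BASE_DELAY * 2**attempt`` before each retry (0.5s, 1.0s,
+        2.0s for three retries at a 0.5s base).  ``timeout`` applies to
+        each attempt separately.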
+        """
+        for attempt in range(MAX_CONTENT_MODIFIED_RETRIES + 1):
+            try:
+                return await asyncio.wait_for(self._send_request(method, params), timeout=timeout)
+            except LSPRequestError as e:
+                if e.code == ERROR_CONTENT_MODIFIED and attempt < MAX_CONTENT_MODIFIED_RETRIES:
+                    await asyncio.sleep(RETRY_BASE_DELAY * (2 ** attempt))
+                    continue
+                raise
+
+    async def _send_notification(self, method: str, params: Any) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_notification(method, params)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError) as e:
+            logger.debug("[%s] notify %s failed: %s", self.server_id, method, e)
+
+    async def _send_response(self, req_id: Any, result: Any) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_response(req_id, result)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError):
+            pass
+
+    async def _send_error_response(self, req_id: Any, code: int, message: str) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_error_response(req_id, code, message)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError):
+            pass
+
+    def _dispatch_response(self, req_id: int, msg: dict) -> None:
+        fut = self._pending.get(req_id)
+        if fut is None or fut.done():
+            return
+        if "error" in msg:
+            err = msg["error"] or {}
+            fut.set_exception(
+                LSPRequestError(
+                    code=int(err.get("code", -32000)),
+                    message=str(err.get("message", "unknown")),
+                    data=err.get("data"),
+                )
+            )
+        else:
+            fut.set_result(msg.get("result"))
+
+    async def _dispatch_request(self, req_id: Any, msg: dict) -> None:
+        method = msg.get("method", "")
+        params = msg.get("params")
+        handler = self._request_handlers.get(method)
+        if handler is None:
+            await self._send_error_response(req_id, ERROR_METHOD_NOT_FOUND, f"method not found: {method}")
+            return
+        try:
+            result = await handler(params)
+        except Exception as e:  # noqa: BLE001 — protocol must not blow up
+            logger.warning("[%s] request handler %s failed: %s", self.server_id, method, e)
+            await self._send_error_response(req_id, -32000, f"handler failed: {e}")
+            return
+        await self._send_response(req_id, result)
+
+    def _dispatch_notification(self, method: str, msg: dict) -> None:
+        handler = self._notification_handlers.get(method)
+        if handler is None:
+            return
+        try:
+            handler(msg.get("params"))
+        except Exception as e:  # noqa: BLE001
+            logger.debug("[%s] notification handler %s failed: %s", self.server_id, method, e)
+
+    # ------------------------------------------------------------------
+    # built-in handlers for server-to-client requests
+    # ------------------------------------------------------------------
+
+    async def _handle_work_done_create(self, params: Any) -> Any:
+        # Acknowledge progress tokens — required by some servers.
+        return None
+
+    async def _handle_workspace_configuration(self, params: Any) -> Any:
+        # Walk dotted sections through initializationOptions.  Mirrors
+        # OpenCode's `client.ts:198-220` — return null when missing.
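+        # For example, with hypothetical initializationOptions
+        # {"python": {"analysis": {"typeCheckingMode": "strict"}}}:
+        #   section "python.analysis" -> {"typeCheckingMode": "strict"}
+        #   section "python.missing"  -> None
+        #   no section                -> the whole initializationOptions dict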
+        if not isinstance(params, dict):
+            return [None]
+        items = params.get("items") or []
+        out: List[Any] = []
+        for item in items:
+            if not isinstance(item, dict):
+                out.append(None)
+                continue
+            section = item.get("section")
+            if not section or not self._init_options:
+                out.append(self._init_options or None)
+                continue
+            cur: Any = self._init_options
+            for part in str(section).split("."):
+                if isinstance(cur, dict) and part in cur:
+                    cur = cur[part]
+                else:
+                    cur = None
+                    break
+            out.append(cur)
+        return out
+
+    async def _handle_register_capability(self, params: Any) -> Any:
+        if not isinstance(params, dict):
+            return None
+        for reg in params.get("registrations") or []:
+            if not isinstance(reg, dict):
+                continue
+            method = reg.get("method")
+            reg_id = reg.get("id")
+            if method == "textDocument/diagnostic" and reg_id:
+                self._diagnostic_registrations[str(reg_id)] = reg
+                self._registration_event.set()
+        return None
+
+    async def _handle_unregister_capability(self, params: Any) -> Any:
+        if not isinstance(params, dict):
+            return None
+        # "unregisterations" (sic) is the field name the LSP spec uses.
+        for unreg in params.get("unregisterations") or []:
+            if not isinstance(unreg, dict):
+                continue
+            reg_id = unreg.get("id")
+            if reg_id:
+                self._diagnostic_registrations.pop(str(reg_id), None)
+        return None
+
+    async def _handle_workspace_folders(self, params: Any) -> Any:
+        return [{"name": "workspace", "uri": file_uri(self.workspace_root)}]
+
+    async def _handle_diagnostic_refresh(self, params: Any) -> Any:
+        # We don't honour refresh (refreshSupport is advertised as False);
+        # wait_for_diagnostics re-pulls on every call instead.
+        return None
+
+    # ------------------------------------------------------------------
+    # publishDiagnostics handler
+    # ------------------------------------------------------------------
+
+    def _handle_publish_diagnostics(self, params: Any) -> None:
+        if not isinstance(params, dict):
+            return
+        uri = params.get("uri")
+        if not isinstance(uri, str):
+            return
+        path = uri_to_path(uri)
+        diagnostics = params.get("diagnostics") or []
+        if not isinstance(diagnostics, list):
+            diagnostics = []
+        version = params.get("version")
+        loop_time = asyncio.get_event_loop().time()
+
+        if self._seed_first_push and path not in self._first_push_seen:
+            # First push: seed without firing the event so a waiter
+            # doesn't resolve on the very first push (which arrives
+            # before the user-triggered didChange could've produced
+            # fresh diagnostics).
+            self._first_push_seen.add(path)
+            self._push_diagnostics[path] = diagnostics
+            self._published[path] = loop_time
+            if isinstance(version, int):
+                self._published_version[path] = version
+            return
+
+        self._push_diagnostics[path] = diagnostics
+        self._published[path] = loop_time
+        if isinstance(version, int):
+            self._published_version[path] = version
+        self._first_push_seen.add(path)
+        # Bump the monotonic push counter and wake every waiter.  We
+        # keep the Event sticky-set so any wait already in progress
+        # resolves; waiters re-check their predicate after waking and
+        # decide whether to keep waiting.  ``_push_counter`` is what
+        # they actually compare against to detect a fresh event.
+        self._push_counter += 1
+        self._push_event.set()
+
+    # ------------------------------------------------------------------
+    # public file-sync API
+    # ------------------------------------------------------------------
+
+    async def open_file(self, path: str, *, language_id: str = "plaintext") -> int:
+        """Send didOpen (first time) or didChange (subsequent) for ``path``.
+
+        Returns the new document version number that the agent's
+        ``wait_for_diagnostics`` should match against.
+        """
+        if not self.is_running:
+            raise LSPProtocolError("client not running")
+
+        abs_path = os.path.abspath(path)
+        try:
+            text = Path(abs_path).read_text(encoding="utf-8", errors="replace")
+        except OSError as e:
+            raise LSPProtocolError(f"cannot read {abs_path}: {e}") from e
+
+        uri = file_uri(abs_path)
+        existing = self._files.get(abs_path)
+
+        if existing is not None:
+            # Re-open: bump version, fire didChangeWatchedFiles + didChange.
+            await self._send_notification(
+                "workspace/didChangeWatchedFiles",
+                {"changes": [{"uri": uri, "type": 2}]},  # 2 = CHANGED
+            )
+            new_version = existing["version"] + 1
+            old_text = existing["text"]
+            content_changes: List[Dict[str, Any]]
+            if self._sync_kind == 2:
+                content_changes = [
+                    {
+                        "range": {
+                            "start": {"line": 0, "character": 0},
+                            "end": _end_position(old_text),
+                        },
+                        "text": text,
+                    }
+                ]
+            else:
+                content_changes = [{"text": text}]
+            await self._send_notification(
+                "textDocument/didChange",
+                {
+                    "textDocument": {"uri": uri, "version": new_version},
+                    "contentChanges": content_changes,
+                },
+            )
+            self._files[abs_path] = {"version": new_version, "text": text}
+            return new_version
+
+        # First open: didChangeWatchedFiles CREATED + didOpen.
+        await self._send_notification(
+            "workspace/didChangeWatchedFiles",
+            {"changes": [{"uri": uri, "type": 1}]},  # 1 = CREATED
+        )
+        # Clear any stale push/pull entries — fresh open should start
+        # from scratch.
+        self._push_diagnostics.pop(abs_path, None)
+        self._pull_diagnostics.pop(abs_path, None)
+        self._published.pop(abs_path, None)
+        self._published_version.pop(abs_path, None)
+        await self._send_notification(
+            "textDocument/didOpen",
+            {
+                "textDocument": {
+                    "uri": uri,
+                    "languageId": language_id,
+                    "version": 0,
+                    "text": text,
+                }
+            },
+        )
+        self._files[abs_path] = {"version": 0, "text": text}
+        return 0
+
+    async def save_file(self, path: str) -> None:
+        """Send didSave for ``path``.  Some linters re-scan only on save."""
+        if not self.is_running:
+            return
+        abs_path = os.path.abspath(path)
+        await self._send_notification(
+            "textDocument/didSave",
+            {"textDocument": {"uri": file_uri(abs_path)}},
+        )
+
+    # ------------------------------------------------------------------
+    # diagnostics: pull + wait
+    # ------------------------------------------------------------------
+
+    async def _pull_document_diagnostics(self, path: str) -> None:
+        """Send ``textDocument/diagnostic`` for one file.
+
+        Stores results into :attr:`_pull_diagnostics`.  Silently
+        no-ops on errors (server may not support the pull endpoint).
+        """
+        try:
+            params: Dict[str, Any] = {
+                "textDocument": {"uri": file_uri(os.path.abspath(path))}
+            }
+            result = await self._send_request_with_retry(
+                "textDocument/diagnostic",
+                params,
+                timeout=DIAGNOSTICS_REQUEST_TIMEOUT,
+            )
+        except (LSPRequestError, LSPProtocolError, asyncio.TimeoutError) as e:
+            logger.debug("[%s] document diagnostic pull failed: %s", self.server_id, e)
+            return
+        if not isinstance(result, dict):
+            return
+        items = result.get("items")
+        if isinstance(items, list):
+            self._pull_diagnostics[os.path.abspath(path)] = items
+        related = result.get("relatedDocuments")
+        if isinstance(related, dict):
+            for uri, sub in related.items():
+                if not isinstance(sub, dict):
+                    continue
+                sub_items = sub.get("items")
+                if isinstance(sub_items, list):
+                    self._pull_diagnostics[uri_to_path(uri)] = sub_items
+
+    async def wait_for_diagnostics(
+        self,
+        path: str,
+        version: int,
+        *,
+        mode: str = "document",
+    ) -> None:
+        """Wait for the server to publish diagnostics for ``path`` at ``version``.
+
+        ``mode`` selects the time budget: ``"document"`` (5s) or
+        ``"full"`` (10s).  Both modes issue the same per-document pull;
+        no ``workspace/diagnostic`` request is sent.  Best-effort:
+        returns silently on timeout.  Does NOT raise if the server
+        doesn't support pull diagnostics; the push side still works.
+        """
+        budget = DIAGNOSTICS_FULL_WAIT if mode == "full" else DIAGNOSTICS_DOCUMENT_WAIT
+        deadline = asyncio.get_event_loop().time() + budget
+        abs_path = os.path.abspath(path)
+
+        while True:
+            remaining = deadline - asyncio.get_event_loop().time()
+            if remaining <= 0:
+                return
+
+            # Concurrent: document pull + push wait.
+            pull_task = asyncio.create_task(self._pull_document_diagnostics(abs_path))
+            push_task = asyncio.create_task(self._wait_for_fresh_push(abs_path, version, remaining))
+            done, pending = await asyncio.wait(
+                {pull_task, push_task},
+                timeout=remaining,
+                return_when=asyncio.FIRST_COMPLETED,
+            )
+            for t in pending:
+                t.cancel()
+            for t in pending:
+                try:
+                    await t
+                except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                    pass
+
+            # If we got a fresh push for our version, we're done.
+            current_v = self._published_version.get(abs_path)
+            if abs_path in self._published and (
+                current_v is None or current_v >= version
+            ):
+                return
+
+            # Pull may have populated _pull_diagnostics — that's also
+            # success.
+            if abs_path in self._pull_diagnostics:
+                return
+
+            # Loop until budget runs out.
+
+    async def _wait_for_fresh_push(self, path: str, version: int, timeout: float) -> None:
+        """Wait until a publishDiagnostics arrives for ``path`` at ``version``+."""
+        deadline = asyncio.get_event_loop().time() + timeout
+        baseline = self._push_counter
+        while True:
+            current_v = self._published_version.get(path)
+            if path in self._published and (current_v is None or current_v >= version):
+                # Debounce — wait a tick in case more diagnostics arrive
+                # immediately after.  TS often emits in pairs.  We
+                # snapshot the counter so we wake on a *new* push, not
+                # on the one that satisfied us a moment ago.
+                debounce_baseline = self._push_counter
+                debounce_deadline = asyncio.get_event_loop().time() + PUSH_DEBOUNCE
+                while self._push_counter == debounce_baseline:
+                    remaining = debounce_deadline - asyncio.get_event_loop().time()
+                    if remaining <= 0:
+                        break
+                    self._push_event.clear()
+                    try:
+                        await asyncio.wait_for(self._push_event.wait(), timeout=remaining)
+                    except asyncio.TimeoutError:
+                        break
+                return
+            remaining = deadline - asyncio.get_event_loop().time()
+            if remaining <= 0:
+                return
+            if self._push_counter > baseline:
+                # New event arrived but predicate still false — re-check
+                # immediately without waiting again.
+                baseline = self._push_counter
+                continue
+            self._push_event.clear()
+            try:
+                await asyncio.wait_for(self._push_event.wait(), timeout=min(remaining, 0.5))
+            except asyncio.TimeoutError:
+                continue
+
+    def diagnostics_for(self, path: str) -> List[Dict[str, Any]]:
+        """Return current merged + deduped diagnostics for one file.
+
+        Diagnostics from the push and pull stores are concatenated and
+        deduplicated by a ``(severity, code, source, message, range)``
+        content key.  Returns an empty list if the server hasn't
+        published anything.
+        """
+        abs_path = os.path.abspath(path)
+        push = self._push_diagnostics.get(abs_path) or []
+        pull = self._pull_diagnostics.get(abs_path) or []
+        return _dedupe(push, pull)
+
+
+def _dedupe(*lists: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    seen: Set[str] = set()
+    out: List[Dict[str, Any]] = []
+    for lst in lists:
+        for d in lst:
+            if not isinstance(d, dict):
+                continue
+            key = _diagnostic_key(d)
+            if key in seen:
+                continue
+            seen.add(key)
+            out.append(d)
+    return out
+
+
+def _diagnostic_key(d: Dict[str, Any]) -> str:
+    """Content-equality key for a diagnostic.
+
+    Matches the structural equality used in claude-code's
+    ``areDiagnosticsEqual``: message + severity + source + code +
+    range coords.  The range is flattened to a ``line:char-line:char``
+    string so the key is stable across dict orderings.
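+
+    Example::
+
+        >>> rng = {"start": {"line": 3, "character": 4},
+        ...        "end": {"line": 3, "character": 9}}
+        >>> a = {"message": "x", "range": rng}
+        >>> b = {"range": rng, "message": "x ", "data": {"id": 1}}
+        >>> _diagnostic_key(a) == _diagnostic_key(b)
+        True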
+    """
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    end = rng.get("end") or {}
+    code = d.get("code")
+    if code is not None and not isinstance(code, str):
+        code = str(code)
+    return "\x00".join(
+        [
+            str(d.get("severity") or 1),
+            str(code or ""),
+            str(d.get("source") or ""),
+            str(d.get("message") or "").strip(),
+            f"{start.get('line', 0)}:{start.get('character', 0)}-{end.get('line', 0)}:{end.get('character', 0)}",
+        ]
+    )
+
+
+__all__ = [
+    "LSPClient",
+    "file_uri",
+    "uri_to_path",
+    "INITIALIZE_TIMEOUT",
+    "DIAGNOSTICS_DOCUMENT_WAIT",
+    "DIAGNOSTICS_FULL_WAIT",
+]
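The push/pull merge in ``diagnostics_for`` boils down to first-occurrence dedup on a content key. Below is a minimal standalone sketch, assuming plain LSP ``Diagnostic`` dicts. ``content_key`` and ``merge_diagnostics`` are hypothetical names, and the key here is a tuple rather than the NUL-joined string ``_diagnostic_key`` builds; either works as a set key.

```python
from typing import Any, Dict, List, Set, Tuple

Diag = Dict[str, Any]


def content_key(d: Diag) -> Tuple[str, ...]:
    """Hypothetical content-equality key for an LSP Diagnostic dict."""
    # Flatten the nested ``range`` into plain coordinates so equal
    # diagnostics map to the same key regardless of dict key order.
    rng = d.get("range") or {}
    start = rng.get("start") or {}
    end = rng.get("end") or {}
    code = d.get("code")
    return (
        str(d.get("severity") or 1),  # missing severity treated as 1 (Error)
        "" if code is None else str(code),
        str(d.get("source") or ""),
        str(d.get("message") or "").strip(),
        f"{start.get('line', 0)}:{start.get('character', 0)}-"
        f"{end.get('line', 0)}:{end.get('character', 0)}",
    )


def merge_diagnostics(*stores: List[Diag]) -> List[Diag]:
    """Concatenate stores, keeping the first occurrence of each key."""
    seen: Set[Tuple[str, ...]] = set()
    out: List[Diag] = []
    for store in stores:
        for d in store:
            if not isinstance(d, dict):
                continue  # skip malformed entries
            key = content_key(d)
            if key not in seen:
                seen.add(key)
                out.append(d)
    return out


rng = {"start": {"line": 3, "character": 0}, "end": {"line": 3, "character": 5}}
push = [{"severity": 1, "message": "undefined name 'x'", "range": rng}]
pull = [
    # Same diagnostic as push[0]: keys reordered, trailing space in message.
    {"range": rng, "message": "undefined name 'x' ", "severity": 1},
    {"severity": 2, "message": "unused import", "range": rng},
]
merged = merge_diagnostics(push, pull)
print(len(merged))  # 2
```

Because the push store is passed first, a diagnostic reported through both channels keeps its push-store dict.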
--- a/agent/lsp/eventlog.py
+++ b/agent/lsp/eventlog.py
@ -0,0 +1,213 @@
+"""Structured logging with steady-state silence for the LSP layer.
+
+The LSP layer fires on every write_file/patch.  In a busy session
+that's hundreds of events.  We want users to be able to ``rg`` the
+log for "did LSP fire on that edit?" without drowning in noise.
+
+The level model:
+
+- ``DEBUG`` for steady-state events that have no novel signal:
+  ``clean``, ``feature off``, ``extension not mapped``, ``no project
+  root for already-announced file``, ``server unavailable for
+  already-announced binary``.  These never reach ``agent.log`` at the
+  default INFO threshold.
+
+- ``INFO`` for state transitions worth surfacing exactly once per
+  session: ``active for <root>`` the first time a (server_id,
+  workspace_root) client starts, ``no project root for <path>``
+  the first time we see that file.  Plus every diagnostic event
+  (those are inherently rare and per-edit, exactly what users grep
+  for).
+
+- ``WARNING`` for action-required failures: ``server unavailable``
+  (binary not on PATH) the first time per (server_id, binary),
+  ``no server configured`` once per language.  Per-call WARNING for
+  timeouts and unexpected bridge exceptions.
+
+The dedup state lives in module-level sets, one per event kind.  Each
+set holds one entry per distinct key seen in this Python process:
+(server_id, root), (server_id, binary), (server_id, file_path) or
+(server_id,).  That stays small even in a large monorepo session.  A
+bounded LRU was rejected because evicting an entry could re-fire the
+WARNING/INFO line we explicitly want to suppress.
+
+Grep recipe::
+
+    tail -f ~/.hermes/logs/agent.log | rg 'lsp\\['
+"""
+from __future__ import annotations
+
+import logging
+import os
+import threading
+from typing import Tuple
+
+# Dedicated logger name so the documented grep recipe survives a
+# ``logging.getLogger(__name__)`` rename of any internal module.
+event_log = logging.getLogger("hermes.lint.lsp")
+
+# ---------------------------------------------------------------------------
+# Once-per-X dedup sets
+# ---------------------------------------------------------------------------
+
+_announce_lock = threading.Lock()
+_announced_active: set = set()        # keys: (server_id, workspace_root)
+_announced_unavailable: set = set()   # keys: (server_id, binary_path_or_name)
+_announced_no_root: set = set()       # keys: (server_id, file_path)
+_announced_no_server: set = set()     # keys: (server_id,)
+
+
+def _short_path(file_path: str) -> str:
+    """Render *file_path* relative to the cwd when sensible, else absolute.
+
+    Keeps log lines readable for the common case (the user is inside
+    the project they're editing) without emitting brittle ``../../..``
+    chains for the cross-tree case.
+    """
+    if not file_path:
+        return file_path
+    try:
+        rel = os.path.relpath(file_path)
+    except ValueError:
+        return file_path
+    if rel.startswith(".." + os.sep) or rel == "..":
+        return file_path
+    return rel
+
+
+def _emit(server_id: str, level: int, message: str) -> None:
+    event_log.log(level, "lsp[%s] %s", server_id, message)
+
+
+def _announce_once(bucket: set, key: Tuple) -> bool:
+    """Return True if *key* has not been announced for *bucket* yet.
+
+    Atomically marks the key as announced so concurrent callers
+    cannot both win the race and double-log.
+    """
+    with _announce_lock:
+        if key in bucket:
+            return False
+        bucket.add(key)
+        return True
+
+
+# ---------------------------------------------------------------------------
+# Public event helpers — call these from the LSP layer.
+# ---------------------------------------------------------------------------
+
+
+def log_clean(server_id: str, file_path: str) -> None:
+    """No diagnostics emitted for *file_path*.  DEBUG (silent at default)."""
+    _emit(server_id, logging.DEBUG, f"clean ({_short_path(file_path)})")
+
+
+def log_disabled(server_id: str, file_path: str, reason: str) -> None:
+    """LSP intentionally skipped for this file (feature off, ext unmapped,
+    backend not local, etc.).  DEBUG."""
+    _emit(server_id, logging.DEBUG, f"skipped: {reason} ({_short_path(file_path)})")
+
+
+def log_active(server_id: str, workspace_root: str) -> None:
+    """A new LSP client started for (server_id, workspace_root).
+
+    INFO once per (server_id, workspace_root); DEBUG thereafter.
+    Lets users verify "is LSP actually running?" with a single grep.
+    """
+    key = (server_id, workspace_root)
+    if _announce_once(_announced_active, key):
+        _emit(server_id, logging.INFO, f"active for {workspace_root}")
+    else:
+        _emit(server_id, logging.DEBUG, f"reused client for {workspace_root}")
+
+
+def log_diagnostics(server_id: str, file_path: str, count: int) -> None:
+    """Diagnostics arrived for a file.  INFO every time — these are the
+    failure signals users actually want to grep for, and they are
+    inherently rare per edit."""
+    _emit(server_id, logging.INFO, f"{count} diags ({_short_path(file_path)})")
+
+
+def log_no_project_root(server_id: str, file_path: str) -> None:
+    """File had no recognised project marker.  INFO once per file,
+    DEBUG thereafter."""
+    key = (server_id, file_path)
+    if _announce_once(_announced_no_root, key):
+        _emit(server_id, logging.INFO, f"no project root for {_short_path(file_path)}")
+    else:
+        _emit(server_id, logging.DEBUG, f"no project root for {_short_path(file_path)}")
+
+
+def log_server_unavailable(server_id: str, binary_or_pkg: str) -> None:
+    """The server binary couldn't be resolved.  WARNING once per
+    (server_id, binary), DEBUG thereafter so a hundred subsequent
+    .py edits don't spam the log."""
+    key = (server_id, binary_or_pkg)
+    if _announce_once(_announced_unavailable, key):
+        _emit(
+            server_id,
+            logging.WARNING,
+            f"server unavailable: {binary_or_pkg} not found "
+            "(install via `hermes lsp install <id>` or set lsp.servers.<id>.command)",
+        )
+    else:
+        _emit(server_id, logging.DEBUG, f"server still unavailable: {binary_or_pkg}")
+
+
+def log_no_server_configured(server_id: str) -> None:
+    """No spawn recipe for this language.  WARNING once."""
+    if _announce_once(_announced_no_server, (server_id,)):
+        _emit(server_id, logging.WARNING, "no server configured")
+
+
+def log_timeout(server_id: str, file_path: str, kind: str = "diagnostics") -> None:
+    """A request to the server timed out.  WARNING every time — these are
+    inherently novel events worth surfacing on each occurrence."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"{kind} timed out for {_short_path(file_path)}",
+    )
+
+
+def log_server_error(server_id: str, file_path: str, exc: BaseException) -> None:
+    """An unexpected exception bubbled out of the LSP layer.  WARNING."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"unexpected error for {_short_path(file_path)}: {type(exc).__name__}: {exc}",
+    )
+
+
+def log_spawn_failed(server_id: str, workspace_root: str, exc: BaseException) -> None:
+    """The LSP server failed to spawn or initialize.  WARNING."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"spawn/initialize failed for {workspace_root}: {type(exc).__name__}: {exc}",
+    )
+
+
+def reset_announce_caches() -> None:
+    """Test-only: clear the dedup caches.  Production code never calls this."""
+    with _announce_lock:
+        _announced_active.clear()
+        _announced_unavailable.clear()
+        _announced_no_root.clear()
+        _announced_no_server.clear()
+
+
+__all__ = [
+    "event_log",
+    "log_clean",
+    "log_disabled",
+    "log_active",
+    "log_diagnostics",
+    "log_no_project_root",
+    "log_server_unavailable",
+    "log_no_server_configured",
+    "log_timeout",
+    "log_server_error",
+    "log_spawn_failed",
+    "reset_announce_caches",
+]
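The once-per-key logging above hinges on ``_announce_once`` doing the membership test and the insert under a single lock. A minimal sketch of that pattern follows; ``announce_once`` and ``level_for`` are hypothetical names and the keys are illustrative. It shows both the WARNING-then-DEBUG downgrade and that concurrent callers produce exactly one winner.

```python
import logging
import threading
from typing import Hashable, Set

_lock = threading.Lock()
_announced: Set[Hashable] = set()


def announce_once(key: Hashable) -> bool:
    """Return True only for the first caller to present ``key``."""
    # Test and insert under one lock so two racing threads cannot
    # both observe "not yet announced".
    with _lock:
        if key in _announced:
            return False
        _announced.add(key)
        return True


def level_for(key: Hashable, first: int = logging.WARNING) -> int:
    # First sighting gets the loud level; repeats drop to DEBUG, which
    # the default INFO threshold filters out.
    return first if announce_once(key) else logging.DEBUG


levels = [level_for(("pyright", "pyright-langserver")) for _ in range(3)]
print([logging.getLevelName(lvl) for lvl in levels])  # ['WARNING', 'DEBUG', 'DEBUG']

results = []


def worker() -> None:
    results.append(announce_once(("gopls", "/repo")))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True))  # 1
```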
--- a/agent/lsp/install.py
+++ b/agent/lsp/install.py
@ -0,0 +1,347 @@
+"""Auto-installation of LSP server binaries.
+
+Tries to install missing servers using whatever package manager is
+appropriate.  All installs go to a Hermes-owned bin staging dir,
+``<HERMES_HOME>/lsp/bin/``, so we don't pollute the user's global
+toolchain.
+
+Strategies:
+
+- ``auto`` — attempt to install with the best available package
+  manager.  This is the default.
+- ``manual`` — never install; if a binary is missing, the server is
+  silently skipped and the user is told about it via ``hermes lsp
+  status``.
+- ``off`` — same as ``manual`` for now (kept distinct so we can
+  evolve behavior later, e.g. logging differently).
+
+Installs run synchronously the first time a server is needed.
+Concurrent calls to :func:`try_install` for the same package are
+serialized by a per-package lock, and the result is cached.
+
+Failure modes are non-fatal: every install path is wrapped in
+try/except and returns ``None`` on failure.  The tool layer then
+falls back to its in-process syntax checker, exactly as if the user
+hadn't enabled LSP at all.
+"""
+from __future__ import annotations
+
+import logging
+import os
+import shutil
+import subprocess
+import sys
+import threading
+from pathlib import Path
+from typing import Dict, Optional
+
+logger = logging.getLogger("agent.lsp.install")
+
+# Package-name → install-recipe registry.  Each entry is a dict with
+# ``strategy`` (npm/go/pip/manual), ``pkg`` (name passed to the package
+# manager) and ``bin`` (executable name).  Binaries are looked up in
+# ``<HERMES_HOME>/lsp/bin/`` first, then on PATH.
+INSTALL_RECIPES: Dict[str, Dict[str, str]] = {
+    # Python
+    "pyright": {"strategy": "npm", "pkg": "pyright", "bin": "pyright-langserver"},
+    # JS/TS family
+    "typescript-language-server": {
+        "strategy": "npm",
+        "pkg": "typescript-language-server",
+        "bin": "typescript-language-server",
+    },
+    "@vue/language-server": {
+        "strategy": "npm",
+        "pkg": "@vue/language-server",
+        "bin": "vue-language-server",
+    },
+    "svelte-language-server": {
+        "strategy": "npm",
+        "pkg": "svelte-language-server",
+        "bin": "svelteserver",
+    },
+    "@astrojs/language-server": {
+        "strategy": "npm",
+        "pkg": "@astrojs/language-server",
+        "bin": "astro-ls",
+    },
+    "yaml-language-server": {
+        "strategy": "npm",
+        "pkg": "yaml-language-server",
+        "bin": "yaml-language-server",
+    },
+    "bash-language-server": {
+        "strategy": "npm",
+        "pkg": "bash-language-server",
+        "bin": "bash-language-server",
+    },
+    "intelephense": {"strategy": "npm", "pkg": "intelephense", "bin": "intelephense"},
+    "dockerfile-language-server-nodejs": {
+        "strategy": "npm",
+        "pkg": "dockerfile-language-server-nodejs",
+        "bin": "docker-langserver",
+    },
+    # Go
+    "gopls": {"strategy": "go", "pkg": "golang.org/x/tools/gopls@latest", "bin": "gopls"},
+    # Rust — too heavy (hundreds of MB to bootstrap).  We do NOT
+    # auto-install rust-analyzer; users install via rustup.
+    "rust-analyzer": {"strategy": "manual", "pkg": "", "bin": "rust-analyzer"},
+    # C/C++ — manual (clangd ships with LLVM, very heavy)
+    "clangd": {"strategy": "manual", "pkg": "", "bin": "clangd"},
+    # Lua — manual (LuaLS is platform-specific binaries from GitHub
+    # releases; complex enough that we punt to the user)
+    "lua-language-server": {"strategy": "manual", "pkg": "", "bin": "lua-language-server"},
+}
+
+
+_install_locks: Dict[str, threading.Lock] = {}
+_install_results: Dict[str, Optional[str]] = {}
+_install_lock_meta = threading.Lock()
+
+
+def hermes_lsp_bin_dir() -> Path:
+    """Return the Hermes-owned bin staging dir for LSP servers."""
+    home = os.environ.get("HERMES_HOME")
+    if not home:  # unset or empty: never resolve relative to the cwd
+        home = os.path.join(os.path.expanduser("~"), ".hermes")
+    p = Path(home) / "lsp" / "bin"
+    p.mkdir(parents=True, exist_ok=True)
+    return p
+
+
+def _existing_binary(name: str) -> Optional[str]:
+    """Probe the staging dir + PATH for a binary named ``name``."""
+    # ``shutil.which`` with an explicit ``path`` also honours PATHEXT on
+    # Windows, so staged ``gopls.exe`` and npm ``.cmd`` shims are found.
+    staged = shutil.which(name, path=str(hermes_lsp_bin_dir()))
+    if staged:
+        return staged
+    return shutil.which(name)
+
+
+def _get_lock(pkg: str) -> threading.Lock:
+    with _install_lock_meta:
+        lock = _install_locks.get(pkg)
+        if lock is None:
+            lock = threading.Lock()
+            _install_locks[pkg] = lock
+        return lock
+
+
+def try_install(pkg: str, strategy: str = "auto") -> Optional[str]:
+    """Try to install ``pkg`` and return the binary path if successful.
+
+    ``strategy`` is ``"auto"``, ``"manual"``, or ``"off"``.  In
+    ``manual``/``off`` mode, this function only probes for an
+    existing binary and returns ``None`` if not found.
+
+    The install is cached per-package — a second call returns the
+    same path (or ``None``) without reinstalling.  Concurrent calls
+    are serialized.
+    """
+    if strategy != "auto":
+        # Only ``auto`` triggers an actual install.  In manual/off,
+        # we still check whether the binary already exists.
+        recipe = INSTALL_RECIPES.get(pkg, {})
+        bin_name = recipe.get("bin", pkg)
+        return _existing_binary(bin_name)
+
+    if pkg in _install_results:
+        return _install_results[pkg]
+
+    lock = _get_lock(pkg)
+    with lock:
+        # Double-check after acquiring lock.
+        if pkg in _install_results:
+            return _install_results[pkg]
+        result = _do_install(pkg)
+        _install_results[pkg] = result
+        return result
+
+
+def _do_install(pkg: str) -> Optional[str]:
+    recipe = INSTALL_RECIPES.get(pkg)
+    if recipe is None:
+        # Not in our registry — best-effort: just probe PATH.
+        return shutil.which(pkg)
+
+    strategy = recipe.get("strategy", "manual")
+    bin_name = recipe.get("bin", pkg)
+
+    # Check if already present (shutil.which or staging dir)
+    existing = _existing_binary(bin_name)
+    if existing:
+        return existing
+
+    if strategy == "manual":
+        logger.debug("[install] %s requires manual install (recipe=%s)", pkg, recipe)
+        return None
+
+    if strategy == "npm":
+        return _install_npm(recipe.get("pkg", pkg), bin_name)
+    if strategy == "go":
+        return _install_go(recipe.get("pkg", pkg), bin_name)
+    if strategy == "pip":
+        return _install_pip(recipe.get("pkg", pkg), bin_name)
+
+    logger.warning("[install] unknown strategy %r for %s", strategy, pkg)
+    return None
+
+
+def _install_npm(pkg: str, bin_name: str) -> Optional[str]:
+    """Install an npm package into our staging dir.
+
+    Uses ``npm install --prefix <HERMES_HOME>/lsp`` so the binaries land
+    in ``<HERMES_HOME>/lsp/node_modules/.bin/<bin_name>``; we then link
+    (or copy) the one we find into ``<HERMES_HOME>/lsp/bin/``.
+    """
+    npm = shutil.which("npm")
+    if npm is None:
+        logger.info("[install] cannot install %s: npm not on PATH", pkg)
+        return None
+    staging = hermes_lsp_bin_dir().parent  # <HERMES_HOME>/lsp/
+    try:
+        logger.info("[install] npm install --prefix %s %s", staging, pkg)
+        proc = subprocess.run(
+            [npm, "install", "--prefix", str(staging), "--silent", "--no-fund", "--no-audit", pkg],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=300,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] npm install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] npm install errored for %s: %s", pkg, e)
+        return None
+
+    # Find the bin
+    nm_bin = staging / "node_modules" / ".bin" / bin_name
+    if os.name == "nt":
+        # npm on Windows writes `<bin>.cmd` beside an sh script; prefer the shim.
+        candidates = [nm_bin.with_suffix(".cmd"), nm_bin]
+    else:
+        candidates = [nm_bin]
+    for c in candidates:
+        if c.exists():
+            # Symlink into our `lsp/bin/` for stable PATH access.
+            link = hermes_lsp_bin_dir() / c.name
+            if not link.exists():
+                try:
+                    link.symlink_to(c)
+                except (OSError, NotImplementedError):
+                    # Symlinks fail on some Windows setups — copy instead.
+                    try:
+                        shutil.copy2(c, link)
+                    except OSError:
+                        return str(c)
+            return str(link if link.exists() else c)
+    logger.warning("[install] npm install for %s succeeded but bin %s not found", pkg, bin_name)
+    return None
+
+
+def _install_go(pkg: str, bin_name: str) -> Optional[str]:
+    """Install a Go module to GOBIN=<staging>."""
+    go = shutil.which("go")
+    if go is None:
+        logger.info("[install] cannot install %s: go not on PATH", pkg)
+        return None
+    staging = hermes_lsp_bin_dir()
+    env = dict(os.environ)
+    env["GOBIN"] = str(staging)
+    try:
+        logger.info("[install] go install %s (GOBIN=%s)", pkg, staging)
+        proc = subprocess.run(
+            [go, "install", pkg],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=600,
+            env=env,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] go install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] go install errored for %s: %s", pkg, e)
+        return None
+    bin_path = staging / bin_name
+    if os.name == "nt":
+        bin_path = bin_path.with_suffix(".exe")
+    if bin_path.exists():
+        return str(bin_path)
+    logger.warning("[install] go install for %s succeeded but bin %s not found", pkg, bin_name)
+    return None
+
+
+def _install_pip(pkg: str, bin_name: str) -> Optional[str]:
+    """Install a Python package into a hermes-owned target dir.
+
+    We avoid polluting the user's site-packages by using
+    ``pip install --target``.  Bins go into
+    ``<staging>/python-packages/bin/`` which we symlink into
+    ``<staging>/bin``.  Note: this only works for packages that ship a
+    console script.
+    """
+    pip_target = hermes_lsp_bin_dir().parent / "python-packages"
+    pip_target.mkdir(parents=True, exist_ok=True)
+    try:
+        logger.info("[install] pip install --target %s %s", pip_target, pkg)
+        proc = subprocess.run(
+            [sys.executable, "-m", "pip", "install", "--target", str(pip_target), "--quiet", pkg],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=300,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] pip install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] pip install errored for %s: %s", pkg, e)
+        return None
+    # Look for the script
+    bin_path = pip_target / "bin" / bin_name
+    if bin_path.exists():
+        link = hermes_lsp_bin_dir() / bin_name
+        if not link.exists():
+            try:
+                link.symlink_to(bin_path)
+            except (OSError, NotImplementedError):
+                try:
+                    shutil.copy2(bin_path, link)
+                except OSError:
+                    return str(bin_path)
+        return str(link if link.exists() else bin_path)
+    logger.warning("[install] pip install for %s succeeded but bin %s not found", pkg, bin_name)
+    return None
+
+
+def detect_status(pkg: str) -> str:
+    """Return ``installed``, ``missing``, or ``manual-only`` for a package.
+
+    Used by the ``hermes lsp status`` CLI to give users a quick
+    overview of what's available without spawning anything.
+    """
+    recipe = INSTALL_RECIPES.get(pkg)
+    bin_name = recipe.get("bin", pkg) if recipe else pkg
+    if _existing_binary(bin_name):
+        return "installed"
+    if recipe and recipe.get("strategy") == "manual":
+        return "manual-only"
+    return "missing"
+
+
+__all__ = [
+    "INSTALL_RECIPES",
+    "try_install",
+    "detect_status",
+    "hermes_lsp_bin_dir",
+]
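``try_install`` pairs a per-package lock with an unlocked fast-path read of the results cache (double-checked locking): concurrent first calls for one package run the installer once, while different packages don't block each other. A minimal sketch of that pattern; ``install_once`` and ``fake_installer`` are hypothetical, and the returned path is a made-up string (nothing is actually installed).

```python
import threading
from typing import Callable, Dict, List, Optional

_locks: Dict[str, threading.Lock] = {}
_results: Dict[str, Optional[str]] = {}
_meta = threading.Lock()


def _lock_for(pkg: str) -> threading.Lock:
    # The meta-lock only guards creation of per-package locks, so
    # installs of different packages can still proceed in parallel.
    with _meta:
        lock = _locks.get(pkg)
        if lock is None:
            lock = _locks[pkg] = threading.Lock()
        return lock


def install_once(pkg: str, installer: Callable[[str], Optional[str]]) -> Optional[str]:
    if pkg in _results:  # fast path: no locking once a result is cached
        return _results[pkg]
    with _lock_for(pkg):
        if pkg in _results:  # another thread finished while we waited
            return _results[pkg]
        result = installer(pkg)  # may be None; failures are cached too
        _results[pkg] = result
        return result


calls: List[str] = []


def fake_installer(pkg: str) -> Optional[str]:
    calls.append(pkg)
    return f"/tmp/lsp/bin/{pkg}"  # made-up path, nothing is installed


out: List[Optional[str]] = []
threads = [
    threading.Thread(target=lambda: out.append(install_once("gopls", fake_installer)))
    for _ in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), set(out))  # 1 {'/tmp/lsp/bin/gopls'}
```

As in ``try_install``, a ``None`` result is cached too, so a failed install is not retried within the same process.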
--- a/agent/lsp/manager.py
+++ b/agent/lsp/manager.py
@ -0,0 +1,607 @@
+"""Service-level orchestration for LSP clients.
+
+The :class:`LSPService` is the bridge between the synchronous
+file_operations layer and the async :class:`agent.lsp.client.LSPClient`.
+
+Design choices:
+
+- A **single asyncio event loop** runs in a background thread.  All
+  client work happens on that loop.  Synchronous callers from
+  ``tools/file_operations.py`` use :meth:`get_diagnostics_sync` to
+  open + wait + drain in one blocking call.
+
+- One client per ``(server_id, workspace_root)`` key.  Lazy spawn:
+  the first request for a key spawns the client; subsequent requests
+  re-use it.
+
+- A **broken-set** records ``(server_id, workspace_root)`` pairs that
+  failed to spawn or initialize.  These are never retried for the
+  life of the service.  Mirrors OpenCode's design.
+
+- A **delta baseline** map keeps "diagnostics-as-of-the-last-snapshot"
+  per file.  ``snapshot_baseline()`` is called BEFORE a write; the
+  next ``get_diagnostics_sync()`` returns only diagnostics that
+  weren't in the baseline.  This adapts Claude Code's
+  ``beforeFileEdited`` / ``getNewDiagnostics`` pattern, but wired to
+  the local LSP layer instead of MCP IDE RPC.
+
+The service is **on by default** (``lsp.enabled: true``) but gated per
+file.  :meth:`is_active` reflects the config flag; :meth:`enabled_for`
+returns False when no git workspace is detected, no server matches the
+extension, the server is disabled in config, or its (server_id, root)
+pair is in the broken-set.  In any of those cases the file_operations
+layer falls through to the in-process syntax check.
+"""
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+import threading
+import time
+from concurrent.futures import Future as ConcurrentFuture
+from typing import Any, Dict, List, Optional, Tuple
+
+from agent.lsp import eventlog
+from agent.lsp.client import (
+    DIAGNOSTICS_DOCUMENT_WAIT,
+    LSPClient,
+    file_uri,
+)
+from agent.lsp.servers import (
+    ServerContext,
+    ServerDef,
+    SpawnSpec,
+    find_server_for_file,
+    language_id_for,
+)
+from agent.lsp.workspace import (
+    clear_cache,
+    is_inside_workspace,
+    resolve_workspace_for_file,
+)
+
+logger = logging.getLogger("agent.lsp.manager")
+
+DEFAULT_IDLE_TIMEOUT = 600  # seconds; servers idle for >10min get reaped
+
+
+class _BackgroundLoop:
+    """A daemon thread that owns one asyncio event loop.
+
+    Provides :meth:`run` for synchronous callers — submits a coroutine
+    to the loop and blocks until it finishes (or a timeout fires).
+    """
+
+    def __init__(self) -> None:
+        self._loop: Optional[asyncio.AbstractEventLoop] = None
+        self._thread: Optional[threading.Thread] = None
+        self._ready = threading.Event()
+
+    def start(self) -> None:
+        if self._thread is not None:
+            return
+        self._thread = threading.Thread(
+            target=self._run_forever,
+            name="hermes-lsp-loop",
+            daemon=True,
+        )
+        self._thread.start()
+        self._ready.wait(timeout=5.0)
+
+    def _run_forever(self) -> None:
+        loop = asyncio.new_event_loop()
+        self._loop = loop
+        asyncio.set_event_loop(loop)
+        self._ready.set()
+        try:
+            loop.run_forever()
+        finally:
+            try:
+                loop.close()
+            except Exception:  # noqa: BLE001
+                pass
+
+    def run(self, coro, *, timeout: Optional[float] = None) -> Any:
+        """Submit a coroutine to the loop and block until done.
+
+        Returns the coroutine's result, or raises its exception.
+        """
+        if self._loop is None:
+            raise RuntimeError("background loop not started")
+        fut: ConcurrentFuture = asyncio.run_coroutine_threadsafe(coro, self._loop)
+        try:
+            return fut.result(timeout=timeout)
+        except Exception:
+            fut.cancel()
+            raise
+
+    def stop(self) -> None:
+        loop = self._loop
+        if loop is None:
+            return
+        try:
+            loop.call_soon_threadsafe(loop.stop)
+        except RuntimeError:
+            pass
+        if self._thread is not None:
+            self._thread.join(timeout=2.0)
+        self._loop = None
+        self._thread = None
+        self._ready.clear()  # so a later start() waits for the new loop
+
+
+class LSPService:
+    """The process-wide LSP service.
+
+    Created once via :meth:`create_from_config`; the
+    :func:`agent.lsp.get_service` accessor manages the singleton.
+    Most callers should use that accessor rather than constructing
+    :class:`LSPService` directly.
+    """
+
+    # ------------------------------------------------------------------
+    # construction + factory
+    # ------------------------------------------------------------------
+
+    def __init__(
+        self,
+        *,
+        enabled: bool,
+        wait_mode: str,
+        wait_timeout: float,
+        install_strategy: str,
+        binary_overrides: Optional[Dict[str, List[str]]] = None,
+        env_overrides: Optional[Dict[str, Dict[str, str]]] = None,
+        init_overrides: Optional[Dict[str, Dict[str, Any]]] = None,
+        disabled_servers: Optional[List[str]] = None,
+        idle_timeout: float = DEFAULT_IDLE_TIMEOUT,
+    ) -> None:
+        self._enabled = enabled
+        self._wait_mode = wait_mode if wait_mode in ("document", "full") else "document"
+        self._wait_timeout = wait_timeout
+        self._install_strategy = install_strategy
+        self._binary_overrides = binary_overrides or {}
+        self._env_overrides = env_overrides or {}
+        self._init_overrides = init_overrides or {}
+        self._disabled_servers = set(disabled_servers or [])
+        self._idle_timeout = idle_timeout
+
+        self._loop = _BackgroundLoop()
+        if self._enabled:
+            self._loop.start()
+
+        # Per-(server_id, workspace_root) state
+        self._clients: Dict[Tuple[str, str], LSPClient] = {}
+        self._broken: set = set()
+        self._spawning: Dict[Tuple[str, str], asyncio.Future] = {}
+        self._last_used: Dict[Tuple[str, str], float] = {}
+        self._state_lock = threading.Lock()
+
+        # Delta baseline: file path → snapshot of diagnostics taken
+        # immediately before a write.  ``get_diagnostics_sync`` filters
+        # out anything in the baseline so the agent only sees errors
+        # introduced by the current edit.
+        self._delta_baseline: Dict[str, List[Dict[str, Any]]] = {}
+
+    @classmethod
+    def create_from_config(cls) -> Optional["LSPService"]:
+        """Build a service from ``hermes_cli.config`` settings.
+
+        Returns ``None`` if the config can't be loaded.  The service
+        itself returns ``is_active()`` False when LSP is disabled.
+        """
+        try:
+            from hermes_cli.config import load_config
+            cfg = load_config()
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP config load failed: %s", e)
+            return None
+
+        lsp_cfg = (cfg.get("lsp") or {}) if isinstance(cfg, dict) else {}
+        if not isinstance(lsp_cfg, dict):
+            lsp_cfg = {}
+
+        enabled = bool(lsp_cfg.get("enabled", True))
+        wait_mode = lsp_cfg.get("wait_mode", "document")
+        wait_timeout = float(lsp_cfg.get("wait_timeout", DIAGNOSTICS_DOCUMENT_WAIT))
+        install_strategy = lsp_cfg.get("install_strategy", "auto")
+        servers_cfg = lsp_cfg.get("servers") or {}
+        disabled = []
+        binary_overrides: Dict[str, List[str]] = {}
+        env_overrides: Dict[str, Dict[str, str]] = {}
+        init_overrides: Dict[str, Dict[str, Any]] = {}
+        if isinstance(servers_cfg, dict):
+            for name, sub in servers_cfg.items():
+                if not isinstance(sub, dict):
+                    continue
+                if sub.get("disabled"):
+                    disabled.append(name)
+                cmd = sub.get("command")
+                if isinstance(cmd, list) and cmd:
+                    binary_overrides[name] = cmd
+                env = sub.get("env")
+                if isinstance(env, dict):
+                    env_overrides[name] = {k: str(v) for k, v in env.items()}
+                init = sub.get("initialization_options")
+                if isinstance(init, dict):
+                    init_overrides[name] = init
+
+        return cls(
+            enabled=enabled,
+            wait_mode=wait_mode,
+            wait_timeout=wait_timeout,
+            install_strategy=install_strategy,
+            binary_overrides=binary_overrides,
+            env_overrides=env_overrides,
+            init_overrides=init_overrides,
+            disabled_servers=disabled,
+        )
+
+    # ------------------------------------------------------------------
+    # public API
+    # ------------------------------------------------------------------
+
+    def is_active(self) -> bool:
+        """Return True iff this service should be consulted at all."""
+        return self._enabled
+
+    def enabled_for(self, file_path: str) -> bool:
+        """Return True iff LSP should run for this specific file.
+
+        Gates on workspace detection (file or cwd inside a git worktree),
+        on a registered server matching the extension and not being
+        disabled in config, and on the (server_id, workspace_root) pair
+        not being in the broken-set after a previous failure.
+
+        Files in already-broken pairs return False so the file_operations
+        layer skips the LSP path entirely — no spawn attempts, no
+        timeout cost — until the service is restarted (``hermes lsp
+        restart``) or the process exits.
+        """
+        if not self._enabled:
+            return False
+        srv = find_server_for_file(file_path)
+        if srv is None or srv.server_id in self._disabled_servers:
+            return False
+        ws_root, gated_in = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated_in):
+            return False
+        # Broken-set short-circuit.  Use the per-server root if we can
+        # compute one cheaply; otherwise fall back to the workspace
+        # root as the broken key (which is what _get_or_spawn would
+        # have used anyway when it failed).
+        try:
+            per_server_root = srv.resolve_root(file_path, ws_root) or ws_root
+        except Exception:  # noqa: BLE001
+            per_server_root = ws_root
+        if (srv.server_id, per_server_root) in self._broken:
+            return False
+        return True
+
+    def snapshot_baseline(self, file_path: str) -> None:
+        """Snapshot current diagnostics for ``file_path`` as the delta baseline.
+
+        Called BEFORE a write so the next ``get_diagnostics_sync()``
+        can filter out pre-existing errors.  Best-effort — failures
+        are silently swallowed so a flaky server can't break a write.
+
+        Any failure, including an outer timeout (e.g. the server hangs
+        during initialize), marks the (server_id, workspace_root) pair as
+        broken so subsequent edits skip it instead of re-paying the cost.
+        """
+        if not self.enabled_for(file_path):
+            return
+        try:
+            diags = self._loop.run(self._snapshot_async(file_path), timeout=8.0)
+            self._delta_baseline[os.path.abspath(file_path)] = diags or []
+        except Exception as e:  # noqa: BLE001
+            logger.debug("baseline snapshot failed for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            self._delta_baseline[os.path.abspath(file_path)] = []
+
+    def get_diagnostics_sync(
+        self,
+        file_path: str,
+        *,
+        delta: bool = True,
+        timeout: Optional[float] = None,
+    ) -> List[Dict[str, Any]]:
+        """Synchronously open ``file_path`` in the right server, wait for
+        diagnostics, return them.
+
+        If ``delta`` is True (default), the result is filtered against
+        any baseline previously captured via :meth:`snapshot_baseline`.
+        Diagnostics present in the baseline are removed so the caller
+        only sees errors introduced by the current edit.
+
+        Returns an empty list when LSP is disabled, when no workspace
+        can be detected, when no server matches, or when the server
+        can't be spawned.  Never raises.
+        """
+        if not self.enabled_for(file_path):
+            return []
+
+        # Resolve server_id eagerly so we can emit structured logs even
+        # when the request errors out below.
+        srv = find_server_for_file(file_path)
+        server_id = srv.server_id if srv else "?"
+
+        try:
+            t = timeout if timeout is not None else self._wait_timeout + 2.0
+            diags = self._loop.run(self._open_and_wait_async(file_path), timeout=t) or []
+        except asyncio.TimeoutError as e:
+            eventlog.log_timeout(server_id, file_path)
+            logger.debug("LSP diagnostics timeout for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            return []
+        except Exception as e:  # noqa: BLE001
+            eventlog.log_server_error(server_id, file_path, e)
+            logger.debug("LSP diagnostics fetch failed for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            return []
+
+        abs_path = os.path.abspath(file_path)
+        if delta:
+            baseline = self._delta_baseline.get(abs_path) or []
+            if baseline:
+                seen = {_diag_key(d) for d in baseline}
+                diags = [d for d in diags if _diag_key(d) not in seen]
+            # Roll baseline forward — next call returns deltas relative
+            # to the just-emitted state, mirroring claude-code's
+            # diagnosticTracking.
+            try:
+                fresh = self._loop.run(self._current_diags_async(file_path), timeout=2.0) or []
+            except Exception:  # noqa: BLE001
+                fresh = []
+            if fresh:
+                self._delta_baseline[abs_path] = fresh
+
+        if diags:
+            eventlog.log_diagnostics(server_id, file_path, len(diags))
+        else:
+            eventlog.log_clean(server_id, file_path)
+        return diags
+
+    def _mark_broken_for_file(self, file_path: str, exc: BaseException) -> None:
+        """Mark the (server_id, per-server root) pair as broken so
+        subsequent edits skip it instantly instead of re-paying the
+        timeout cost.
+
+        Called whenever the outer ``_loop.run`` wrapper fails, most
+        importantly when its timeout cancels an in-flight spawn/initialize
+        that the inner ``_get_or_spawn`` task was still holding open.
+        Without this, every subsequent write would re-enter the spawn
+        path and re-pay the full ``snapshot_baseline`` timeout (8s) until
+        the binary is fixed.
+
+        Also shuts down any client already registered under that key, and
+        emits a single eventlog WARNING so the user knows which server
+        gave up.
+
+        ``exc`` is whatever exception the outer wrapper caught — used
+        only for logging, never re-raised.
+        """
+        srv = find_server_for_file(file_path)
+        if srv is None:
+            return
+        ws_root, gated = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated):
+            return
+        try:
+            per_server_root = srv.resolve_root(file_path, ws_root) or ws_root
+        except Exception:  # noqa: BLE001
+            per_server_root = ws_root
+        key = (srv.server_id, per_server_root)
+        already_broken = key in self._broken
+        self._broken.add(key)
+
+        # Shut down any client registered under this key.  Clients are
+        # only added to ``_clients`` after ``start()`` succeeds, so this
+        # catches a server that initialized but then hung on open/wait.
+        with self._state_lock:
+            client = self._clients.pop(key, None)
+        if client is not None:
+            try:
+                # Bounded shutdown: allow up to a second for cleanup so
+                # an already-slow path doesn't stall further.
+                self._loop.run(client.shutdown(), timeout=1.0)
+            except Exception:  # noqa: BLE001
+                pass
+
+        if not already_broken:
+            eventlog.log_spawn_failed(srv.server_id, per_server_root, exc)
+
+    def shutdown(self) -> None:
+        """Tear down all clients and stop the background loop."""
+        if not self._enabled:
+            return
+        try:
+            self._loop.run(self._shutdown_async(), timeout=10.0)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP shutdown error: %s", e)
+        self._loop.stop()
+        clear_cache()
+
+    # ------------------------------------------------------------------
+    # async internals
+    # ------------------------------------------------------------------
+
+    async def _snapshot_async(self, file_path: str) -> List[Dict[str, Any]]:
+        client = await self._get_or_spawn(file_path)
+        if client is None:
+            return []
+        try:
+            version = await client.open_file(file_path, language_id=language_id_for(file_path))
+            await client.wait_for_diagnostics(file_path, version, mode=self._wait_mode)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("snapshot open/wait failed: %s", e)
+            return []
+        self._last_used[(client.server_id, client.workspace_root)] = time.time()
+        return list(client.diagnostics_for(file_path))
+
+    async def _open_and_wait_async(self, file_path: str) -> List[Dict[str, Any]]:
+        client = await self._get_or_spawn(file_path)
+        if client is None:
+            return []
+        try:
+            version = await client.open_file(file_path, language_id=language_id_for(file_path))
+            await client.save_file(file_path)
+            await client.wait_for_diagnostics(file_path, version, mode=self._wait_mode)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("open/wait failed for %s: %s", file_path, e)
+            return []
+        self._last_used[(client.server_id, client.workspace_root)] = time.time()
+        return list(client.diagnostics_for(file_path))
+
+    async def _current_diags_async(self, file_path: str) -> List[Dict[str, Any]]:
+        ws, gated = resolve_workspace_for_file(file_path)
+        srv = find_server_for_file(file_path)
+        if not (ws and gated and srv):
+            return []
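+        # Clients are keyed by the per-server root (see ``_get_or_spawn``),
+        # not the git workspace root, so resolve it the same way here.
+        try:
+            per_server_root = srv.resolve_root(file_path, ws)
+        except Exception:  # noqa: BLE001
+            return []
+        if per_server_root is None:
+            return []  # exclude marker hit; no client was spawned for this file
+        with self._state_lock:
+            client = self._clients.get((srv.server_id, per_server_root))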
+        if client is None:
+            return []
+        return list(client.diagnostics_for(file_path))
+
+    async def _get_or_spawn(self, file_path: str) -> Optional[LSPClient]:
+        srv = find_server_for_file(file_path)
+        if srv is None:
+            return None
+        if srv.server_id in self._disabled_servers:
+            eventlog.log_disabled(srv.server_id, file_path, "disabled in config")
+            return None
+        ws_root, gated = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated):
+            eventlog.log_no_project_root(srv.server_id, file_path)
+            return None
+        per_server_root = srv.resolve_root(file_path, ws_root)
+        if per_server_root is None:
+            eventlog.log_disabled(
+                srv.server_id, file_path, "exclude marker hit (server gated off)"
+            )
+            return None  # exclude marker hit, server gated off
+
+        key = (srv.server_id, per_server_root)
+        if key in self._broken:
+            return None
+        with self._state_lock:
+            client = self._clients.get(key)
+            if client is not None and client.is_running:
+                eventlog.log_active(srv.server_id, per_server_root)
+                return client
+            spawning = self._spawning.get(key)
+        if spawning is not None:
+            try:
+                return await spawning
+            except Exception:  # noqa: BLE001
+                return None
+
+        # Begin spawn
+        loop = asyncio.get_running_loop()
+        spawn_future: asyncio.Future = loop.create_future()
+        with self._state_lock:
+            self._spawning[key] = spawn_future
+        try:
+            ctx = ServerContext(
+                workspace_root=per_server_root,
+                install_strategy=self._install_strategy,
+                binary_overrides=self._binary_overrides,
+                env_overrides=self._env_overrides,
+                init_overrides=self._init_overrides,
+            )
+            spec = srv.build_spawn(per_server_root, ctx)
+            if spec is None:
+                # ``build_spawn`` returns None when the binary can't be
+                # located (auto-install disabled, manual-only server,
+                # or install attempt failed).  Surface this once via
+                # the structured logger so the user can act on it.
+                eventlog.log_server_unavailable(srv.server_id, srv.server_id)
+                self._broken.add(key)
+                spawn_future.set_result(None)
+                return None
+            client = LSPClient(
+                server_id=srv.server_id,
+                workspace_root=spec.workspace_root,
+                command=spec.command,
+                env=spec.env,
+                cwd=spec.cwd,
+                initialization_options=spec.initialization_options,
+                seed_diagnostics_on_first_push=spec.seed_diagnostics_on_first_push or srv.seed_first_push,
+            )
+            try:
+                await client.start()
+            except Exception as e:  # noqa: BLE001
+                eventlog.log_spawn_failed(srv.server_id, per_server_root, e)
+                self._broken.add(key)
+                spawn_future.set_result(None)
+                return None
+            with self._state_lock:
+                self._clients[key] = client
+            self._last_used[key] = time.time()
+            eventlog.log_active(srv.server_id, per_server_root)
+            spawn_future.set_result(client)
+            return client
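+        finally:
+            # Resolve the future on every exit path, including exceptions
+            # from ``build_spawn`` and cancellation by the outer timeout,
+            # so concurrent waiters in the branch above never hang on it.
+            if not spawn_future.done():
+                spawn_future.set_result(None)
+            with self._state_lock:
+                self._spawning.pop(key, None)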
+
+    async def _shutdown_async(self) -> None:
+        with self._state_lock:
+            clients = list(self._clients.values())
+            self._clients.clear()
+            self._broken.clear()
+            self._last_used.clear()
+        await asyncio.gather(
+            *(c.shutdown() for c in clients),
+            return_exceptions=True,
+        )
+
+    # ------------------------------------------------------------------
+    # status / introspection (used by ``hermes lsp status``)
+    # ------------------------------------------------------------------
+
+    def get_status(self) -> Dict[str, Any]:
+        """Return a snapshot of the service for the CLI status command."""
+        with self._state_lock:
+            clients = [
+                {
+                    "server_id": k[0],
+                    "workspace_root": k[1],
+                    "state": c.state,
+                    "running": c.is_running,
+                }
+                for k, c in self._clients.items()
+            ]
+            broken = list(self._broken)
+        return {
+            "enabled": self._enabled,
+            "wait_mode": self._wait_mode,
+            "wait_timeout": self._wait_timeout,
+            "install_strategy": self._install_strategy,
+            "clients": clients,
+            "broken": broken,
+            "disabled_servers": sorted(self._disabled_servers),
+        }
+
+
+def _diag_key(d: Dict[str, Any]) -> str:
+    """Content equality key used for delta filtering.  Mirrors
+    :func:`agent.lsp.client._diagnostic_key`."""
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    end = rng.get("end") or {}
+    code = d.get("code")
+    if code is not None and not isinstance(code, str):
+        code = str(code)
+    return "\x00".join(
+        [
+            str(d.get("severity") or 1),
+            str(code or ""),
+            str(d.get("source") or ""),
+            str(d.get("message") or "").strip(),
+            f"{start.get('line', 0)}:{start.get('character', 0)}-{end.get('line', 0)}:{end.get('character', 0)}",
+        ]
+    )
+
+
+__all__ = ["LSPService"]
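The delta filter in `get_diagnostics_sync` can be exercised on its own. The sketch below assumes diagnostics are plain LSP `Diagnostic` dicts; `diag_key`, `filter_new`, and `make_diag` are hypothetical standalone stand-ins for `_diag_key` and the inline list comprehension, not exports of `agent.lsp`.

```python
from typing import Any, Dict, List

Diagnostic = Dict[str, Any]


def diag_key(d: Diagnostic) -> str:
    """Content key built from severity, code, source, message and full range."""
    rng = d.get("range") or {}
    start = rng.get("start") or {}
    end = rng.get("end") or {}
    code = d.get("code")
    return "\x00".join(
        [
            str(d.get("severity") or 1),
            "" if code is None else str(code),
            str(d.get("source") or ""),
            str(d.get("message") or "").strip(),
            f"{start.get('line', 0)}:{start.get('character', 0)}-"
            f"{end.get('line', 0)}:{end.get('character', 0)}",
        ]
    )


def filter_new(after: List[Diagnostic], baseline: List[Diagnostic]) -> List[Diagnostic]:
    """Keep only diagnostics whose content key is absent from the baseline."""
    seen = {diag_key(d) for d in baseline}
    return [d for d in after if diag_key(d) not in seen]


def make_diag(line: int, message: str) -> Diagnostic:
    # Hypothetical helper: an error whose range starts at column 0 of `line`.
    return {
        "severity": 1,
        "message": message,
        "range": {"start": {"line": line, "character": 0},
                  "end": {"line": line, "character": 5}},
    }


baseline = [make_diag(3, "Import 'foo' could not be resolved")]
after_edit = baseline + [make_diag(10, '"bar" is not defined')]
new = filter_new(after_edit, baseline)
print([d["message"] for d in new])  # ['"bar" is not defined']
```

Because the key includes the range, an edit that inserts lines above a pre-existing error shifts that error's line number, so it no longer matches the baseline and is reported again as if it were new.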
--- a/agent/lsp/protocol.py
+++ b/agent/lsp/protocol.py
@ -0,0 +1,196 @@
+"""Minimal LSP JSON-RPC 2.0 framer over async streams.
+
+LSP wire format:
+
+    Content-Length: <bytes>\\r\\n
+    \\r\\n
+    <utf-8 JSON body>
+
+The body is a JSON-RPC 2.0 envelope: request, response, or notification.
+
+This module replaces what ``vscode-jsonrpc/node`` would do in a
+TypeScript implementation.  We keep it deliberately small — just the
+framer + envelope helpers — so :class:`agent.lsp.client.LSPClient` can
+focus on protocol semantics.
+"""
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+from typing import Any, Optional, Tuple
+
+logger = logging.getLogger("agent.lsp.protocol")
+
+# LSP error codes we care about.  Full list in
+# https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#errorCodes
+ERROR_CONTENT_MODIFIED = -32801
+ERROR_REQUEST_CANCELLED = -32800
+ERROR_METHOD_NOT_FOUND = -32601
+
+
+class LSPProtocolError(Exception):
+    """Raised when the wire protocol is violated.
+
+    Distinct from :class:`LSPRequestError` which represents a server
+    returning a JSON-RPC error response — that's protocol-conformant.
+    This exception means the framing or envelope itself is broken.
+    """
+
+
+class LSPRequestError(Exception):
+    """Raised when an LSP request returns an error response.
+
+    Carries the JSON-RPC ``code``, ``message``, and optional ``data``.
+    """
+
+    def __init__(self, code: int, message: str, data: Any = None) -> None:
+        super().__init__(f"LSP error {code}: {message}")
+        self.code = code
+        self.message = message
+        self.data = data
+
+
+def encode_message(obj: dict) -> bytes:
+    """Encode a JSON-RPC envelope as a Content-Length framed byte string.
+
+    The body is compact UTF-8 JSON (no whitespace after separators),
+    matching what ``vscode-jsonrpc`` emits.  Content-Length is computed
+    from the encoded bytes rather than the character count, so
+    non-ASCII text (``ensure_ascii=False``) is framed correctly.
+    """
+    body = json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
+    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
+    return header + body
+
+
+async def read_message(reader: asyncio.StreamReader) -> Optional[dict]:
+    """Read one Content-Length framed JSON-RPC message from the stream.
+
+    Returns ``None`` on clean EOF (server closed stdout cleanly between
+    messages — typical shutdown).  Raises :class:`LSPProtocolError` on
+    malformed framing.
+
+    The reader is advanced to just past the JSON body on success.
+    """
+    headers: dict = {}
+    header_bytes = 0
+    while True:
+        try:
+            line = await reader.readuntil(b"\r\n")
+        except asyncio.IncompleteReadError as e:
+            # EOF while reading headers.  If we hadn't started a header
+            # block, treat as clean EOF; otherwise the framing is bad.
+            if not e.partial and not headers:
+                return None
+            raise LSPProtocolError(
+                f"unexpected EOF while reading LSP headers (partial={e.partial!r})"
+            ) from e
+        # Defensive cap against a server streaming headers without ever
+        # emitting CRLF-CRLF.  Caps total header bytes at 8 KiB — a
+        # well-behaved server fits in well under 200 bytes.
+        header_bytes += len(line)
+        if header_bytes > 8192:
+            raise LSPProtocolError(
+                "LSP header block exceeded 8 KiB without terminator"
+            )
+        line = line[:-2]  # strip CRLF
+        if not line:
+            break  # blank line ends header block
+        try:
+            key, _, value = line.decode("ascii").partition(":")
+        except UnicodeDecodeError as e:
+            raise LSPProtocolError(f"non-ASCII LSP header: {line!r}") from e
+        if not key:
+            raise LSPProtocolError(f"malformed LSP header line: {line!r}")
+        headers[key.strip().lower()] = value.strip()
+
+    cl = headers.get("content-length")
+    if cl is None:
+        raise LSPProtocolError(f"LSP message missing Content-Length: {headers!r}")
+    try:
+        n = int(cl)
+    except ValueError as e:
+        raise LSPProtocolError(f"non-integer Content-Length: {cl!r}") from e
+    if n < 0 or n > 64 * 1024 * 1024:  # 64 MiB sanity cap
+        raise LSPProtocolError(f"unreasonable Content-Length: {n}")
+
+    try:
+        body = await reader.readexactly(n)
+    except asyncio.IncompleteReadError as e:
+        raise LSPProtocolError(
+            f"truncated LSP body: expected {n} bytes, got {len(e.partial)}"
+        ) from e
+
+    try:
+        return json.loads(body.decode("utf-8"))
+    except json.JSONDecodeError as e:
+        raise LSPProtocolError(f"invalid JSON in LSP body: {e}") from e
+    except UnicodeDecodeError as e:
+        raise LSPProtocolError(f"non-UTF-8 LSP body: {e}") from e
+
+
+def make_request(req_id: int, method: str, params: Any) -> dict:
+    """Build a JSON-RPC 2.0 request envelope."""
+    msg: dict = {"jsonrpc": "2.0", "id": req_id, "method": method}
+    if params is not None:
+        msg["params"] = params
+    return msg
+
+
+def make_notification(method: str, params: Any) -> dict:
+    """Build a JSON-RPC 2.0 notification envelope (no ``id``)."""
+    msg: dict = {"jsonrpc": "2.0", "method": method}
+    if params is not None:
+        msg["params"] = params
+    return msg
+
+
+def make_response(req_id: Any, result: Any) -> dict:
+    """Build a JSON-RPC 2.0 success response envelope."""
+    return {"jsonrpc": "2.0", "id": req_id, "result": result}
+
+
+def make_error_response(req_id: Any, code: int, message: str, data: Any = None) -> dict:
+    """Build a JSON-RPC 2.0 error response envelope."""
+    err: dict = {"code": code, "message": message}
+    if data is not None:
+        err["data"] = data
+    return {"jsonrpc": "2.0", "id": req_id, "error": err}
+
+
+def classify_message(msg: dict) -> Tuple[str, Any]:
+    """Return ``(kind, key)`` where kind is one of ``request``,
+    ``response``, ``notification``, ``invalid``.
+
+    The key is the request id for request/response, the method name
+    for notifications, and ``None`` for invalid messages.
+    """
+    if not isinstance(msg, dict):
+        return "invalid", None
+    if msg.get("jsonrpc") != "2.0":
+        return "invalid", None
+    has_id = "id" in msg
+    has_method = "method" in msg
+    if has_id and has_method:
+        return "request", msg["id"]
+    if has_id and ("result" in msg or "error" in msg):
+        return "response", msg["id"]
+    if has_method and not has_id:
+        return "notification", msg["method"]
+    return "invalid", None
+
+
+__all__ = [
+    "ERROR_CONTENT_MODIFIED",
+    "ERROR_REQUEST_CANCELLED",
+    "ERROR_METHOD_NOT_FOUND",
+    "LSPProtocolError",
+    "LSPRequestError",
+    "encode_message",
+    "read_message",
+    "make_request",
+    "make_notification",
+    "make_response",
+    "make_error_response",
+    "classify_message",
+]
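The framing in `protocol.py` can be tested without a real language server by feeding bytes into an `asyncio.StreamReader`. The sketch below uses cut-down copies of the encoder and reader (the hypothetical `encode` and `read_one`, without the size caps and error wrapping of `read_message`) so it runs standalone; it shows that Content-Length counts UTF-8 bytes and that a clean EOF between messages yields `None`.

```python
import asyncio
import json
from typing import Optional


def encode(obj: dict) -> bytes:
    # Content-Length counts bytes of the UTF-8 body, not characters.
    body = json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


async def read_one(reader: asyncio.StreamReader) -> Optional[dict]:
    headers = {}
    while True:
        try:
            line = await reader.readuntil(b"\r\n")
        except asyncio.IncompleteReadError as e:
            if not e.partial and not headers:
                return None  # clean EOF between messages
            raise
        line = line[:-2]  # strip CRLF
        if not line:
            break  # blank line ends the header block
        key, _, value = line.decode("ascii").partition(":")
        headers[key.strip().lower()] = value.strip()
    n = int(headers["content-length"])
    return json.loads((await reader.readexactly(n)).decode("utf-8"))


async def main() -> list:
    reader = asyncio.StreamReader()  # created inside the running loop
    note = {"jsonrpc": "2.0", "method": "window/logMessage",
            "params": {"type": 3, "message": "héllo ✓"}}
    reader.feed_data(encode(note) + encode({"jsonrpc": "2.0", "id": 1, "result": None}))
    reader.feed_eof()
    out = []
    while (msg := await read_one(reader)) is not None:
        out.append(msg)
    return out


messages = asyncio.run(main())
print(len(messages))  # 2
```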
--- a/agent/lsp/reporter.py
+++ b/agent/lsp/reporter.py
@ -0,0 +1,78 @@
+"""Format LSP diagnostics for inclusion in tool output.
+
+The model sees a compact, severity-filtered, line-bounded summary of
+diagnostics introduced by the latest edit.  Format matches what
+OpenCode's ``lsp/diagnostic.ts`` and Claude Code's
+``formatDiagnosticsSummary`` produce — ``<diagnostics>`` blocks with
+1-indexed line/column, capped at ``MAX_PER_FILE`` errors.
+"""
+from __future__ import annotations
+
+from typing import Any, Dict, List
+
+# Severity-1 only by default; warnings/info/hints would flood the
+# agent.  Pass a wider ``severities`` set to ``report_for_file`` to
+# include them.
+SEVERITY_NAMES = {1: "ERROR", 2: "WARN", 3: "INFO", 4: "HINT"}
+DEFAULT_SEVERITIES = frozenset({1})  # ERROR only
+
+MAX_PER_FILE = 20
+MAX_TOTAL_CHARS = 4000
+
+
+def format_diagnostic(d: Dict[str, Any]) -> str:
+    """One-line representation of a single diagnostic."""
+    sev = SEVERITY_NAMES.get(d.get("severity") or 1, "ERROR")
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    line = int(start.get("line", 0)) + 1
+    col = int(start.get("character", 0)) + 1
+    msg = str(d.get("message") or "").rstrip()
+    code = d.get("code")
+    code_part = f" [{code}]" if code not in (None, "") else ""
+    source = d.get("source")
+    source_part = f" ({source})" if source else ""
+    return f"{sev} [{line}:{col}] {msg}{code_part}{source_part}"
+
+
+def report_for_file(
+    file_path: str,
+    diagnostics: List[Dict[str, Any]],
+    *,
+    severities: frozenset = DEFAULT_SEVERITIES,
+    max_per_file: int = MAX_PER_FILE,
+) -> str:
+    """Build a ``<diagnostics file=...>`` block for one file.
+
+    Returns an empty string when no diagnostics pass the severity
+    filter, so callers can do ``if block:`` to skip empty cases.
+    """
+    if not diagnostics:
+        return ""
+    filtered = [d for d in diagnostics if (d.get("severity") or 1) in severities]
+    if not filtered:
+        return ""
+    limited = filtered[:max_per_file]
+    extra = len(filtered) - len(limited)
+    lines = [format_diagnostic(d) for d in limited]
+    body = "\n".join(lines)
+    if extra > 0:
+        body += f"\n... and {extra} more"
+    return f"<diagnostics file=\"{file_path}\">\n{body}\n</diagnostics>"
+
+
+def truncate(s: str, *, limit: int = MAX_TOTAL_CHARS) -> str:
+    """Hard-cap a formatted summary string."""
+    if len(s) <= limit:
+        return s
+    marker = "\n…[truncated]"
+    return s[: limit - len(marker)] + marker
+
+
+__all__ = [
+    "SEVERITY_NAMES",
+    "DEFAULT_SEVERITIES",
+    "MAX_PER_FILE",
+    "format_diagnostic",
+    "report_for_file",
+    "truncate",
+]
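The reporter's behaviour (severity filter, 0-based LSP positions shown 1-based, per-file cap) is easiest to see on sample data. The sketch below is a hypothetical standalone `render` that mirrors `format_diagnostic` and `report_for_file`; the Pyright-style diagnostic is made-up example input.

```python
from typing import Any, Dict, FrozenSet, List

SEVERITY_NAMES = {1: "ERROR", 2: "WARN", 3: "INFO", 4: "HINT"}


def render(
    file_path: str,
    diags: List[Dict[str, Any]],
    severities: FrozenSet[int] = frozenset({1}),
    max_per_file: int = 20,
) -> str:
    """Hypothetical standalone mirror of format_diagnostic + report_for_file."""
    kept = [d for d in diags if (d.get("severity") or 1) in severities]
    if not kept:
        return ""
    lines = []
    for d in kept[:max_per_file]:
        start = (d.get("range") or {}).get("start") or {}
        # LSP positions are 0-based; the report is 1-based for readers.
        line = int(start.get("line", 0)) + 1
        col = int(start.get("character", 0)) + 1
        sev = SEVERITY_NAMES.get(d.get("severity") or 1, "ERROR")
        msg = str(d.get("message") or "").rstrip()
        code = f" [{d['code']}]" if d.get("code") not in (None, "") else ""
        src = f" ({d['source']})" if d.get("source") else ""
        lines.append(f"{sev} [{line}:{col}] {msg}{code}{src}")
    extra = len(kept) - max_per_file
    if extra > 0:
        lines.append(f"... and {extra} more")
    body = "\n".join(lines)
    return f'<diagnostics file="{file_path}">\n{body}\n</diagnostics>'


diags = [
    {"severity": 1, "message": '"x" is not defined', "code": "reportUndefinedVariable",
     "source": "Pyright", "range": {"start": {"line": 4, "character": 8}}},
    {"severity": 2, "message": "unused import",
     "range": {"start": {"line": 0, "character": 0}}},
]
print(render("app.py", diags))
```

The warning is dropped by the default severity filter, so only the error line appears inside the block.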
--- a/agent/lsp/servers.py
+++ b/agent/lsp/servers.py
--- a/agent/lsp/workspace.py
+++ b/agent/lsp/workspace.py
@ -0,0 +1,223 @@
+"""Workspace and project-root resolution for LSP.
+
+Two concerns live here:
+
+1. **Workspace gate** — the upper-level "is this directory a project?"
+   check.  Hermes only runs LSP when the cwd (or the file being edited)
+   sits inside a git worktree.  Files outside any git root never
+   trigger LSP, even if a server is configured.  This keeps Telegram
+   gateway users on user-home cwds from spawning daemons.
+
+2. **NearestRoot** — the per-server project-root walk.  Each language
+   server cares about a different marker (``pyproject.toml`` for
+   Python, ``Cargo.toml`` for Rust, ``go.mod`` for Go, etc.) and
+   wants the directory containing that marker.  ``nearest_root()``
+   walks up from a starting path looking for any of a list of marker
+   files, optionally bailing if an exclude marker shows up first.
+"""
+from __future__ import annotations
+
+import logging
+import os
+from pathlib import Path
+from typing import Iterable, Optional, Tuple
+
+logger = logging.getLogger("agent.lsp.workspace")
+
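+# Cache: start dir → (worktree_root, is_git) so repeated calls don't re-stat.
+# Cleared on shutdown.  Keyed by the normalized absolute path; symlinks
+# are deliberately NOT resolved (see ``normalize_path``), so two
+# symlinked spellings of one directory get separate entries.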
+_workspace_cache: dict = {}
+
+
+def normalize_path(path: str) -> str:
+    """Normalize a path for use as a stable map key.
+
+    Resolves ``~``, makes absolute, and collapses ``.``/``..``.  We do
+    NOT resolve symlinks here — symlink stability matters for some
+    LSP servers (rust-analyzer cares about Cargo workspace identity)
+    and we want the canonical path the user typed when possible.
+    """
+    return os.path.abspath(os.path.expanduser(path))
+
+
+def find_git_worktree(start: str) -> Optional[str]:
+    """Walk up from ``start`` looking for a ``.git`` entry (file or dir).
+
+    Returns the directory containing ``.git``, or ``None`` if no git
+    root is found before hitting the filesystem root.
+
+    A ``.git`` *file* (not directory) means we're inside a git
+    worktree set up via ``git worktree add`` — both forms count.
+    """
+    try:
+        start_path = Path(normalize_path(start))
+        if start_path.is_file():
+            start_path = start_path.parent
+    except (OSError, RuntimeError, ValueError):
+        # Pathological input (loop in symlinks, encoding error, etc.) —
+        # bail out rather than crash the lint hook.
+        return None
+
+    # Cache check
+    cached = _workspace_cache.get(str(start_path))
+    if cached is not None:
+        root, _is_git = cached
+        return root
+
+    cur = start_path
+    # Defensive cap: the deepest reasonable monorepo is well under 64
+    # levels.  Caps the walk so a pathological cwd or a symlink cycle
+    # we somehow traverse can't keep us looping.
+    for _ in range(64):
+        git_marker = cur / ".git"
+        try:
+            if git_marker.exists():
+                resolved = str(cur)
+                _workspace_cache[str(start_path)] = (resolved, True)
+                return resolved
+        except OSError:
+            # Permission error on a parent dir — bail out cleanly.
+            break
+        parent = cur.parent
+        if parent == cur:
+            break
+        cur = parent
+
+    _workspace_cache[str(start_path)] = (None, False)
+    return None
+
+
+def is_inside_workspace(path: str, workspace_root: str) -> bool:
+    """Return True iff ``path`` is inside (or equal to) ``workspace_root``.
+
+    Uses absolute paths but does not resolve symlinks — a file accessed
+    via a symlink that points outside the workspace still counts as
+    outside.  This is the conservative interpretation; matches LSP
+    behaviour where servers reject didOpen for unrelated files.
+    """
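+    p = os.path.normcase(normalize_path(path))
+    root = os.path.normcase(normalize_path(workspace_root))
+    if p == root:
+        return True
+    # os.path.commonpath compares whole path components, so /repo does
+    # not "contain" /repo-other the way a string-prefix check would.
+    # normcase folds case on Windows only; on macOS/Linux the comparison
+    # stays case-sensitive.
+    try:
+        common = os.path.commonpath([p, root])
+    except ValueError:
+        # Different drives on Windows.
+        return False
+    return common == root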
+
+
+def nearest_root(
+    start: str,
+    markers: Iterable[str],
+    *,
+    excludes: Optional[Iterable[str]] = None,
+    ceiling: Optional[str] = None,
+) -> Optional[str]:
+    """Walk up from ``start`` looking for any of the given marker files.
+
+    Returns the **directory containing** the first matched marker, or
+    ``None`` if no marker is found before hitting ``ceiling`` (or the
+    filesystem root if no ceiling).
+
+    If ``excludes`` is provided and an exclude marker matches *first*
+    in the upward walk, returns ``None`` — the server is gated off
+    for that file.  Mirrors OpenCode's NearestRoot exclude semantics
+    (e.g. typescript skips deno projects when ``deno.json`` is found
+    before ``package.json``).
+    """
+    start_path = Path(normalize_path(start))
+    try:
+        if start_path.is_file():
+            start_path = start_path.parent
+    except (OSError, RuntimeError, ValueError):
+        return None
+    ceiling_path = Path(normalize_path(ceiling)) if ceiling else None
+
+    markers_list = list(markers)
+    excludes_list = list(excludes) if excludes else []
+
+    cur = start_path
+    # Defensive cap matching ``find_git_worktree``.  Bounded walk
+    # protects against pathological inputs even though the
+    # parent-equality stop normally terminates within ~10 steps.
+    for _ in range(64):
+        # Check excludes first — if an exclude is found at this level,
+        # the server is gated off for this file.
+        for exc in excludes_list:
+            try:
+                if (cur / exc).exists():
+                    return None
+            except OSError:
+                continue
+        # Then check markers.
+        for marker in markers_list:
+            try:
+                if (cur / marker).exists():
+                    return str(cur)
+            except OSError:
+                continue
+        # Stop conditions.
+        if ceiling_path is not None and cur == ceiling_path:
+            return None
+        parent = cur.parent
+        if parent == cur:
+            return None
+        cur = parent
+    return None
+
+
+def resolve_workspace_for_file(
+    file_path: str,
+    *,
+    cwd: Optional[str] = None,
+) -> Tuple[Optional[str], bool]:
+    """Resolve the workspace root for a file.
+
+    Returns ``(workspace_root, gated_in)`` where ``gated_in`` is True
+    iff LSP should run for this file at all.  Currently the gate is
+    "file is inside a git worktree found by walking up from cwd OR
+    from the file itself".
+
+    The cwd path takes precedence: if the agent was launched in a git
+    project and the file lies inside that worktree, the worktree is the
+    workspace.  If the cwd isn't in a git worktree, or the file lives
+    outside it, the file's own location is tried as a fallback.
+
+    Returns ``(None, False)`` when neither path is in a git worktree.
+    """
+    cwd = cwd or os.getcwd()
+    cwd_root = find_git_worktree(cwd)
+    if cwd_root is not None:
+        if is_inside_workspace(file_path, cwd_root):
+            return cwd_root, True
+        # File is outside the cwd's worktree, so try the file's own
+        # location as a secondary anchor.  Useful when the agent edits a
+        # file in a different checkout than the one it was launched from.
+    file_root = find_git_worktree(file_path)
+    if file_root is not None:
+        return file_root, True
+    return None, False
+
+
+def clear_cache() -> None:
+    """Clear the workspace-resolution cache.
+
+    Called on service shutdown so a subsequent re-init doesn't pick
+    up stale results from a previous session.
+    """
+    _workspace_cache.clear()
+
+
+__all__ = [
+    "find_git_worktree",
+    "is_inside_workspace",
+    "nearest_root",
+    "normalize_path",
+    "resolve_workspace_for_file",
+    "clear_cache",
+]
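The exclude-before-marker rule in `nearest_root` (the docstring's example: TypeScript skips a Deno project when `deno.json` is found before `package.json`) can be demonstrated on a throwaway directory tree. The sketch below is a simplified, hypothetical `nearest` (no `ceiling`, no per-entry `OSError` handling) rather than the module's own function.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def nearest(start: str, markers: Iterable[str], excludes: Iterable[str] = ()) -> Optional[str]:
    cur = Path(os.path.abspath(start))
    if cur.is_file():
        cur = cur.parent
    markers, excludes = list(markers), list(excludes)
    for _ in range(64):  # bounded walk, like nearest_root
        # An exclude at this level gates the server off before markers are checked.
        if any((cur / e).exists() for e in excludes):
            return None
        if any((cur / m).exists() for m in markers):
            return str(cur)
        if cur.parent == cur:
            return None
        cur = cur.parent
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "package.json").write_text("{}")
    (root / "web" / "src").mkdir(parents=True)
    (root / "web" / "deno.json").write_text("{}")
    (root / "api" / "src").mkdir(parents=True)
    web_file = root / "web" / "src" / "main.ts"
    web_file.write_text("")
    api_file = root / "api" / "src" / "index.ts"
    api_file.write_text("")
    web_root = nearest(str(web_file), ["package.json"], excludes=["deno.json"])
    api_root = nearest(str(api_file), ["package.json"], excludes=["deno.json"])
    print(web_root, api_root == str(root))  # None True
```

The `web` file hits `deno.json` one level up, before reaching the shared `package.json`, so the server is gated off for it; the `api` file walks past its own directories and resolves to the tree root.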
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@ -1499,6 +1499,53 @@ DEFAULT_CONFIG = {
        "backup_keep": 5,
    },

+    # Language Server Protocol — semantic diagnostics from real
+    # language servers (pyright, gopls, rust-analyzer, etc.) wired
+    # into the post-write lint check used by ``write_file`` and
+    # ``patch``.
+    #
+    # LSP is gated on git-workspace detection: when the agent's
+    # cwd (or the file being edited) is inside a git worktree, LSP
+    # runs against that workspace.  When neither is in a git repo,
+    # LSP stays dormant and the in-process syntax check is the only
+    # tier — handy for Telegram/Discord chats where the cwd is the
+    # user's home directory.
+    "lsp": {
+        # Master toggle.  Setting this to false disables the entire
+        # subsystem — no servers spawn, no background event loop, no
+        # cost.
+        "enabled": True,
+
+        # Diagnostic-wait mode for the post-write check.
+        # ``"document"`` waits up to ``wait_timeout`` seconds for the
+        # current file's diagnostics; ``"full"`` additionally requests
+        # workspace-wide diagnostics (slower).
+        "wait_mode": "document",
+        "wait_timeout": 5.0,
+
+        # How to handle missing server binaries.
+        # ``"auto"`` — try to install via npm/go/pip into
+        #              ``<HERMES_HOME>/lsp/bin/`` on first use.
+        # ``"manual"`` — only use binaries already on PATH.
+        # ``"off"`` — alias for ``manual``.
+        "install_strategy": "auto",
+
+        # Per-server overrides.  Each key is a server_id from the
+        # registry (``pyright``, ``typescript``, ``gopls``,
+        # ``rust-analyzer``, etc.) and accepts:
+        #   disabled: true
+        #     — skip this server even when its extensions match
+        #   command: ["full/path/to/server", "--stdio"]
+        #     — pin a custom binary path; bypasses auto-install
+        #   env: {"KEY": "value"}
+        #     — extra env vars passed to the spawned process
+        #   initialization_options: {...}
+        #     — merged into the LSP ``initializationOptions``
+        # Empty by default; the registry defaults work for typical
+        # setups.
+        "servers": {},
+    },
+
    # Config schema version - bump this when adding new required fields
    "_config_version": 23,
 }
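A minimal sketch of how a loader might normalize the `lsp` section above, assuming user values are shallow-merged over the documented defaults. `normalize_lsp_config` and `server_is_disabled` are hypothetical helpers, not part of this PR. Mapping `"off"` to `"manual"` follows the comment in the config; rejecting unknown `wait_mode` or `install_strategy` values with `ValueError` is an assumption of this sketch.

```python
from __future__ import annotations

import copy
from typing import Any, Dict, Optional

# Mirrors the documented defaults of the "lsp" section.
LSP_DEFAULTS: Dict[str, Any] = {
    "enabled": True,
    "wait_mode": "document",
    "wait_timeout": 5.0,
    "install_strategy": "auto",
    "servers": {},
}


def normalize_lsp_config(user: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical: shallow-merge user settings over the defaults."""
    cfg = copy.deepcopy(LSP_DEFAULTS)  # never mutate the shared defaults
    cfg.update(user or {})
    if cfg["install_strategy"] == "off":  # documented alias
        cfg["install_strategy"] = "manual"
    if cfg["install_strategy"] not in ("auto", "manual"):
        raise ValueError(f"unknown install_strategy: {cfg['install_strategy']!r}")
    if cfg["wait_mode"] not in ("document", "full"):
        raise ValueError(f"unknown wait_mode: {cfg['wait_mode']!r}")
    cfg["wait_timeout"] = float(cfg["wait_timeout"])
    return cfg


def server_is_disabled(cfg: Dict[str, Any], server_id: str) -> bool:
    """The master toggle wins; otherwise honor a per-server ``disabled``."""
    if not cfg["enabled"]:
        return True
    override = cfg.get("servers", {}).get(server_id) or {}
    return bool(override.get("disabled", False))
```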
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -9533,6 +9533,17 @@ def main():

    gateway_parser.set_defaults(func=cmd_gateway)

+    # =========================================================================
+    # lsp command
+    # =========================================================================
+    try:
+        from agent.lsp.cli import register_subparser as _lsp_register
+        _lsp_register(subparsers)
+    except Exception as _lsp_err:  # noqa: BLE001
+        # LSP is optional infrastructure — never let a registration
+        # failure break the CLI overall.
+        logger.debug("LSP CLI registration failed: %s", _lsp_err)
+
    # =========================================================================
    # setup command
    # =========================================================================
--- a/tests/agent/lsp/__init__.py
+++ b/tests/agent/lsp/__init__.py
@@ -0,0 +1 @@
+"""Pytest helpers for LSP-related tests."""
--- a/tests/agent/lsp/_mock_lsp_server.py
+++ b/tests/agent/lsp/_mock_lsp_server.py
@@ -0,0 +1,159 @@
+#!/usr/bin/env python3
+"""A minimal stdio LSP server used by tests.
+
+Speaks just enough LSP to drive :class:`agent.lsp.client.LSPClient`
+through a full lifecycle: ``initialize``, ``initialized``,
+``textDocument/didOpen``, ``textDocument/didChange``, then a
+``textDocument/publishDiagnostics`` notification followed by
+``shutdown`` + ``exit``.
+
+Behaviour is selected via the ``MOCK_LSP_SCRIPT`` env var:
+
+- ``"clean"`` — initialize, accept didOpen/didChange, push empty
+  diagnostics on every open/change, exit cleanly on shutdown.
+- ``"errors"`` — same as ``clean`` but the published diagnostics
+  carry one severity-1 entry pointing at line 0:0.
+- ``"crash"`` — exit immediately after responding to ``initialize``
+  (simulates a crashing server).
+- ``"slow"`` — same as ``clean`` but sleeps 1s before responding to
+  ``initialize`` (lets us test timeout behaviour).
+
+The script reads Content-Length-framed JSON-RPC messages from stdin
+and writes them to stdout.  It uses only the stdlib, so it runs under
+whatever Python the test process picks up.
+"""
+from __future__ import annotations
+
+import json
+import os
+import sys
+import time
+
+
+def read_message():
+    """Read one Content-Length framed JSON-RPC message from stdin."""
+    headers = {}
+    while True:
+        line = sys.stdin.buffer.readline()
+        if not line:
+            return None
+        line = line.rstrip(b"\r\n")
+        if not line:
+            break
+        k, _, v = line.decode("ascii").partition(":")
+        headers[k.strip().lower()] = v.strip()
+    n = int(headers["content-length"])
+    body = sys.stdin.buffer.read(n)
+    return json.loads(body.decode("utf-8"))
+
+
+def write_message(obj):
+    body = json.dumps(obj, separators=(",", ":")).encode("utf-8")
+    sys.stdout.buffer.write(f"Content-Length: {len(body)}\r\n\r\n".encode("ascii"))
+    sys.stdout.buffer.write(body)
+    sys.stdout.buffer.flush()
+
+
+def main():
+    script = os.environ.get("MOCK_LSP_SCRIPT", "clean")
+
+    while True:
+        msg = read_message()
+        if msg is None:
+            return 0
+
+        if "id" in msg and msg.get("method") == "initialize":
+            if script == "slow":
+                time.sleep(1.0)
+            write_message(
+                {
+                    "jsonrpc": "2.0",
+                    "id": msg["id"],
+                    "result": {
+                        "capabilities": {
+                            "textDocumentSync": 1,  # Full
+                            "diagnosticProvider": {"interFileDependencies": False, "workspaceDiagnostics": False},
+                        },
+                        "serverInfo": {"name": "mock-lsp", "version": "0.1"},
+                    },
+                }
+            )
+            if script == "crash":
+                return 0
+            continue
+
+        if msg.get("method") == "initialized":
+            continue
+
+        if msg.get("method") == "workspace/didChangeConfiguration":
+            continue
+
+        if msg.get("method") == "workspace/didChangeWatchedFiles":
+            continue
+
+        if msg.get("method") in ("textDocument/didOpen", "textDocument/didChange"):
+            params = msg.get("params") or {}
+            td = params.get("textDocument") or {}
+            uri = td.get("uri", "")
+            version = td.get("version", 0)
+            diagnostics = []
+            if script == "errors":
+                diagnostics = [
+                    {
+                        "range": {
+                            "start": {"line": 0, "character": 0},
+                            "end": {"line": 0, "character": 5},
+                        },
+                        "severity": 1,
+                        "code": "MOCK001",
+                        "source": "mock-lsp",
+                        "message": "synthetic error from mock-lsp",
+                    }
+                ]
+            write_message(
+                {
+                    "jsonrpc": "2.0",
+                    "method": "textDocument/publishDiagnostics",
+                    "params": {
+                        "uri": uri,
+                        "version": version,
+                        "diagnostics": diagnostics,
+                    },
+                }
+            )
+            continue
+
+        if msg.get("method") == "textDocument/diagnostic":
+            # Pull endpoint — return empty.
+            write_message(
+                {
+                    "jsonrpc": "2.0",
+                    "id": msg["id"],
+                    "result": {"kind": "full", "items": []},
+                }
+            )
+            continue
+
+        if msg.get("method") == "textDocument/didSave":
+            continue
+
+        if msg.get("method") == "shutdown":
+            write_message({"jsonrpc": "2.0", "id": msg["id"], "result": None})
+            continue
+
+        if msg.get("method") == "exit":
+            return 0
+
+        # Unknown request: respond with method-not-found.
+        if "id" in msg:
+            write_message(
+                {
+                    "jsonrpc": "2.0",
+                    "id": msg["id"],
+                    "error": {"code": -32601, "message": f"method not found: {msg.get('method')}"},
+                }
+            )
+
+
+if __name__ == "__main__":
+    sys.exit(main())
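The mock's `read_message`/`write_message` pair is the Content-Length framing LSP uses over stdio. The sketch below reproduces it against any binary stream so it can be exercised in memory, and adds the EOF, truncation, and missing-header checks the mock skips. The choice of `EOFError` for a short body and `ValueError` for a missing header is an assumption of this sketch, not necessarily what `agent/lsp/protocol.py` raises.

```python
from __future__ import annotations

import json
from typing import Any, BinaryIO, Optional


def encode_message(obj: Any) -> bytes:
    """Frame one JSON-RPC message: header, blank line, UTF-8 body."""
    body = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    # Content-Length counts body *bytes*, not characters.
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def read_message(stream: BinaryIO) -> Optional[Any]:
    """Read one framed message; return None on clean EOF."""
    headers = {}
    while True:
        line = stream.readline()
        if not line:
            return None  # EOF before a complete header block
        line = line.rstrip(b"\r\n")
        if not line:
            break  # blank line terminates the headers
        key, _, value = line.decode("ascii").partition(":")
        headers[key.strip().lower()] = value.strip()
    if "content-length" not in headers:
        raise ValueError("missing Content-Length header")
    n = int(headers["content-length"])
    body = stream.read(n)
    if len(body) < n:
        raise EOFError(f"truncated body: expected {n} bytes, got {len(body)}")
    return json.loads(body.decode("utf-8"))
```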
--- a/tests/agent/lsp/test_backend_gate.py
+++ b/tests/agent/lsp/test_backend_gate.py
@@ -0,0 +1,108 @@
+"""Integration test: LSP layer is skipped on non-local backends.
+
+The host-side LSP server can't see files inside a Docker/Modal/SSH
+sandbox.  When the agent's terminal env isn't ``LocalEnvironment``,
+the file_operations layer must skip both ``snapshot_baseline`` and
+``get_diagnostics_sync`` calls — falling back to the in-process
+syntax check exactly as if LSP were disabled.
+"""
+from __future__ import annotations
+
+import os
+import sys
+from unittest.mock import MagicMock
+
+import pytest
+
+from agent.lsp import eventlog
+
+
+@pytest.fixture(autouse=True)
+def _reset():
+    eventlog.reset_announce_caches()
+
+
+def test_local_only_helper_returns_true_for_local_env():
+    from tools.environments.local import LocalEnvironment
+    from tools.file_operations import ShellFileOperations
+
+    fops = ShellFileOperations(LocalEnvironment(cwd="/tmp"))
+    assert fops._lsp_local_only() is True
+
+
+def test_local_only_helper_returns_false_for_non_local_env():
+    """A mocked non-local env (Docker/Modal/SSH stand-in) returns False."""
+    from tools.file_operations import ShellFileOperations
+
+    # Build something that's NOT a LocalEnvironment.  We use a bare
+    # MagicMock — isinstance() against LocalEnvironment is False.
+    fake_env = MagicMock()
+    fake_env.execute = MagicMock(return_value=MagicMock(exit_code=0, stdout=""))
+    fake_env.cwd = "/sandbox"
+    fops = ShellFileOperations(fake_env)
+    assert fops._lsp_local_only() is False
+
+
+def test_snapshot_baseline_skipped_for_non_local(monkeypatch):
+    """Verify the LSP service's snapshot_baseline is NOT called when
+    the backend isn't local."""
+    from tools.file_operations import ShellFileOperations
+
+    fake_env = MagicMock()
+    fake_env.execute = MagicMock(return_value=MagicMock(exit_code=0, stdout=""))
+    fake_env.cwd = "/sandbox"
+    fops = ShellFileOperations(fake_env)
+
+    snapshot_called = []
+
+    class FakeService:
+        def snapshot_baseline(self, path):
+            snapshot_called.append(path)
+
+    monkeypatch.setattr("agent.lsp.get_service", lambda: FakeService())
+
+    fops._snapshot_lsp_baseline("/sandbox/x.py")
+    assert snapshot_called == [], "snapshot must be skipped for non-local backends"
+
+
+def test_maybe_lsp_diagnostics_returns_empty_for_non_local(monkeypatch):
+    from tools.file_operations import ShellFileOperations
+
+    fake_env = MagicMock()
+    fake_env.execute = MagicMock(return_value=MagicMock(exit_code=0, stdout=""))
+    fake_env.cwd = "/sandbox"
+    fops = ShellFileOperations(fake_env)
+
+    called = []
+
+    class FakeService:
+        def enabled_for(self, path):
+            called.append(("enabled_for", path))
+            return True
+        def get_diagnostics_sync(self, path, **kw):
+            called.append(("get_diagnostics_sync", path))
+            return [{"severity": 1, "message": "should not see this"}]
+
+    monkeypatch.setattr("agent.lsp.get_service", lambda: FakeService())
+
+    result = fops._maybe_lsp_diagnostics("/sandbox/x.py")
+    assert result == ""
+    assert called == [], "service must not be queried for non-local backends"
+
+
+def test_snapshot_baseline_called_for_local_env(tmp_path, monkeypatch):
+    from tools.environments.local import LocalEnvironment
+    from tools.file_operations import ShellFileOperations
+
+    fops = ShellFileOperations(LocalEnvironment(cwd=str(tmp_path)))
+
+    snapshot_called = []
+
+    class FakeService:
+        def snapshot_baseline(self, path):
+            snapshot_called.append(path)
+
+    monkeypatch.setattr("agent.lsp.get_service", lambda: FakeService())
+
+    fops._snapshot_lsp_baseline(str(tmp_path / "x.py"))
+    assert snapshot_called == [str(tmp_path / "x.py")]
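The gate these tests pin down amounts to an `isinstance` check in front of every LSP call, plus the "LSP failures never break a write" rule from the PR description. A minimal sketch with hypothetical stand-in classes (`LocalEnvSketch`, `SandboxEnvSketch`, `FileOpsSketch`); the real `ShellFileOperations` checks against `LocalEnvironment` and obtains its service from `agent.lsp.get_service`.

```python
from __future__ import annotations

from typing import Any, Callable


class LocalEnvSketch:
    """Stand-in for a host-local terminal environment."""


class SandboxEnvSketch:
    """Stand-in for a Docker/Modal/SSH environment the host can't see into."""


class FileOpsSketch:
    def __init__(self, env: Any, get_service: Callable[[], Any]) -> None:
        self.env = env
        self._get_service = get_service

    def _lsp_local_only(self) -> bool:
        # A host-side language server can only read host files.
        return isinstance(self.env, LocalEnvSketch)

    def snapshot_baseline(self, path: str) -> None:
        if not self._lsp_local_only():
            return  # non-local backend: skip LSP, syntax check only
        try:
            self._get_service().snapshot_baseline(path)
        except Exception:
            pass  # an LSP failure must never break a write
```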
--- a/tests/agent/lsp/test_broken_set.py
+++ b/tests/agent/lsp/test_broken_set.py
@@ -0,0 +1,213 @@
+"""Tests for the broken-set short-circuit added to handle outer-timeout failures.
+
+When ``snapshot_baseline`` or ``get_diagnostics_sync`` times out at the
+service layer (because a language server hangs during initialize, or
+the binary is wedged), the inner spawn task is cancelled, but the
+inner exception handler that adds to ``_broken`` never runs.  Without
+the service-layer fallback exercised here, every subsequent edit
+re-pays the full timeout cost until the process exits.
+
+This module verifies:
+- ``_mark_broken_for_file`` adds the right key
+- ``enabled_for`` short-circuits on broken keys
+- a failed snapshot attempt adds the file's (server_id, root) pair to the broken-set
+"""
+from __future__ import annotations
+
+import os
+import sys
+from pathlib import Path
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from agent.lsp.manager import LSPService
+from agent.lsp.servers import SERVERS, ServerContext, ServerDef, SpawnSpec
+from agent.lsp.workspace import clear_cache
+
+
+@pytest.fixture(autouse=True)
+def _clear_workspace_cache():
+    clear_cache()
+    yield
+    clear_cache()
+
+
+def _make_git_workspace(tmp_path: Path) -> Path:
+    """Build a minimal git repo with a pyproject so pyright's root resolver fires."""
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    (repo / ".git").mkdir()
+    (repo / "pyproject.toml").write_text("[project]\nname='t'\n")
+    return repo
+
+
+def test_mark_broken_for_file_adds_correct_key(tmp_path, monkeypatch):
+    """``_mark_broken_for_file`` keys the broken-set on
+    (server_id, per_server_root) so subsequent ``enabled_for`` calls
+    for files in the same project skip immediately."""
+    repo = _make_git_workspace(tmp_path)
+    monkeypatch.chdir(str(repo))
+    src = repo / "x.py"
+    src.write_text("")
+
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="manual",
+    )
+    try:
+        svc._mark_broken_for_file(str(src), RuntimeError("simulated"))
+        # The pyright server resolves to the repo root via pyproject.toml.
+        assert ("pyright", str(repo)) in svc._broken
+    finally:
+        svc.shutdown()
+
+
+def test_enabled_for_returns_false_after_broken(tmp_path, monkeypatch):
+    """Once a (server_id, root) pair is in the broken-set,
+    ``enabled_for`` returns False so the file_operations layer skips
+    the LSP path entirely."""
+    repo = _make_git_workspace(tmp_path)
+    monkeypatch.chdir(str(repo))
+    src = repo / "x.py"
+    src.write_text("")
+
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="manual",
+    )
+    try:
+        # Initially enabled.
+        assert svc.enabled_for(str(src)) is True
+        # Mark broken.
+        svc._mark_broken_for_file(str(src), RuntimeError("simulated"))
+        # Now disabled — the broken-set short-circuits.
+        assert svc.enabled_for(str(src)) is False
+    finally:
+        svc.shutdown()
+
+
+def test_enabled_for_other_file_in_same_project_also_skipped(tmp_path, monkeypatch):
+    """The broken key is (server_id, root), so ALL files routed through
+    the same server in the same project are skipped — not just the one
+    that triggered the failure."""
+    repo = _make_git_workspace(tmp_path)
+    monkeypatch.chdir(str(repo))
+    a = repo / "a.py"
+    a.write_text("")
+    b = repo / "b.py"
+    b.write_text("")
+
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="manual",
+    )
+    try:
+        svc._mark_broken_for_file(str(a), RuntimeError("simulated"))
+        # Both files in the same project skip pyright now.
+        assert svc.enabled_for(str(a)) is False
+        assert svc.enabled_for(str(b)) is False
+    finally:
+        svc.shutdown()
+
+
+def test_unrelated_project_not_affected_by_broken(tmp_path, monkeypatch):
+    """Marking pyright broken for project A must NOT affect project B."""
+    repo_a = _make_git_workspace(tmp_path)
+    repo_b = tmp_path / "repo-b"
+    repo_b.mkdir()
+    (repo_b / ".git").mkdir()
+    (repo_b / "pyproject.toml").write_text("[project]\nname='b'\n")
+    a_src = repo_a / "x.py"
+    a_src.write_text("")
+    b_src = repo_b / "x.py"
+    b_src.write_text("")
+
+    monkeypatch.chdir(str(repo_a))
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="manual",
+    )
+    try:
+        svc._mark_broken_for_file(str(a_src), RuntimeError("simulated"))
+        # Project A skipped.
+        assert svc.enabled_for(str(a_src)) is False
+        # Project B still enabled — the broken key is per-project.
+        monkeypatch.chdir(str(repo_b))
+        assert svc.enabled_for(str(b_src)) is True
+    finally:
+        svc.shutdown()
+
+
+def test_mark_broken_handles_missing_server_silently(tmp_path):
+    """If the file extension doesn't match any registered server,
+    ``_mark_broken_for_file`` no-ops — nothing to mark."""
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="manual",
+    )
+    try:
+        # No registered server for .xyz; must not raise.
+        svc._mark_broken_for_file(str(tmp_path / "weird.xyz"), RuntimeError("x"))
+        assert len(svc._broken) == 0
+    finally:
+        svc.shutdown()
+
+
+def test_mark_broken_handles_no_workspace_silently(tmp_path):
+    """File outside any git worktree → no workspace → no key to add."""
+    src = tmp_path / "orphan.py"
+    src.write_text("")
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="manual",
+    )
+    try:
+        svc._mark_broken_for_file(str(src), RuntimeError("x"))
+        assert len(svc._broken) == 0
+    finally:
+        svc.shutdown()
+
+
+def test_snapshot_failure_marks_broken_via_outer_timeout(tmp_path, monkeypatch):
+    """End-to-end: a failure surfacing through ``snapshot_baseline``'s
+    outer ``_loop.run`` call (standing in for a timeout) triggers
+    ``_mark_broken_for_file``, so a second call to ``enabled_for``
+    returns False."""
+    repo = _make_git_workspace(tmp_path)
+    monkeypatch.chdir(str(repo))
+    src = repo / "x.py"
+    src.write_text("")
+
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="manual",
+    )
+    try:
+        # Force the inner snapshot coroutine to raise.
+        async def boom(_path):
+            raise RuntimeError("outer-timeout simulated")
+
+        with patch.object(svc, "_snapshot_async", boom):
+            assert svc.enabled_for(str(src)) is True
+            svc.snapshot_baseline(str(src))
+
+        # After the failure, the file's pair is in the broken-set and
+        # ``enabled_for`` skips it.
+        assert ("pyright", str(repo)) in svc._broken
+        assert svc.enabled_for(str(src)) is False
+    finally:
+        svc.shutdown()
--- a/tests/agent/lsp/test_client_e2e.py
+++ b/tests/agent/lsp/test_client_e2e.py
@@ -0,0 +1,143 @@
+"""End-to-end client tests against the mock LSP server.
+
+Spins up :file:`_mock_lsp_server.py` as an actual subprocess, drives
+it through real LSP traffic, and asserts diagnostic flow.  This is
+the closest thing we have to integration coverage without requiring
+pyright/gopls/etc. to be installed in CI.
+"""
+from __future__ import annotations
+
+import asyncio
+import os
+import sys
+from pathlib import Path
+
+import pytest
+
+from agent.lsp.client import LSPClient
+
+
+MOCK_SERVER = str(Path(__file__).parent / "_mock_lsp_server.py")
+
+
+def _client(workspace: Path, script: str = "clean") -> LSPClient:
+    env = {"MOCK_LSP_SCRIPT": script, "PYTHONPATH": os.environ.get("PYTHONPATH", "")}
+    return LSPClient(
+        server_id=f"mock-{script}",
+        workspace_root=str(workspace),
+        command=[sys.executable, MOCK_SERVER],
+        env=env,
+        cwd=str(workspace),
+    )
+
+
+@pytest.mark.asyncio
+async def test_client_lifecycle_clean(tmp_path: Path):
+    """Full lifecycle: spawn, initialize, open, get clean diagnostics, shutdown."""
+    f = tmp_path / "x.py"
+    f.write_text("print('hi')\n")
+
+    client = _client(tmp_path, "clean")
+    await client.start()
+    try:
+        assert client.is_running
+        version = await client.open_file(str(f), language_id="python")
+        assert version == 0
+        await client.wait_for_diagnostics(str(f), version, mode="document")
+        diags = client.diagnostics_for(str(f))
+        assert diags == []
+    finally:
+        await client.shutdown()
+    assert not client.is_running
+
+
+@pytest.mark.asyncio
+async def test_client_receives_published_errors(tmp_path: Path):
+    f = tmp_path / "x.py"
+    f.write_text("print('hi')\n")
+
+    client = _client(tmp_path, "errors")
+    await client.start()
+    try:
+        version = await client.open_file(str(f), language_id="python")
+        await client.wait_for_diagnostics(str(f), version, mode="document")
+        diags = client.diagnostics_for(str(f))
+        assert len(diags) == 1
+        d = diags[0]
+        assert d["severity"] == 1
+        assert d["code"] == "MOCK001"
+        assert d["source"] == "mock-lsp"
+        assert "synthetic error" in d["message"]
+    finally:
+        await client.shutdown()
+
+
+@pytest.mark.asyncio
+async def test_client_didchange_bumps_version(tmp_path: Path):
+    f = tmp_path / "x.py"
+    f.write_text("print('hi')\n")
+
+    client = _client(tmp_path, "errors")
+    await client.start()
+    try:
+        v0 = await client.open_file(str(f), language_id="python")
+        f.write_text("print('hi 2')\n")
+        v1 = await client.open_file(str(f), language_id="python")  # re-open path = didChange
+        assert v1 == v0 + 1
+        await client.wait_for_diagnostics(str(f), v1, mode="document")
+        # Mock pushed a diagnostic for both events; merged view has one
+        # entry (push store keyed by file path).
+        diags = client.diagnostics_for(str(f))
+        assert len(diags) == 1
+    finally:
+        await client.shutdown()
+
+
+@pytest.mark.asyncio
+async def test_client_handles_crashing_server(tmp_path: Path):
+    """When the server exits right after initialize, subsequent requests
+    fail gracefully (not hang)."""
+    f = tmp_path / "x.py"
+    f.write_text("")
+
+    client = _client(tmp_path, "crash")
+    await client.start()  # should succeed (mock answers initialize before crashing)
+    # Give the OS a moment to deliver the EOF.
+    await asyncio.sleep(0.2)
+    # The reader loop should detect EOF and mark pending requests as failed.
+    try:
+        await asyncio.wait_for(
+            client.open_file(str(f), language_id="python"), timeout=2.0
+        )
+    except asyncio.TimeoutError:
+        pytest.fail("open_file hung after the server crashed")
+    except Exception:
+        pass  # any other exception is acceptable; the contract is "doesn't hang"
+    finally:
+        await client.shutdown()
+
+
+@pytest.mark.asyncio
+async def test_client_shutdown_idempotent(tmp_path: Path):
+    """Calling shutdown twice must be safe."""
+    f = tmp_path / "x.py"
+    f.write_text("")
+    client = _client(tmp_path, "clean")
+    await client.start()
+    await client.shutdown()
+    await client.shutdown()  # must not raise
+
+
+@pytest.mark.asyncio
+async def test_client_diagnostics_are_deduped(tmp_path: Path):
+    """Repeated identical pushes must not produce duplicate diagnostics."""
+    f = tmp_path / "x.py"
+    f.write_text("")
+    client = _client(tmp_path, "errors")
+    await client.start()
+    try:
+        for _ in range(3):
+            v = await client.open_file(str(f), language_id="python")
+            await client.wait_for_diagnostics(str(f), v, mode="document")
+        diags = client.diagnostics_for(str(f))
+        # Push store overwrites on every notification — should have 1.
+        assert len(diags) == 1
+    finally:
+        await client.shutdown()
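The lifecycle tests rely on two properties of the push store: each `publishDiagnostics` replaces the previous entry for that file, and waiting resolves once a notification at or past the requested version arrives. A minimal sketch of that model using `asyncio.Condition`. `PushDiagnosticStore` is hypothetical and is not `LSPClient`'s implementation. Create it inside a running event loop, since older Python versions bind asyncio primitives to a loop at construction.

```python
from __future__ import annotations

import asyncio
from typing import Any, Dict, List, Tuple


class PushDiagnosticStore:
    """Latest publishDiagnostics per URI; each push replaces the last."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[int, List[Dict[str, Any]]]] = {}
        self._changed = asyncio.Condition()

    async def publish(self, uri: str, version: int, diags: List[Dict[str, Any]]) -> None:
        async with self._changed:
            self._latest[uri] = (version, list(diags))  # overwrite, never append
            self._changed.notify_all()

    async def wait_for(self, uri: str, version: int, timeout: float) -> bool:
        """True once diagnostics for ``version`` (or newer) arrived."""
        def ready() -> bool:
            entry = self._latest.get(uri)
            return entry is not None and entry[0] >= version

        async with self._changed:
            try:
                await asyncio.wait_for(self._changed.wait_for(ready), timeout)
                return True
            except asyncio.TimeoutError:
                return False

    def diagnostics_for(self, uri: str) -> List[Dict[str, Any]]:
        entry = self._latest.get(uri)
        return list(entry[1]) if entry else []
```

Because the store overwrites per URI, three identical pushes still leave one diagnostic, which is the dedup property `test_client_diagnostics_are_deduped` checks.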
--- a/tests/agent/lsp/test_diagnostics_field.py
+++ b/tests/agent/lsp/test_diagnostics_field.py
@@ -0,0 +1,146 @@
+"""Tests for the ``lsp_diagnostics`` field on WriteResult / PatchResult.
+
+The field exists so the agent can read syntax errors (``lint``) and
+semantic errors (``lsp_diagnostics``) as separate signals rather than
+having LSP output prepended to the lint string.
+"""
+from __future__ import annotations
+
+import os
+import sys
+import tempfile
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from tools.environments.local import LocalEnvironment
+from tools.file_operations import (
+    PatchResult,
+    ShellFileOperations,
+    WriteResult,
+)
+
+
+# ---------------------------------------------------------------------------
+# Dataclass shape
+# ---------------------------------------------------------------------------
+
+
+def test_writeresult_lsp_diagnostics_optional():
+    r = WriteResult()
+    assert r.lsp_diagnostics is None
+
+
+def test_writeresult_to_dict_omits_field_when_none():
+    r = WriteResult(bytes_written=10)
+    assert "lsp_diagnostics" not in r.to_dict()
+
+
+def test_writeresult_to_dict_includes_field_when_set():
+    r = WriteResult(bytes_written=10, lsp_diagnostics="<diagnostics>...</diagnostics>")
+    d = r.to_dict()
+    assert d["lsp_diagnostics"] == "<diagnostics>...</diagnostics>"
+
+
+def test_patchresult_to_dict_includes_field_when_set():
+    r = PatchResult(success=True, lsp_diagnostics="ERROR [1:1] thing")
+    d = r.to_dict()
+    assert d["lsp_diagnostics"] == "ERROR [1:1] thing"
+
+
+def test_patchresult_to_dict_omits_field_when_none():
+    r = PatchResult(success=True)
+    assert "lsp_diagnostics" not in r.to_dict()
+
+
+def test_patchresult_to_dict_omits_field_when_empty_string():
+    """Empty string counts as falsy — agent shouldn't see an empty field."""
+    r = PatchResult(success=True, lsp_diagnostics="")
+    assert "lsp_diagnostics" not in r.to_dict()
+
+
+# ---------------------------------------------------------------------------
+# Channel separation: lint and lsp_diagnostics stay independent
+# ---------------------------------------------------------------------------
+
+
+def test_lint_and_lsp_diagnostics_are_separate_channels():
+    """A WriteResult can carry BOTH a syntax-error lint AND an LSP
+    diagnostic block.  They belong in separate fields."""
+    r = WriteResult(
+        bytes_written=42,
+        lint={"status": "error", "output": "SyntaxError: ..."},
+        lsp_diagnostics="<diagnostics>ERROR [1:5] type mismatch</diagnostics>",
+    )
+    d = r.to_dict()
+    assert "lint" in d
+    assert "lsp_diagnostics" in d
+    assert d["lint"]["output"] == "SyntaxError: ..."
+    assert "type mismatch" in d["lsp_diagnostics"]
+
+
+# ---------------------------------------------------------------------------
+# write_file populates the field via _maybe_lsp_diagnostics
+# ---------------------------------------------------------------------------
+
+
+def test_write_file_populates_lsp_diagnostics_when_layer_returns_block(tmp_path):
+    """When the LSP layer returns a non-empty block, write_file puts it
+    into the ``lsp_diagnostics`` field — NOT into ``lint.output``."""
+    fops = ShellFileOperations(LocalEnvironment(cwd=str(tmp_path)))
+    target = tmp_path / "x.py"
+
+    block = "<diagnostics file=\"x.py\">\nERROR [1:1] problem\n</diagnostics>"
+
+    with patch.object(fops, "_maybe_lsp_diagnostics", return_value=block):
+        res = fops.write_file(str(target), "x = 1\n")
+
+    assert res.lsp_diagnostics == block
+    # Lint is the syntax check, which is clean for "x = 1" — must NOT
+    # have the LSP block folded into it.
+    assert res.lint == {"status": "ok", "output": ""}
+
+
+def test_write_file_lsp_diagnostics_none_when_layer_returns_empty(tmp_path):
+    fops = ShellFileOperations(LocalEnvironment(cwd=str(tmp_path)))
+    target = tmp_path / "x.py"
+
+    with patch.object(fops, "_maybe_lsp_diagnostics", return_value=""):
+        res = fops.write_file(str(target), "x = 1\n")
+
+    assert res.lsp_diagnostics is None
+
+
+def test_write_file_skips_lsp_when_syntax_failed(tmp_path):
+    """If the syntax check finds errors, the LSP layer should not be
+    consulted (a file that won't parse won't yield meaningful semantic
+    diagnostics)."""
+    fops = ShellFileOperations(LocalEnvironment(cwd=str(tmp_path)))
+    target = tmp_path / "broken.py"
+
+    with patch.object(fops, "_maybe_lsp_diagnostics") as mock_lsp:
+        res = fops.write_file(str(target), "def x(:\n")  # syntax error
+    assert mock_lsp.call_count == 0
+    assert res.lsp_diagnostics is None
+    assert res.lint["status"] == "error"
+
+
+# ---------------------------------------------------------------------------
+# patch_replace propagates the field from the inner write_file
+# ---------------------------------------------------------------------------
+
+
+def test_patch_replace_propagates_lsp_diagnostics(tmp_path):
+    """patch_replace's internal write_file populates lsp_diagnostics —
+    the outer PatchResult must carry it forward."""
+    fops = ShellFileOperations(LocalEnvironment(cwd=str(tmp_path)))
+    target = tmp_path / "x.py"
+    target.write_text("x = 1\n")
+
+    block = "<diagnostics>ERROR [1:5] semantic issue</diagnostics>"
+
+    with patch.object(fops, "_maybe_lsp_diagnostics", return_value=block):
+        res = fops.patch_replace(str(target), "x = 1", "x = 2")
+
+    assert res.success is True
+    assert res.lsp_diagnostics == block
--- a/tests/agent/lsp/test_eventlog.py
+++ b/tests/agent/lsp/test_eventlog.py
@@ -0,0 +1,199 @@
+"""Tests for the structured logging dedup model.
+
+The contract: a 1000-write session in one project with no new
+diagnostics should emit exactly ONE INFO line ("active for <root>") at
+the default INFO threshold.  Steady-state events stay at DEBUG;
+first-time-seen events surface once at INFO/WARNING, while
+diagnostics and timeouts are logged on every occurrence.
+"""
+from __future__ import annotations
+
+import logging
+
+import pytest
+
+from agent.lsp import eventlog
+
+
+@pytest.fixture(autouse=True)
+def _reset():
+    eventlog.reset_announce_caches()
+    yield
+    eventlog.reset_announce_caches()
+
+
+@pytest.fixture
+def caplog_lsp(caplog):
+    caplog.set_level(logging.DEBUG, logger="hermes.lint.lsp")
+    return caplog
+
+
+# ---------------------------------------------------------------------------
+# Steady-state silence (DEBUG)
+# ---------------------------------------------------------------------------
+
+
+def test_clean_emits_at_debug(caplog_lsp):
+    for _ in range(10):
+        eventlog.log_clean("pyright", "/proj/x.py")
+    info_records = [r for r in caplog_lsp.records if r.levelno >= logging.INFO]
+    debug_records = [r for r in caplog_lsp.records if r.levelno == logging.DEBUG]
+    assert info_records == []
+    assert len(debug_records) == 10
+
+
+def test_disabled_emits_at_debug(caplog_lsp):
+    eventlog.log_disabled("pyright", "/x.py", "feature off")
+    eventlog.log_disabled("pyright", "/x.py", "ext not mapped")
+    assert all(r.levelno == logging.DEBUG for r in caplog_lsp.records)
+
+
+# ---------------------------------------------------------------------------
+# State transitions: INFO once, DEBUG thereafter
+# ---------------------------------------------------------------------------
+
+
+def test_active_for_fires_once_per_root(caplog_lsp):
+    for _ in range(50):
+        eventlog.log_active("pyright", "/proj")
+    info_records = [
+        r for r in caplog_lsp.records
+        if r.levelno == logging.INFO and "active for" in r.getMessage()
+    ]
+    assert len(info_records) == 1
+
+
+def test_active_for_fires_per_distinct_root(caplog_lsp):
+    eventlog.log_active("pyright", "/proj-a")
+    eventlog.log_active("pyright", "/proj-b")
+    info = [r for r in caplog_lsp.records if r.levelno == logging.INFO]
+    assert len(info) == 2
+
+
+def test_active_for_separate_per_server(caplog_lsp):
+    eventlog.log_active("pyright", "/proj")
+    eventlog.log_active("typescript", "/proj")
+    info = [r for r in caplog_lsp.records if r.levelno == logging.INFO]
+    assert len(info) == 2
+
+
+def test_no_project_root_fires_once_per_path(caplog_lsp):
+    for _ in range(5):
+        eventlog.log_no_project_root("pyright", "/orphan.py")
+    info = [r for r in caplog_lsp.records if r.levelno == logging.INFO]
+    assert len(info) == 1
+
+
+# ---------------------------------------------------------------------------
+# Diagnostics events fire INFO every time
+# ---------------------------------------------------------------------------
+
+
+def test_diagnostics_always_info(caplog_lsp):
+    for i in range(5):
+        eventlog.log_diagnostics("pyright", f"/x{i}.py", 1)
+    info = [r for r in caplog_lsp.records if r.levelno == logging.INFO]
+    assert len(info) == 5
+    assert all("diags" in r.getMessage() for r in info)
+
+
+# ---------------------------------------------------------------------------
+# Action-required: WARNING once, DEBUG thereafter (or per call for novel events)
+# ---------------------------------------------------------------------------
+
+
+def test_server_unavailable_warns_once_per_binary(caplog_lsp):
+    for _ in range(20):
+        eventlog.log_server_unavailable("pyright", "pyright-langserver")
+    warns = [r for r in caplog_lsp.records if r.levelno == logging.WARNING]
+    assert len(warns) == 1
+    assert "pyright-langserver" in warns[0].getMessage()
+
+
+def test_server_unavailable_separate_per_binary(caplog_lsp):
+    eventlog.log_server_unavailable("pyright", "pyright-langserver")
+    eventlog.log_server_unavailable("typescript", "typescript-language-server")
+    warns = [r for r in caplog_lsp.records if r.levelno == logging.WARNING]
+    assert len(warns) == 2
+
+
+def test_no_server_configured_warns_once(caplog_lsp):
+    for _ in range(10):
+        eventlog.log_no_server_configured("pyright")
+    warns = [r for r in caplog_lsp.records if r.levelno == logging.WARNING]
+    assert len(warns) == 1
+
+
+def test_timeout_warns_every_call(caplog_lsp):
+    for _ in range(3):
+        eventlog.log_timeout("pyright", "/x.py")
+    warns = [r for r in caplog_lsp.records if r.levelno == logging.WARNING]
+    assert len(warns) == 3
+
+
+def test_server_error_warns_every_call(caplog_lsp):
+    for _ in range(3):
+        eventlog.log_server_error("pyright", "/x.py", RuntimeError("boom"))
+    warns = [r for r in caplog_lsp.records if r.levelno == logging.WARNING]
+    assert len(warns) == 3
+
+
+def test_spawn_failed_warns(caplog_lsp):
+    eventlog.log_spawn_failed("pyright", "/proj", FileNotFoundError("nope"))
+    warns = [r for r in caplog_lsp.records if r.levelno == logging.WARNING]
+    assert len(warns) == 1
+    assert "spawn/initialize failed" in warns[0].getMessage()
+
+
+# ---------------------------------------------------------------------------
+# Format: log lines all carry the lsp[<server_id>] prefix for grep
+# ---------------------------------------------------------------------------
+
+
+def test_log_lines_use_lsp_prefix(caplog_lsp):
+    eventlog.log_clean("pyright", "/x.py")
+    eventlog.log_active("pyright", "/proj")
+    eventlog.log_diagnostics("typescript", "/y.ts", 2)
+    for r in caplog_lsp.records:
+        assert r.getMessage().startswith("lsp[")
+
+
+# ---------------------------------------------------------------------------
+# Steady-state contract: 1000 clean writes → 1 INFO at most
+# ---------------------------------------------------------------------------
+
+
+def test_thousand_clean_writes_emit_one_info(caplog_lsp):
+    """A long session writes lots of files cleanly; agent.log should
+    show ONE 'active for' INFO and zero other INFO lines."""
+    eventlog.log_active("pyright", "/proj")
+    for _ in range(1000):
+        eventlog.log_clean("pyright", "/proj/x.py")
+    info_records = [r for r in caplog_lsp.records if r.levelno == logging.INFO]
+    assert len(info_records) == 1
+    assert "active for" in info_records[0].getMessage()
+
+
+# ---------------------------------------------------------------------------
+# Path shortening
+# ---------------------------------------------------------------------------
+
+
+def test_short_path_uses_relative_when_inside_cwd(tmp_path, monkeypatch):
+    monkeypatch.chdir(tmp_path)
+    sub = tmp_path / "x.py"
+    sub.write_text("")
+    out = eventlog._short_path(str(sub))
+    assert out == "x.py"
+
+
+def test_short_path_keeps_absolute_when_outside(tmp_path, monkeypatch):
+    # tmp_path becomes the cwd; /var/log/foo.txt lies outside it.
+    monkeypatch.chdir(tmp_path)
+    other = "/var/log/foo.txt"
+    out = eventlog._short_path(other)
+    # Outside cwd: keeps absolute (no leading "../")
+    assert out == "/var/log/foo.txt" or not out.startswith("..")
+
+
+def test_short_path_handles_empty_string():
+    assert eventlog._short_path("") == ""
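The eventlog tests above pin a log-level policy. Steady-state events (`log_clean`, `log_disabled`) log at DEBUG. State transitions (`log_active`, `log_no_project_root`) log at INFO once per key. Missing servers log at WARNING once per binary. Timeouts and server errors log at WARNING on every call. The module itself is not part of this excerpt, so what follows is only a minimal sketch of the once-per-key mechanism, assuming a process-global set of seen keys. `log_once`, `_seen`, and the message wording are hypothetical. The `hermes.lint.lsp` logger name and the `lsp[<server_id>]` prefix come from the tests.

```python
import logging

logger = logging.getLogger("hermes.lint.lsp")

# Hypothetical registry of keys already logged at their "loud" level.
_seen: set[tuple[str, ...]] = set()


def log_once(loud_level: int, key: tuple[str, ...], msg: str, *args: object) -> bool:
    """Log at ``loud_level`` the first time ``key`` is seen, at DEBUG afterwards.

    Returns True when the loud level was used.
    """
    if key in _seen:
        logger.debug(msg, *args)
        return False
    _seen.add(key)
    logger.log(loud_level, msg, *args)
    return True


def log_active(server_id: str, root: str) -> bool:
    # State transition: INFO once per (server, root) pair.
    return log_once(logging.INFO, ("active", server_id, root),
                    "lsp[%s] active for %s", server_id, root)


def log_server_unavailable(server_id: str, binary: str) -> bool:
    # Action required: WARNING once per missing binary (hypothetical wording).
    return log_once(logging.WARNING, ("unavailable", binary),
                    "lsp[%s] %s not found", server_id, binary)


def log_timeout(server_id: str, path: str) -> None:
    # Per-call event: WARNING every time, no dedup.
    logger.warning("lsp[%s] timed out waiting for diagnostics on %s", server_id, path)
```

Keying on the binary alone (not the server id) matches `test_server_unavailable_warns_once_per_binary`. Keying on `(server_id, root)` is what lets `test_active_for_fires_per_distinct_root` and `test_active_for_separate_per_server` each see two INFO lines.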
--- a/tests/agent/lsp/test_lifecycle.py
+++ b/tests/agent/lsp/test_lifecycle.py
@ -0,0 +1,144 @@
+"""Tests for service-singleton lifecycle: atexit handler, idempotent shutdown.
+
+These cover the exit-cleanup behavior added to plug the language-server
+process leak — without the atexit hook, ``hermes chat`` exits while
+pyright/gopls/etc. are still alive on the host.
+"""
+from __future__ import annotations
+
+import atexit
+from unittest.mock import MagicMock
+
+import pytest
+
+from agent import lsp as lsp_module
+
+
+@pytest.fixture(autouse=True)
+def _reset_singleton():
+    """Force a clean module state before each test.
+
+    Tests in this file share process-global state (the lazy
+    singleton + atexit registration flag); reset both before and
+    after every test so order doesn't matter.
+    """
+    lsp_module._service = None
+    lsp_module._atexit_registered = False
+    yield
+    lsp_module._service = None
+    lsp_module._atexit_registered = False
+
+
+def test_get_service_registers_atexit_handler_once(monkeypatch):
+    """First call to ``get_service`` must register an atexit handler;
+    subsequent calls must NOT register another one (Python's ``atexit``
+    runs every registered callable, so a duplicate would shut down
+    twice: harmless but wasteful)."""
+    fake_svc = MagicMock()
+    fake_svc.is_active.return_value = True
+    monkeypatch.setattr(
+        lsp_module.LSPService, "create_from_config", classmethod(lambda cls: fake_svc)
+    )
+
+    registrations = []
+
+    def fake_register(fn):
+        registrations.append(fn)
+
+    monkeypatch.setattr(atexit, "register", fake_register)
+
+    a = lsp_module.get_service()
+    b = lsp_module.get_service()
+    c = lsp_module.get_service()
+
+    assert a is fake_svc
+    assert b is fake_svc
+    assert c is fake_svc
+    assert len(registrations) == 1
+    # The registered callable must be our internal shutdown wrapper.
+    assert registrations[0] is lsp_module._atexit_shutdown
+
+
+def test_atexit_shutdown_calls_shutdown_service(monkeypatch):
+    """The atexit-registered wrapper invokes ``shutdown_service`` and
+    swallows any exception — by the time atexit fires, the user has
+    already seen the response and a noisy traceback would be clutter."""
+    called = []
+    monkeypatch.setattr(
+        lsp_module, "shutdown_service", lambda: called.append("shutdown")
+    )
+    lsp_module._atexit_shutdown()
+    assert called == ["shutdown"]
+
+
+def test_atexit_shutdown_swallows_exceptions(monkeypatch):
+    def boom():
+        raise RuntimeError("server already dead")
+
+    monkeypatch.setattr(lsp_module, "shutdown_service", boom)
+    # Must not raise.
+    lsp_module._atexit_shutdown()
+
+
+def test_shutdown_service_idempotent(monkeypatch):
+    """Calling shutdown twice must be safe — first call cleans up,
+    second call no-ops (nothing to shut down)."""
+    fake_svc = MagicMock()
+    fake_svc.is_active.return_value = True
+    fake_svc.shutdown = MagicMock()
+    monkeypatch.setattr(
+        lsp_module.LSPService, "create_from_config", classmethod(lambda cls: fake_svc)
+    )
+    monkeypatch.setattr(atexit, "register", lambda fn: None)
+
+    lsp_module.get_service()
+    lsp_module.shutdown_service()
+    lsp_module.shutdown_service()  # must not raise
+
+    assert fake_svc.shutdown.call_count == 1
+
+
+def test_shutdown_service_no_op_when_never_started():
+    """Calling shutdown without ever creating the service is safe."""
+    lsp_module.shutdown_service()  # must not raise
+
+
+def test_shutdown_service_swallows_exception(monkeypatch):
+    """An exception during ``svc.shutdown()`` must not propagate —
+    the caller (often atexit) has nothing useful to do with it."""
+    fake_svc = MagicMock()
+    fake_svc.is_active.return_value = True
+    fake_svc.shutdown = MagicMock(side_effect=RuntimeError("kill -9 already"))
+    monkeypatch.setattr(
+        lsp_module.LSPService, "create_from_config", classmethod(lambda cls: fake_svc)
+    )
+    monkeypatch.setattr(atexit, "register", lambda fn: None)
+
+    lsp_module.get_service()
+    lsp_module.shutdown_service()  # must not raise
+
+
+def test_get_service_returns_none_for_inactive_service(monkeypatch):
+    """A service whose ``is_active()`` returns False is treated as
+    not running — callers see ``None`` and fall back."""
+    fake_svc = MagicMock()
+    fake_svc.is_active.return_value = False
+    monkeypatch.setattr(
+        lsp_module.LSPService, "create_from_config", classmethod(lambda cls: fake_svc)
+    )
+    monkeypatch.setattr(atexit, "register", lambda fn: None)
+
+    assert lsp_module.get_service() is None
+    # Subsequent call returns None too — but the inactive instance is
+    # cached so we don't re-build it on every check.
+    assert lsp_module.get_service() is None
+
+
+def test_get_service_returns_none_when_create_fails(monkeypatch):
+    """Service factory returning ``None`` (no config, etc.) propagates."""
+    monkeypatch.setattr(
+        lsp_module.LSPService, "create_from_config", classmethod(lambda cls: None)
+    )
+    monkeypatch.setattr(atexit, "register", lambda fn: None)
+
+    assert lsp_module.get_service() is None
--- a/tests/agent/lsp/test_protocol.py
+++ b/tests/agent/lsp/test_protocol.py
@ -0,0 +1,197 @@
+"""Tests for the LSP protocol framing layer.
+
+The framer is small but load-bearing: a wrong Content-Length is a
+classic way for a hand-rolled LSP client to deadlock silently, with
+the reader blocked on bytes that never arrive.  These tests exercise:
+
+- exact wire format of outgoing messages (encode_message)
+- EOF, truncation, malformed-header and back-to-back handling (read_message)
+- envelope helpers (request, response, notification, error)
+- message classification
+"""
+from __future__ import annotations
+
+import asyncio
+import json
+import pytest
+
+from agent.lsp.protocol import (
+    ERROR_CONTENT_MODIFIED,
+    ERROR_METHOD_NOT_FOUND,
+    LSPProtocolError,
+    LSPRequestError,
+    classify_message,
+    encode_message,
+    make_error_response,
+    make_notification,
+    make_request,
+    make_response,
+    read_message,
+)
+
+
+# ---------------------------------------------------------------------------
+# encode_message
+# ---------------------------------------------------------------------------
+
+
+def test_encode_message_uses_compact_separators_and_utf8():
+    msg = {"jsonrpc": "2.0", "id": 1, "method": "x", "params": {"k": "ä"}}
+    out = encode_message(msg)
+    # Header is plain ASCII: "Content-Length: <n>" followed by CRLF CRLF.
+    header_end = out.index(b"\r\n\r\n") + 4
+    header = out[:header_end].decode("ascii")
+    body = out[header_end:]
+    assert "Content-Length:" in header
+    declared = int(header.split("Content-Length:")[1].split("\r\n")[0].strip())
+    # Declared length must equal actual body bytes.
+    assert declared == len(body)
+    # Body parses as JSON and round-trips.
+    parsed = json.loads(body.decode("utf-8"))
+    assert parsed == msg
+    # Body uses compact separators (no space after ':' or ',').
+    assert b'"id":1' in body
+
+
+def test_encode_message_handles_unicode_in_strings():
+    msg = {"jsonrpc": "2.0", "method": "log", "params": {"text": "🚀 ünıcödé"}}
+    out = encode_message(msg)
+    header_end = out.index(b"\r\n\r\n") + 4
+    declared = int(out[: out.index(b"\r\n")].split(b": ")[1])
+    assert declared == len(out[header_end:])
+    assert json.loads(out[header_end:].decode("utf-8")) == msg
+
+
+# ---------------------------------------------------------------------------
+# read_message
+# ---------------------------------------------------------------------------
+
+
+async def _stream_from_bytes(data: bytes) -> asyncio.StreamReader:
+    """Build an asyncio.StreamReader pre-populated with ``data``."""
+    reader = asyncio.StreamReader()
+    reader.feed_data(data)
+    reader.feed_eof()
+    return reader
+
+
+@pytest.mark.asyncio
+async def test_read_message_round_trip():
+    msg = {"jsonrpc": "2.0", "method": "ping"}
+    reader = await _stream_from_bytes(encode_message(msg))
+    parsed = await read_message(reader)
+    assert parsed == msg
+
+
+@pytest.mark.asyncio
+async def test_read_message_clean_eof_returns_none():
+    reader = await _stream_from_bytes(b"")
+    assert await read_message(reader) is None
+
+
+@pytest.mark.asyncio
+async def test_read_message_truncated_body_raises():
+    msg = encode_message({"jsonrpc": "2.0", "method": "x"})
+    truncated = msg[:-3]  # drop the last 3 bytes of the body
+    reader = await _stream_from_bytes(truncated)
+    with pytest.raises(LSPProtocolError):
+        await read_message(reader)
+
+
+@pytest.mark.asyncio
+async def test_read_message_missing_content_length_raises():
+    bad = b"X-Other: 5\r\n\r\n12345"
+    reader = await _stream_from_bytes(bad)
+    with pytest.raises(LSPProtocolError):
+        await read_message(reader)
+
+
+@pytest.mark.asyncio
+async def test_read_message_two_messages_back_to_back():
+    a = encode_message({"jsonrpc": "2.0", "method": "a"})
+    b = encode_message({"jsonrpc": "2.0", "method": "b"})
+    reader = await _stream_from_bytes(a + b)
+    assert (await read_message(reader))["method"] == "a"
+    assert (await read_message(reader))["method"] == "b"
+
+
+@pytest.mark.asyncio
+async def test_read_message_rejects_runaway_header():
+    """A pathological server that streams headers without ever emitting
+    the CRLF-CRLF terminator must not loop forever — the 8 KiB cap kicks
+    in and surfaces a protocol error."""
+    flood = (b"X-Junk: " + b"A" * 200 + b"\r\n") * 60   # ~12 KiB worth
+    reader = await _stream_from_bytes(flood)
+    with pytest.raises(LSPProtocolError) as exc:
+        await read_message(reader)
+    assert "8 KiB" in str(exc.value)
+
+
+# ---------------------------------------------------------------------------
+# envelope helpers
+# ---------------------------------------------------------------------------
+
+
+def test_make_request_includes_id_and_method():
+    msg = make_request(7, "ping", {"v": 1})
+    assert msg == {"jsonrpc": "2.0", "id": 7, "method": "ping", "params": {"v": 1}}
+
+
+def test_make_request_omits_params_when_none():
+    msg = make_request(7, "ping", None)
+    assert "params" not in msg
+
+
+def test_make_notification_omits_id():
+    msg = make_notification("log", {"line": "hi"})
+    assert "id" not in msg
+    assert msg["method"] == "log"
+
+
+def test_make_response_carries_result():
+    msg = make_response(7, {"ok": True})
+    assert msg["id"] == 7 and msg["result"] == {"ok": True}
+
+
+def test_make_error_response_shape():
+    msg = make_error_response(7, ERROR_CONTENT_MODIFIED, "stale", {"hint": "retry"})
+    assert msg["error"]["code"] == ERROR_CONTENT_MODIFIED
+    assert msg["error"]["message"] == "stale"
+    assert msg["error"]["data"] == {"hint": "retry"}
+
+
+# ---------------------------------------------------------------------------
+# classify_message
+# ---------------------------------------------------------------------------
+
+
+def test_classify_message_request():
+    msg = {"jsonrpc": "2.0", "id": 1, "method": "x"}
+    assert classify_message(msg) == ("request", 1)
+
+
+def test_classify_message_response():
+    msg = {"jsonrpc": "2.0", "id": 1, "result": None}
+    assert classify_message(msg) == ("response", 1)
+
+
+def test_classify_message_notification():
+    msg = {"jsonrpc": "2.0", "method": "log"}
+    assert classify_message(msg) == ("notification", "log")
+
+
+def test_classify_message_invalid():
+    assert classify_message({"id": 1})[0] == "invalid"
+    assert classify_message({"jsonrpc": "1.0", "method": "x"})[0] == "invalid"
+
+
+# ---------------------------------------------------------------------------
+# LSPRequestError
+# ---------------------------------------------------------------------------
+
+
+def test_lsp_request_error_carries_code_and_data():
+    e = LSPRequestError(ERROR_METHOD_NOT_FOUND, "no", {"x": 1})
+    assert e.code == ERROR_METHOD_NOT_FOUND
+    assert e.message == "no"
+    assert e.data == {"x": 1}
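The protocol tests fix the wire contract:

- a `Content-Length` header block terminated by CRLF CRLF
- a UTF-8 JSON body whose byte length matches the header
- `None` on clean EOF
- errors for truncated bodies, missing lengths, and headers that run past 8 KiB

The real `read_message` is async over `asyncio.StreamReader`. To stay self-contained, the sketch below reads synchronously from any binary file object. `encode_frame`, `read_frame`, and `FramingError` are hypothetical names.

```python
import json
from typing import Any, BinaryIO, Optional

MAX_HEADER_BYTES = 8 * 1024  # mirrors the 8 KiB cap the tests expect


class FramingError(Exception):
    """Hypothetical stand-in for LSPProtocolError."""


def encode_frame(msg: dict[str, Any]) -> bytes:
    # Compact separators; ensure_ascii=False keeps non-ASCII as raw UTF-8.
    body = json.dumps(msg, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    # Content-Length counts body *bytes*, not characters.
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def read_frame(stream: BinaryIO) -> Optional[dict[str, Any]]:
    header = b""
    while not header.endswith(b"\r\n\r\n"):
        ch = stream.read(1)
        if not ch:
            if header:
                raise FramingError("EOF inside header")
            return None  # clean EOF between messages
        header += ch
        if len(header) > MAX_HEADER_BYTES:
            raise FramingError("header exceeds 8 KiB without terminator")
    length = None
    for line in header.decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise FramingError("missing Content-Length")
    body = stream.read(length)
    if len(body) != length:
        raise FramingError("truncated body")
    return json.loads(body.decode("utf-8"))
```

Reading one byte at a time keeps the sketch short. A real client would buffer, as `asyncio.StreamReader.readuntil` and `readexactly` do.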
--- a/tests/agent/lsp/test_reporter.py
+++ b/tests/agent/lsp/test_reporter.py
@ -0,0 +1,94 @@
+"""Tests for the diagnostic reporter (formatting layer)."""
+from __future__ import annotations
+
+from agent.lsp.reporter import (
+    DEFAULT_SEVERITIES,
+    MAX_PER_FILE,
+    format_diagnostic,
+    report_for_file,
+    truncate,
+)
+
+
+def _diag(line=0, col=0, sev=1, code="E001", source="ls", msg="oops"):
+    return {
+        "range": {
+            "start": {"line": line, "character": col},
+            "end": {"line": line, "character": col + 1},
+        },
+        "severity": sev,
+        "code": code,
+        "source": source,
+        "message": msg,
+    }
+
+
+def test_format_diagnostic_uses_one_indexed_position():
+    line = format_diagnostic(_diag(line=4, col=2))
+    assert "[5:3]" in line  # 0-based LSP position shown 1-based
+
+
+def test_format_diagnostic_includes_severity_label():
+    assert format_diagnostic(_diag(sev=1)).startswith("ERROR")
+    assert format_diagnostic(_diag(sev=2)).startswith("WARN")
+    assert format_diagnostic(_diag(sev=3)).startswith("INFO")
+    assert format_diagnostic(_diag(sev=4)).startswith("HINT")
+
+
+def test_format_diagnostic_includes_code_and_source():
+    line = format_diagnostic(_diag(code="X42", source="src"))
+    assert "[X42]" in line
+    assert "(src)" in line
+
+
+def test_format_diagnostic_omits_missing_optional_fields():
+    line = format_diagnostic(
+        {
+            "range": {
+                "start": {"line": 0, "character": 0},
+                "end": {"line": 0, "character": 0},
+            },
+            "severity": 1,
+            "message": "bare",
+        }
+    )
+    assert "[" not in line.split("]", 1)[1]  # no extra brackets after the position
+    assert "(" not in line
+
+
+def test_report_for_file_returns_empty_when_only_warnings():
+    """Default severity filter is ERROR-only."""
+    report = report_for_file("/x.py", [_diag(sev=2)])
+    assert report == ""
+
+
+def test_report_for_file_emits_block_with_errors():
+    diag = _diag(msg="real error")
+    report = report_for_file("/x.py", [diag])
+    assert "<diagnostics file=\"/x.py\">" in report
+    assert "real error" in report
+    assert "</diagnostics>" in report
+
+
+def test_report_for_file_caps_at_max_per_file():
+    diags = [_diag(line=i) for i in range(MAX_PER_FILE + 5)]
+    report = report_for_file("/x.py", diags)
+    assert "and 5 more" in report
+
+
+def test_report_for_file_respects_custom_severities():
+    diag = _diag(sev=2, msg="warn")
+    report = report_for_file("/x.py", [diag], severities=frozenset({1, 2}))
+    assert "warn" in report
+
+
+def test_truncate_below_limit_unchanged():
+    s = "abc" * 100
+    assert truncate(s, limit=4000) == s
+
+
+def test_truncate_above_limit_appends_marker():
+    s = "x" * 10000
+    out = truncate(s, limit=200)
+    assert out.endswith("[truncated]")
+    assert len(out) <= 200
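The reporter tests pin these formatting rules:

- severity labels (ERROR, WARN, INFO, HINT)
- 1-based `[line:col]` positions
- optional `[code]` and `(source)` suffixes
- a `<diagnostics file="...">` wrapper
- an ERROR-only default filter
- an "and N more" overflow line
- a `[truncated]` marker that keeps output within the limit

The following is a hedged sketch that satisfies those tests. The field order, overflow wording, and marker text are assumptions. `MAX_PER_FILE = 20` comes from the commit description.

```python
from typing import Any, Iterable

SEVERITY_LABELS = {1: "ERROR", 2: "WARN", 3: "INFO", 4: "HINT"}
DEFAULT_SEVERITIES = frozenset({1})  # errors only
MAX_PER_FILE = 20
TRUNCATION_MARKER = "\n... [truncated]"


def format_diagnostic(d: dict[str, Any]) -> str:
    start = d["range"]["start"]
    # LSP positions are 0-based; the rendered position is 1-based.
    parts = [
        # Assumption: a missing severity is treated as an error.
        SEVERITY_LABELS.get(d.get("severity", 1), "ERROR"),
        f"[{start['line'] + 1}:{start['character'] + 1}]",
        d.get("message", ""),
    ]
    if d.get("code") is not None:
        parts.append(f"[{d['code']}]")
    if d.get("source"):
        parts.append(f"({d['source']})")
    return " ".join(parts)


def report_for_file(path: str, diags: Iterable[dict[str, Any]],
                    severities: frozenset[int] = DEFAULT_SEVERITIES) -> str:
    kept = [d for d in diags if d.get("severity", 1) in severities]
    if not kept:
        return ""
    lines = [format_diagnostic(d) for d in kept[:MAX_PER_FILE]]
    if len(kept) > MAX_PER_FILE:
        lines.append(f"... and {len(kept) - MAX_PER_FILE} more")
    return f'<diagnostics file="{path}">\n' + "\n".join(lines) + "\n</diagnostics>"


def truncate(text: str, limit: int = 4000) -> str:
    if len(text) <= limit:
        return text
    # Reserve room for the marker so the result stays within ``limit``.
    return text[: max(0, limit - len(TRUNCATION_MARKER))] + TRUNCATION_MARKER
```

Appending an overflow count, rather than silently dropping entries, tells the model that more errors exist beyond the cap.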
--- a/tests/agent/lsp/test_service.py
+++ b/tests/agent/lsp/test_service.py
@ -0,0 +1,149 @@
+"""Tests for the synchronous LSPService wrapper.
+
+Drives the service through ``snapshot_baseline`` →
+``get_diagnostics_sync`` against the mock LSP server, exercising the
+delta filter that ``tools/file_operations._check_lint_delta`` relies
+on.
+"""
+from __future__ import annotations
+
+import os
+import sys
+from pathlib import Path
+
+import pytest
+
+from agent.lsp.manager import LSPService
+from agent.lsp.servers import (
+    SERVERS,
+    ServerContext,
+    ServerDef,
+    SpawnSpec,
+    find_server_for_file,
+)
+
+
+MOCK_SERVER = str(Path(__file__).parent / "_mock_lsp_server.py")
+
+
+def _install_mock_server(monkeypatch, script: str = "errors", server_id: str = "pyright"):
+    """Replace one registered server with a wrapper that spawns the mock.
+
+    We reuse ``pyright`` so .py files route to it.  This keeps the
+    test free of any LSP toolchain dependency.
+    """
+    target_index = next(i for i, s in enumerate(SERVERS) if s.server_id == server_id)
+    original = SERVERS[target_index]
+
+    def _spawn(root: str, ctx: ServerContext) -> SpawnSpec:
+        env = {"MOCK_LSP_SCRIPT": script}
+        return SpawnSpec(
+            command=[sys.executable, MOCK_SERVER],
+            workspace_root=root,
+            cwd=root,
+            env=env,
+            initialization_options={},
+        )
+
+    replacement = ServerDef(
+        server_id=server_id,
+        extensions=original.extensions,
+        resolve_root=lambda fp, ws: ws,  # always use workspace root
+        build_spawn=_spawn,
+        seed_first_push=False,
+        description="mock " + server_id,
+    )
+    # Patch the SERVERS list element directly + restore on teardown.
+    SERVERS[target_index] = replacement
+
+    yield
+
+    SERVERS[target_index] = original
+
+
+@pytest.fixture
+def mock_pyright(monkeypatch, tmp_path):
+    """Install the mock as ``pyright`` and create a fake git workspace."""
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    (repo / ".git").mkdir()
+    (repo / "pyproject.toml").write_text("")  # realistic layout; the mock resolver ignores it
+    monkeypatch.chdir(str(repo))
+    gen = _install_mock_server(monkeypatch, "errors", "pyright")
+    next(gen)
+    yield repo
+    try:
+        next(gen)
+    except StopIteration:
+        pass
+
+
+def test_service_returns_empty_when_disabled(tmp_path):
+    svc = LSPService(
+        enabled=False,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="auto",
+    )
+    assert not svc.is_active()
+    f = tmp_path / "x.py"
+    f.write_text("")
+    assert svc.get_diagnostics_sync(str(f)) == []
+    svc.shutdown()
+
+
+def test_service_skips_files_outside_workspace(tmp_path):
+    """Files outside any git worktree must not trigger LSP."""
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=2.0,
+        install_strategy="manual",
+    )
+    f = tmp_path / "x.py"
+    f.write_text("")
+    # No .git anywhere — service should report not enabled for this file.
+    assert not svc.enabled_for(str(f))
+    svc.shutdown()
+
+
+def test_service_e2e_delta_filter(mock_pyright):
+    """End-to-end: snapshot the baseline, re-poll, expect an empty delta."""
+    repo = mock_pyright
+    f = repo / "x.py"
+    f.write_text("print('hi')\n")
+
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=3.0,
+        install_strategy="manual",
+    )
+    try:
+        assert svc.enabled_for(str(f))
+        # Baseline first — server pushes 1 error.
+        svc.snapshot_baseline(str(f))
+        # Re-poll: same error is in baseline, so delta is empty.
+        new_diags = svc.get_diagnostics_sync(str(f))
+        assert new_diags == []
+    finally:
+        svc.shutdown()
+
+
+def test_service_status_includes_clients(mock_pyright):
+    repo = mock_pyright
+    f = repo / "x.py"
+    f.write_text("")
+    svc = LSPService(
+        enabled=True,
+        wait_mode="document",
+        wait_timeout=3.0,
+        install_strategy="manual",
+    )
+    try:
+        svc.get_diagnostics_sync(str(f))
+        info = svc.get_status()
+        assert info["enabled"] is True
+        assert any(c["server_id"] == "pyright" for c in info["clients"])
+    finally:
+        svc.shutdown()
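The service test relies on a delta filter: diagnostics captured by `snapshot_baseline` before the write are subtracted from what the server reports afterwards. How `manager.py` matches entries is not shown here, so the keying below is an assumption. It compares severity, code, source, and message but ignores the range, because inserting lines above a pre-existing error moves it without making it new. A `Counter` handles duplicates, so a second copy of an existing error still surfaces. `delta` and `_key` are hypothetical names.

```python
from collections import Counter
from typing import Any, Hashable


def _key(d: dict[str, Any]) -> Hashable:
    # Position is deliberately excluded: an edit above a pre-existing
    # error shifts its line number without making it "new".
    return (d.get("severity"), d.get("code"), d.get("source"), d.get("message"))


def delta(baseline: list[dict[str, Any]], current: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return diagnostics in ``current`` that were not present in ``baseline``."""
    remaining = Counter(_key(d) for d in baseline)
    introduced = []
    for d in current:
        k = _key(d)
        if remaining[k] > 0:
            remaining[k] -= 1  # consume one matching baseline entry
        else:
            introduced.append(d)
    return introduced
```

The trade-off is that a message embedding position-dependent text (for example a line reference) would defeat the match and be reported as new.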
--- a/tests/agent/lsp/test_workspace.py
+++ b/tests/agent/lsp/test_workspace.py
@ -0,0 +1,139 @@
+"""Tests for workspace + project-root resolution."""
+from __future__ import annotations
+
+import os
+from pathlib import Path
+
+import pytest
+
+from agent.lsp.workspace import (
+    clear_cache,
+    find_git_worktree,
+    is_inside_workspace,
+    nearest_root,
+    normalize_path,
+    resolve_workspace_for_file,
+)
+
+
+@pytest.fixture(autouse=True)
+def _clear():
+    clear_cache()
+    yield
+    clear_cache()
+
+
+def test_find_git_worktree_returns_none_outside_repo(tmp_path: Path):
+    sub = tmp_path / "sub"
+    sub.mkdir()
+    assert find_git_worktree(str(sub)) is None
+
+
+def test_find_git_worktree_finds_dotgit(tmp_path: Path):
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    (repo / ".git").mkdir()
+    sub = repo / "src" / "deep"
+    sub.mkdir(parents=True)
+    assert find_git_worktree(str(sub)) == str(repo)
+
+
+def test_find_git_worktree_handles_dotgit_file(tmp_path: Path):
+    """``.git`` can also be a file (gitfile used by linked worktrees)."""
+    repo = tmp_path / "repo"
+    repo.mkdir()
+    (repo / ".git").write_text("gitdir: /elsewhere\n")
+    assert find_git_worktree(str(repo)) == str(repo)
+
+
+def test_is_inside_workspace_true_for_subpath(tmp_path: Path):
+    root = tmp_path / "p"
+    root.mkdir()
+    sub = root / "x" / "y.py"
+    sub.parent.mkdir(parents=True)
+    sub.write_text("")
+    assert is_inside_workspace(str(sub), str(root))
+
+
+def test_is_inside_workspace_false_for_unrelated(tmp_path: Path):
+    a = tmp_path / "a"
+    b = tmp_path / "b"
+    a.mkdir()
+    b.mkdir()
+    f = b / "x.py"
+    f.write_text("")
+    assert not is_inside_workspace(str(f), str(a))
+
+
+def test_nearest_root_finds_first_marker(tmp_path: Path):
+    root = tmp_path / "p"
+    deep = root / "src" / "pkg"
+    deep.mkdir(parents=True)
+    (root / "pyproject.toml").write_text("")
+    found = nearest_root(str(deep / "mod.py"), ["pyproject.toml"])
+    assert found == str(root)
+
+
+def test_nearest_root_excludes_take_priority(tmp_path: Path):
+    """If an exclude marker matches first, return None."""
+    root = tmp_path / "p"
+    sub = root / "deno-app"
+    sub.mkdir(parents=True)
+    (sub / "deno.json").write_text("{}")
+    (root / "package.json").write_text("{}")  # would match if not for exclude
+    found = nearest_root(
+        str(sub / "main.ts"),
+        ["package.json"],
+        excludes=["deno.json"],
+    )
+    assert found is None
+
+
+def test_nearest_root_returns_none_when_no_marker(tmp_path: Path):
+    f = tmp_path / "x.py"
+    f.write_text("")
+    assert nearest_root(str(f), ["pyproject.toml"]) is None
+
+
+def test_resolve_workspace_for_file_uses_cwd_first(tmp_path: Path, monkeypatch):
+    repo = tmp_path / "repo"
+    (repo / ".git").mkdir(parents=True)
+    file_path = repo / "x.py"
+    file_path.write_text("")
+    # cwd is inside the repo
+    monkeypatch.chdir(str(repo))
+    root, gated = resolve_workspace_for_file(str(file_path))
+    assert root == str(repo)
+    assert gated is True
+
+
+def test_resolve_workspace_for_file_no_repo_returns_none(tmp_path: Path, monkeypatch):
+    monkeypatch.chdir(str(tmp_path))
+    f = tmp_path / "x.py"
+    f.write_text("")
+    root, gated = resolve_workspace_for_file(str(f))
+    assert root is None
+    assert gated is False
+
+
+def test_resolve_workspace_falls_back_to_file_location(tmp_path: Path, monkeypatch):
+    """When cwd isn't a git repo but the file is inside one, we still
+    discover the workspace from the file's path."""
+    not_a_repo = tmp_path / "loose"
+    not_a_repo.mkdir()
+    monkeypatch.chdir(str(not_a_repo))
+
+    repo = tmp_path / "actual-repo"
+    (repo / ".git").mkdir(parents=True)
+    f = repo / "x.py"
+    f.write_text("")
+
+    root, gated = resolve_workspace_for_file(str(f))
+    assert root == str(repo)
+    assert gated is True
+
+
+def test_normalize_path_expands_tilde(monkeypatch):
+    monkeypatch.setenv("HOME", "/home/user")
+    p = normalize_path("~/x.py")
+    assert p == os.path.abspath("/home/user/x.py")
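The workspace tests describe two upward walks. `find_git_worktree` stops at the first directory containing `.git`, whether that is a directory or a gitfile. `nearest_root` stops at the first directory containing a project marker, unless an exclude marker is hit first. A minimal sketch of both follows, assuming plain `os.path` checks with no caching. The real module also caches results (hence `clear_cache`), and `_ancestors` is a hypothetical helper.

```python
import os
from typing import Iterable, Iterator, Optional


def _ancestors(start: str) -> Iterator[str]:
    """Yield ``start`` and each parent directory up to the filesystem root."""
    current = os.path.abspath(start)
    while True:
        yield current
        parent = os.path.dirname(current)
        if parent == current:  # reached "/" (or a drive root on Windows)
            return
        current = parent


def find_git_worktree(start_dir: str) -> Optional[str]:
    # ``.git`` may be a directory (normal clone) or a file (gitfile).
    for d in _ancestors(start_dir):
        if os.path.exists(os.path.join(d, ".git")):
            return d
    return None


def nearest_root(file_path: str, markers: Iterable[str],
                 excludes: Iterable[str] = ()) -> Optional[str]:
    markers, excludes = list(markers), list(excludes)
    for d in _ancestors(os.path.dirname(os.path.abspath(file_path))):
        # Excludes are checked before markers at every level, so a nearer
        # exclude (e.g. deno.json) vetoes a marker further up.
        if any(os.path.exists(os.path.join(d, e)) for e in excludes):
            return None
        if any(os.path.exists(os.path.join(d, m)) for m in markers):
            return d
    return None
```

Returning `None` on an exclude (rather than skipping past it) is what keeps, say, a Deno sub-project from being handed to the TypeScript server rooted at the parent's `package.json`.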
--- a/tools/file_operations.py
+++ b/tools/file_operations.py
@ -120,6 +120,13 @@ class WriteResult:
    bytes_written: int = 0
    dirs_created: bool = False
    lint: Optional[Dict[str, Any]] = None
+    # Semantic diagnostics from the LSP layer, when applicable.  Kept in
+    # its own field (not folded into ``lint``) so the model and any
+    # downstream parsers can read syntax errors and semantic errors as
+    # separate signals.  ``None`` when LSP is disabled, when the file
+    # isn't in a git workspace, or when no diagnostics were introduced
+    # by this edit.
+    lsp_diagnostics: Optional[str] = None
    error: Optional[str] = None
    warning: Optional[str] = None

@ -136,6 +143,8 @@ class PatchResult:
    files_created: List[str] = field(default_factory=list)
    files_deleted: List[str] = field(default_factory=list)
    lint: Optional[Dict[str, Any]] = None
+    # See :class:`WriteResult.lsp_diagnostics`.
+    lsp_diagnostics: Optional[str] = None
    error: Optional[str] = None
    
    def to_dict(self) -> dict:
@ -150,6 +159,8 @@ class PatchResult:
            result["files_deleted"] = self.files_deleted
        if self.lint:
            result["lint"] = self.lint
+        if self.lsp_diagnostics:
+            result["lsp_diagnostics"] = self.lsp_diagnostics
        if self.error:
            result["error"] = self.error
        return result
@ -867,6 +878,13 @@ class ShellFileOperations(FileOperations):
            if read_result.exit_code == 0 and read_result.stdout:
                pre_content = read_result.stdout

+        # Snapshot LSP diagnostics for this file (best-effort) so the
+        # post-write LSP layer can return only diagnostics introduced
+        # by this specific edit.  Mirrors claude-code's
+        # ``beforeFileEdited`` pattern but wired to the local LSP
+        # rather than an external IDE.
+        self._snapshot_lsp_baseline(path)
+
        # Create parent directories
        parent = os.path.dirname(path)
        dirs_created = False
@ -897,10 +915,21 @@ class ShellFileOperations(FileOperations):
        # Post-write lint with delta refinement.
        lint_result = self._check_lint_delta(path, pre_content=pre_content, post_content=content)

+        # Semantic diagnostics from the LSP layer, in a separate channel.
+        # Only fetched when the syntax tier reported clean or was skipped
+        # (no point asking an LSP about a file that won't even parse).
+        # Best-effort: ``_maybe_lsp_diagnostics`` returns ``""`` on failure.
+        lsp_diagnostics: Optional[str] = None
+        if lint_result.success or lint_result.skipped:
+            block = self._maybe_lsp_diagnostics(path)
+            if block:
+                lsp_diagnostics = block
+
        return WriteResult(
            bytes_written=bytes_written,
            dirs_created=dirs_created,
            lint=lint_result.to_dict() if lint_result else None,
+            lsp_diagnostics=lsp_diagnostics,
        )
    
    # =========================================================================
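The write path above asks the LSP layer only when the syntax tier reported clean or was skipped, and it normalizes an empty result to `None` so `to_dict()` can omit the field. Below is a hedged sketch of that gating. `LintOutcome` is a hypothetical stand-in for `LintResult`, and `fetch_block` stands in for `_maybe_lsp_diagnostics`. The `try`/`except` is included here only so the sketch shows the "never break a write" rule in one place; in the diff, that guarantee lives inside `_maybe_lsp_diagnostics` itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LintOutcome:
    """Hypothetical stand-in for the syntax tier's LintResult."""
    success: bool
    skipped: bool = False


def post_write_lsp(lint: LintOutcome, fetch_block: Callable[[], str]) -> Optional[str]:
    """Return the LSP diagnostics block, or None when it should be omitted."""
    # A file that failed the syntax tier won't parse; skip the LSP round-trip.
    if not (lint.success or lint.skipped):
        return None
    try:
        block = fetch_block()
    except Exception:
        # A flaky or missing language server must never break the write.
        return None
    # "" means nothing new was introduced; None lets to_dict() drop the field.
    return block or None
```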
@ -996,7 +1025,14 @@ class ShellFileOperations(FileOperations):
            success=True,
            diff=diff,
            files_modified=[path],
-            lint=lint_result.to_dict() if lint_result else None
+            lint=lint_result.to_dict() if lint_result else None,
+            # Propagate the LSP diagnostics already captured by the
+            # internal ``write_file`` call.  Its baseline was the
+            # pre-patch content (taken at the start of write_file via
+            # ``_snapshot_lsp_baseline``) so the delta is correct for
+            # the patch as a whole.  Keep the field separate from the
+            # syntax-check ``lint`` so the agent can read both signals.
+            lsp_diagnostics=write_result.lsp_diagnostics,
        )
    
    def patch_v4a(self, patch_content: str) -> PatchResult:
@ -1089,21 +1125,25 @@ class ShellFileOperations(FileOperations):
    def _check_lint_delta(self, path: str, pre_content: Optional[str],
                          post_content: Optional[str] = None) -> LintResult:
        """
-        Run post-write lint with pre-write baseline comparison.
+        Run post-write syntax lint with pre-write baseline comparison.

-        Strategy (post-first, pre-lazy):
-        1. Lint the post-write state.  If clean → return clean immediately.
-           This is the hot path and matches _check_lint() in cost.
-        2. If post-lint found errors AND we have pre-write content, lint
-           that too.  If the pre-write file was already broken, return only
-           the *new* errors introduced by this edit — errors that existed
-           before aren't the agent's problem to chase right now.
-        3. If pre_content is None (new file or unavailable), skip the delta
-           step and return all post-write errors.
+        Two-tier strategy:

-        This mirrors Cline's and OpenCode's post-edit LSP pattern: surface
-        only the errors this specific edit introduced, so the agent doesn't
-        get distracted by pre-existing problems.
+        1. **Syntax check** (in-process or shell-based, microseconds).
+           Catches the bug class that motivated this layer: corrupt
+           writes, mashed quotes, truncated output.  Hot path.
+
+        2. **Delta refinement against pre-write content** when the
+           syntax tier reports errors.  Filter out errors that already
+           existed pre-edit so the agent isn't distracted by inherited
+           state.
+
+        Semantic diagnostics from the LSP layer are fetched separately
+        via :meth:`_maybe_lsp_diagnostics` and surfaced in the
+        ``lsp_diagnostics`` field on :class:`WriteResult` /
+        :class:`PatchResult`.  Keeping the two channels separate lets
+        the agent (and any downstream parsers) read syntax errors and
+        semantic errors as independent signals.

        Args:
            path: File path (for linter selection).
@ -1122,12 +1162,12 @@ class ShellFileOperations(FileOperations):
        """
        post = self._check_lint(path, content=post_content)

-        # Hot path: clean post-write, no pre-lint needed.
+        # Hot path: clean post-write syntactically.
        if post.success or post.skipped:
            return post

-        # Post-write has errors.  If we have pre-content, run the delta
-        # refinement to filter out pre-existing errors.
+        # Post-write has syntax errors.  If we have pre-content, run the
+        # delta refinement to filter out pre-existing errors.
        if pre_content is None:
            return post

@ -1166,6 +1206,91 @@ class ShellFileOperations(FileOperations):
                "(pre-existing errors filtered out):\n" + "\n".join(post_lines)
            )
        )
+
+    def _lsp_local_only(self) -> bool:
+        """Return True iff this FileOperations is wired to a local backend.
+
+        LSP servers run on the host process — they need access to the
+        files they're linting.  Remote/sandboxed backends (Docker,
+        Modal, SSH, Daytona) keep files inside the sandbox where the
+        host-side LSP server can't reach them, so we skip the LSP
+        path for those entirely.
+        """
+        env = getattr(self, "env", None)
+        if env is None:
+            # Defensive: some tests construct ShellFileOperations via
+            # ``__new__`` without going through ``__init__``, so
+            # ``self.env`` may be missing.  No env = no LSP path.
+            return False
+        try:
+            from tools.environments.local import LocalEnvironment
+        except Exception:  # noqa: BLE001
+            return False
+        return isinstance(env, LocalEnvironment)
+
+    def _snapshot_lsp_baseline(self, path: str) -> None:
+        """Capture pre-edit LSP diagnostics so the post-write delta is correct.
+
+        Best-effort.  Silent on every failure path — LSP is an
+        enrichment layer and must never break a write.
+
+        Skipped entirely on non-local backends (Docker, Modal, SSH,
+        etc.) — the server can't see files inside the sandbox.
+        """
+        if not self._lsp_local_only():
+            return
+        try:
+            from agent.lsp import get_service
+            svc = get_service()
+        except Exception:  # noqa: BLE001
+            return
+        if svc is None:
+            return
+        try:
+            svc.snapshot_baseline(path)
+        except Exception:  # noqa: BLE001
+            pass
+
+    def _maybe_lsp_diagnostics(self, path: str) -> str:
+        """Best-effort LSP semantic diagnostics for ``path``.
+
+        Returns a formatted ``<diagnostics>`` block, or empty string
+        when LSP is unavailable / disabled / produced no errors.
+
+        Wraps everything in a try/except so a misbehaving LSP server
+        can't break a write.  This intentionally swallows all errors
+        — the calling tier already returned a clean syntax result, so
+        ``""`` here just means "no extra info to add".
+
+        Skipped entirely on non-local backends (Docker, Modal, SSH,
+        etc.) — same reasoning as ``_snapshot_lsp_baseline``.
+        """
+        if not self._lsp_local_only():
+            return ""
+        try:
+            from agent.lsp import get_service
+        except Exception:  # noqa: BLE001
+            return ""
+        try:
+            svc = get_service()
+        except Exception:  # noqa: BLE001
+            return ""
+        if svc is None or not svc.enabled_for(path):
+            return ""
+        try:
+            diagnostics = svc.get_diagnostics_sync(path, delta=True)
+        except Exception:  # noqa: BLE001
+            return ""
+        if not diagnostics:
+            return ""
+        try:
+            from agent.lsp.reporter import report_for_file, truncate
+            block = report_for_file(path, diagnostics)
+            if not block:
+                return ""
+            return truncate("LSP diagnostics introduced by this edit:\n" + block)
+        except Exception:  # noqa: BLE001
+            return ""
    
    # =========================================================================
    # SEARCH Implementation
--- a/website/docs/reference/cli-commands.md
+++ b/website/docs/reference/cli-commands.md
@ -40,6 +40,7 @@ hermes [global-options] <command> [subcommand/options]
 | `hermes model` | Interactively choose the default provider and model. |
 | `hermes fallback` | Manage fallback providers tried when the primary model errors. |
 | `hermes gateway` | Run or manage the messaging gateway service. |
+| `hermes lsp` | Manage Language Server Protocol integration (semantic diagnostics for write_file/patch). |
 | `hermes setup` | Interactive setup wizard for all or part of the configuration. |
 | `hermes whatsapp` | Configure and pair the WhatsApp bridge. |
 | `hermes slack` | Slack helpers (currently: generate the app manifest with every command as a native slash). |
@ -223,6 +224,33 @@ Options:
 Use `hermes gateway run` instead of `hermes gateway start` — WSL's systemd support is unreliable. Wrap it in tmux for persistence: `tmux new -s hermes 'hermes gateway run'`. See [WSL FAQ](/docs/reference/faq#wsl-gateway-keeps-disconnecting-or-hermes-gateway-start-fails) for details.
 :::

+## `hermes lsp`
+
+```bash
+hermes lsp <subcommand>
+```
+
+Manage the Language Server Protocol integration. LSP runs real
+language servers (pyright, gopls, rust-analyzer, …) in the
+background and feeds their diagnostics into the post-write check
+used by `write_file` and `patch`. Gated on git workspace detection
+— LSP only runs when the cwd or edited file is inside a git
+worktree.
+
+Subcommands:
+
+| Subcommand | Description |
+|------------|-------------|
+| `status` | Show service state, configured servers, install status. |
+| `list` | Print the registry of supported servers. Pass `--installed-only` to skip missing ones. |
+| `install <id>` | Eagerly install one server's binary. |
+| `install-all` | Install every server with a known auto-install recipe. |
+| `restart` | Tear down running clients so the next edit re-spawns. |
+| `which <id>` | Print the resolved binary path for one server. |
+
+See [LSP — Semantic Diagnostics](/docs/user-guide/features/lsp) for
+the full guide, supported languages, and configuration knobs.
+
 ## `hermes setup`

 ```bash
--- a/website/docs/user-guide/features/lsp.md
+++ b/website/docs/user-guide/features/lsp.md
@ -0,0 +1,228 @@
+---
+sidebar_position: 16
+title: "LSP — Semantic Diagnostics"
+description: "Real language servers (pyright, gopls, rust-analyzer, …) wired into the post-write lint check used by write_file and patch."
+---
+
+# Language Server Protocol (LSP)
+
+Hermes runs full language servers — pyright, gopls, rust-analyzer,
+typescript-language-server, clangd, and ~20 more — as background
+subprocesses and feeds their semantic diagnostics into the post-write
+lint check used by `write_file` and `patch`. When the agent edits a
+file, it sees exactly the errors that edit introduced — not just
+syntax errors, but **type errors, undefined names, missing imports,
+and project-wide semantic issues** the language server detects.
+
+This mirrors the post-edit LSP pattern used by coding agents such as
+Cline and OpenCode. Hermes ships it self-contained: no editor host
+required, no plugins to install, no separate daemon to manage.
+
+## When LSP runs
+
+LSP is gated on **git workspace detection**. When the agent's working
+directory (or the file being edited) is inside a git worktree, LSP
+runs against that workspace. When neither is in a git repo, LSP
+stays dormant — useful for messaging gateways where the cwd is the
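+user's home directory and there's no project to diagnose. LSP also
+stays dormant on non-local execution backends (Docker, Modal, SSH,
+Daytona): the files live inside the sandbox, where host-side
+language servers can't reach them.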
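A minimal sketch of the git walk-up gate follows. It assumes a workspace root is the nearest ancestor containing a `.git` entry (a directory in a normal clone, a file in a linked worktree). It also assumes the edited file's repo takes precedence over the cwd's; that ordering is an assumption. `find_git_root` and `lsp_workspace_for` are hypothetical names, and the real resolver in `agent/lsp/workspace.py` additionally handles exclude markers and per-server roots.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Return the nearest ancestor of ``start`` that contains a ``.git`` entry."""
    start = start.resolve()
    # If ``start`` is a file (or doesn't exist yet), begin at its parent dir.
    current = start if start.is_dir() else start.parent
    for candidate in (current, *current.parents):
        # ``.git`` is a directory in a normal clone and a file in a linked
        # worktree or submodule, so ``exists()`` covers both cases.
        if (candidate / ".git").exists():
            return candidate
    return None


def lsp_workspace_for(cwd: Path, file_path: Path) -> Optional[Path]:
    """Pick the workspace LSP should run against, or None to stay dormant.

    Hypothetical precedence: the edited file's repo first, then the cwd's.
    """
    return find_git_root(file_path) or find_git_root(cwd)
```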
+
+The check is layered: in-process syntax check first (microseconds),
+then LSP diagnostics second when syntax is clean. A flaky or missing
+language server can never break a write — every LSP failure path
+falls back silently to the syntax-only result.
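The layering reduces to a small piece of control flow, sketched below. `post_write_check`, `syntax_check`, `lsp_check`, and `CheckResult` are hypothetical stand-ins for `_check_lint_delta` and `_maybe_lsp_diagnostics` in `tools/file_operations.py`. The point is that LSP only runs after a clean syntax pass, and any LSP exception degrades to "no extra diagnostics".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    syntax_ok: bool
    syntax_output: str
    lsp_diagnostics: Optional[str]  # None when LSP had nothing to add


def post_write_check(
    path: str,
    content: str,
    syntax_check: Callable[[str, str], Optional[str]],
    lsp_check: Callable[[str], str],
) -> CheckResult:
    # Tier 1: cheap in-process syntax check. Returns an error string or None.
    error = syntax_check(path, content)
    if error is not None:
        # No point asking a language server about a file that won't parse.
        return CheckResult(False, error, None)

    # Tier 2: semantic diagnostics. Any failure falls back to syntax-only.
    try:
        block = lsp_check(path)
    except Exception:
        block = ""
    return CheckResult(True, "", block or None)
```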
+
+Concretely, for each `write_file` or `patch` inside a git worktree:
+
+1. Before writing, Hermes captures a baseline of the file's current
+   diagnostics.
+2. Hermes performs the write.
+3. If the syntax check passes, Hermes re-queries the language server,
+   filters out diagnostics that were already in the baseline, and
+   surfaces only the new ones.
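A minimal sketch of the baseline delta follows. It assumes diagnostics are compared by severity, source, code, and message but *not* by position, because an edit usually shifts the line numbers of untouched errors. That keying is an illustrative assumption; Hermes's actual comparison may differ. A `Counter` keeps duplicates honest: if the baseline had one "undefined name" error and the edit adds a second identical one, the second still surfaces.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Diagnostic = Dict[str, Any]  # an LSP Diagnostic object, as decoded JSON


def _key(d: Diagnostic) -> Tuple[Any, ...]:
    # Position is deliberately excluded: inserting lines above a
    # pre-existing error moves it without making it "new".
    return (d.get("severity"), d.get("source"), d.get("code"), d.get("message"))


def delta(baseline: List[Diagnostic], current: List[Diagnostic]) -> List[Diagnostic]:
    """Return diagnostics in ``current`` that the baseline doesn't account for."""
    remaining = Counter(_key(d) for d in baseline)
    new: List[Diagnostic] = []
    for d in current:
        k = _key(d)
        if remaining[k] > 0:
            remaining[k] -= 1  # matched a pre-existing diagnostic
        else:
            new.append(d)
    return new
```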
+
+The agent sees output like:
+
+```json
+{
+  "bytes_written": 42,
+  "dirs_created": false,
+  "lint": {"status": "ok", "output": ""},
+  "lsp_diagnostics": "LSP diagnostics introduced by this edit:\n<diagnostics file=\"/path/to/foo.py\">\nERROR [42:5] Cannot find name 'foo' [reportUndefinedVariable] (Pyright)\nERROR [50:1] Argument of type \"str\" is not assignable to \"int\" [reportArgumentType] (Pyright)\n</diagnostics>"
+}
+```
+
+The `lint` field carries the syntax-check result (a microsecond
+in-process parse via `ast.parse`, `json.loads`, etc.); the
+`lsp_diagnostics` field carries the semantic diagnostics from the
+real language server. The two channels are independent signals: a
+syntax-clean file with semantic problems shows up as a `lint` status
+of `ok` plus a populated `lsp_diagnostics`.
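The sketch below renders a block like the sample above from raw LSP diagnostics. It assumes the reporter keeps only severity 1 (Error) entries, caps output at 20 per file, and converts LSP's zero-based `line`/`character` to one-based positions. `format_block` and the exact layout are illustrative, reverse-engineered from the sample output rather than taken from `agent/lsp/reporter.py`.

```python
from typing import Any, Dict, List

SEVERITY_ERROR = 1  # LSP DiagnosticSeverity.Error


def format_block(path: str, diagnostics: List[Dict[str, Any]], max_per_file: int = 20) -> str:
    """Render error-severity diagnostics as a ``<diagnostics>`` block ("" if none)."""
    errors = [d for d in diagnostics if d.get("severity") == SEVERITY_ERROR]
    if not errors:
        return ""
    lines = [f'<diagnostics file="{path}">']
    for d in errors[:max_per_file]:
        start = d["range"]["start"]
        # LSP positions are zero-based; the rendered positions are one-based.
        line = f"ERROR [{start['line'] + 1}:{start['character'] + 1}] {d['message']}"
        if d.get("code") is not None:
            line += f" [{d['code']}]"
        if d.get("source"):
            line += f" ({d['source']})"
        lines.append(line)
    lines.append("</diagnostics>")
    return "\n".join(lines)
```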
+
+## Supported languages
+
+| Language | Server | Auto-install |
+|----------|--------|--------------|
+| Python | `pyright-langserver` | npm |
+| TypeScript / JavaScript / JSX / TSX | `typescript-language-server` | npm |
+| Vue | `@vue/language-server` | npm |
+| Svelte | `svelte-language-server` | npm |
+| Astro | `@astrojs/language-server` | npm |
+| Go | `gopls` | `go install` |
+| Rust | `rust-analyzer` | manual (rustup) |
+| C / C++ | `clangd` | manual (LLVM) |
+| Bash / Zsh | `bash-language-server` | npm |
+| YAML | `yaml-language-server` | npm |
+| Lua | `lua-language-server` | manual (GitHub releases) |
+| PHP | `intelephense` | npm |
+| OCaml | `ocaml-lsp` | manual (opam) |
+| Dockerfile | `dockerfile-language-server-nodejs` | npm |
+| Terraform | `terraform-ls` | manual |
+| Dart | `dart language-server` | manual (dart sdk) |
+| Haskell | `haskell-language-server` | manual (ghcup) |
+| Julia | `julia` + LanguageServer.jl | manual |
+| Clojure | `clojure-lsp` | manual |
+| Nix | `nixd` | manual |
+| Zig | `zls` | manual |
+| Gleam | `gleam lsp` | manual (gleam install) |
+| Elixir | `elixir-ls` | manual |
+| Prisma | `prisma language-server` | manual |
+| Kotlin | `kotlin-language-server` | manual |
+| Java | `jdtls` | manual |
+
+For "manual" entries, install the server through whatever toolchain
+manager makes sense for that language (rustup, ghcup, opam, brew,
+…). Hermes auto-detects the binary on PATH or in
+`<HERMES_HOME>/lsp/bin/`.
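The lookup can be sketched with `shutil.which`, whose `path` argument restricts the search to a given directory list. This sketch assumes PATH is checked before the Hermes staging directory; that order is an assumption. `resolve_binary` is a hypothetical helper.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_binary(name: str, hermes_home: Path) -> Optional[str]:
    """Find ``name`` on PATH, else in ``<hermes_home>/lsp/bin``; None if missing."""
    # 1. Whatever the user already has on PATH (rustup, ghcup, brew, ...).
    found = shutil.which(name)
    if found:
        return found
    # 2. Hermes-managed staging directory used by auto-install.
    staging = hermes_home / "lsp" / "bin"
    return shutil.which(name, path=str(staging))
```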
+
+## CLI
+
+```bash
+hermes lsp status          # service state + per-server install status
+hermes lsp list            # registry, optionally --installed-only
+hermes lsp install <id>    # eagerly install one server
+hermes lsp install-all     # try every server with a known recipe
+hermes lsp restart         # tear down running clients
+hermes lsp which <id>      # print resolved binary path
+```
+
+`hermes lsp status` is the best starting point — it shows which
+languages will get semantic diagnostics today and which need a
+binary installed.
+
+## Configuration
+
+The defaults work for typical setups; nothing to set if the binaries
+are on PATH.
+
+```yaml
+# config.yaml
+lsp:
+  # Master toggle. Disabling skips the entire subsystem — no servers
+  # spawn, no background event loop runs.
+  enabled: true
+
+  # What to wait for after each write, and the maximum wait in seconds.
+  wait_mode: document      # "document" or "full"
+  wait_timeout: 5.0
+
+  # How to handle missing server binaries.
+  #   auto    — install via npm/pip/go install into <HERMES_HOME>/lsp/bin
+  #   manual  — only use binaries already on PATH
+  install_strategy: auto
+
+  # Per-server overrides (all optional).
+  servers:
+    pyright:
+      disabled: false
+      command: ["/abs/path/to/pyright-langserver", "--stdio"]
+      env: { PYRIGHT_LOG_LEVEL: "info" }
+      initialization_options:
+        python:
+          analysis:
+            typeCheckingMode: "strict"
+    typescript:
+      disabled: true       # skip TS even when its extensions match
+```
+
+### Per-server keys
+
+* `disabled: true` — skip this server entirely even when its
+  extensions match a file.
+* `command: [bin, ...args]` — pin a custom binary path. Bypasses
+  auto-install.
+* `env: {KEY: value}` — extra env vars passed to the spawned process.
+* `initialization_options: {...}` — merged into the LSP
+  `initializationOptions` payload sent in the `initialize`
+  handshake. Server-specific; consult the language server's docs.
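For reference, the LSP specification defines `initialize` as an ordinary JSON-RPC 2.0 request, framed on stdio with a `Content-Length` header that counts the UTF-8 body in bytes. The sketch below shows where user-supplied `initialization_options` end up. It assumes they are merged recursively over per-server defaults. `deep_merge`, `build_initialize`, and the defaults in the usage test are hypothetical.

```python
import json
from pathlib import PurePath
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge ``override`` into a copy of ``base`` (override wins)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def build_initialize(root: PurePath, defaults: Dict[str, Any],
                     user_opts: Dict[str, Any], pid: int) -> bytes:
    """Return a framed LSP ``initialize`` request ready to write to stdin."""
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "processId": pid,
            "rootUri": root.as_uri(),  # requires an absolute path
            "capabilities": {},
            "initializationOptions": deep_merge(defaults, user_opts),
        },
    }
    body = json.dumps(request).encode("utf-8")
    # Content-Length counts bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body
```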
+
+## Installation locations
+
+When `install_strategy: auto`, Hermes installs binaries into
+`<HERMES_HOME>/lsp/bin/`. NPM packages land in
+`<HERMES_HOME>/lsp/node_modules/` with bin symlinks one level up.
+Go binaries come from `go install` with `GOBIN` pointed at the
+staging dir.
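The two recipes above can be sketched as argument vectors for `subprocess.run`. Here `pyright` is the npm package name and `golang.org/x/tools/gopls` is the Go module path for gopls. The helper names and the `@latest` version pin are assumptions for illustration, not the recipes in `agent/lsp/install.py`.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple


def npm_install_cmd(hermes_home: Path, package: str) -> List[str]:
    # --prefix keeps node_modules/ under <HERMES_HOME>/lsp instead of a global dir.
    return ["npm", "install", "--prefix", str(hermes_home / "lsp"), package]


def go_install_cmd(hermes_home: Path, module: str) -> Tuple[List[str], Dict[str, str]]:
    # GOBIN redirects the compiled binary into the Hermes staging dir.
    env = dict(os.environ, GOBIN=str(hermes_home / "lsp" / "bin"))
    return ["go", "install", f"{module}@latest"], env
```

Running one is then `subprocess.run(cmd, check=True)` for npm, or `subprocess.run(cmd, env=env, check=True)` for Go.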
+
+Nothing is ever installed to `/usr/local/`, `~/.local/`, or any other
+shared location — the staging dir is fully Hermes-owned and is
+removed when you reset the profile.
+
+## Performance characteristics
+
+LSP servers are **lazy-spawned** on first use. The first edit to a
+Python file in a workspace spawns pyright; spawning takes 1-3 seconds
+for most servers (rust-analyzer can take 10+ seconds on a cold
+project). Subsequent edits in the same workspace reuse the running
+server.
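Lazy spawning with reuse amounts to a registry keyed on `(server_id, workspace_root)`. The sketch assumes a caller-supplied `spawn` factory and ignores concurrency and in-flight deduplication. `ClientRegistry` is a hypothetical name, not the `LSPService` API.

```python
from pathlib import Path
from typing import Callable, Dict, Generic, Tuple, TypeVar

C = TypeVar("C")  # whatever object represents a running language-server client


class ClientRegistry(Generic[C]):
    def __init__(self, spawn: Callable[[str, Path], C]) -> None:
        self._spawn = spawn
        self._clients: Dict[Tuple[str, Path], C] = {}

    def get(self, server_id: str, root: Path) -> C:
        key = (server_id, root)
        client = self._clients.get(key)
        if client is None:
            # First edit for this (server, workspace): pay the spawn cost once.
            client = self._spawn(server_id, root)
            self._clients[key] = client
        return client
```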
+
+The LSP layer adds a few milliseconds to clean writes when the server
+emits no diagnostics. When diagnostics are emitted, Hermes waits at
+most `wait_timeout` seconds for them; pyright and tsserver typically
+respond within tens of milliseconds, rust-analyzer within a few
+seconds while it is still indexing.
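Bounding the wait maps naturally onto `asyncio.wait_for`. The sketch assumes push-style `textDocument/publishDiagnostics` notifications. `DiagnosticStore` is a hypothetical name; a real client also handles pull diagnostics and document-version matching, which are omitted here. On timeout it returns the last known result (possibly empty) rather than raising, in keeping with the rule that LSP must never break a write.

```python
import asyncio
from typing import Any, Dict, List


class DiagnosticStore:
    """Collects publishDiagnostics per URI and lets callers wait for an update."""

    def __init__(self) -> None:
        self._diags: Dict[str, List[Dict[str, Any]]] = {}
        self._events: Dict[str, asyncio.Event] = {}

    def _event(self, uri: str) -> asyncio.Event:
        return self._events.setdefault(uri, asyncio.Event())

    def expect(self, uri: str) -> None:
        # Call right after didOpen/didChange so only a fresh publish counts.
        self._event(uri).clear()

    def publish(self, uri: str, diagnostics: List[Dict[str, Any]]) -> None:
        # Invoked by the reader loop for each publishDiagnostics notification.
        self._diags[uri] = diagnostics
        self._event(uri).set()

    async def wait(self, uri: str, timeout: float) -> List[Dict[str, Any]]:
        try:
            await asyncio.wait_for(self._event(uri).wait(), timeout)
        except asyncio.TimeoutError:
            pass  # fall through with the last known (possibly empty) result
        return self._diags.get(uri, [])
```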
+
+Servers are kept alive for the life of the Hermes process. There's
+no idle-timeout reaper: re-spawning a server and rebuilding its
+project index would cost far more than keeping the idle daemon alive.
+
+## Disabling
+
+Set `lsp.enabled: false` in `config.yaml` to disable the entire
+subsystem. The post-write check falls back to the in-process syntax
+check (`ast.parse` for Python, `json.loads` for JSON, etc.) which
+ships unchanged from earlier versions.
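The in-process tier is essentially a parser dispatch on file extension. Below is a minimal sketch covering only the two parsers named above (`ast.parse`, `json.loads`). `syntax_error_for` and `CHECKERS` are hypothetical names; the real check covers more formats and can also shell out to linters.

```python
import ast
import json
from pathlib import Path
from typing import Callable, Dict, Optional


def _check_python(text: str) -> None:
    ast.parse(text)


def _check_json(text: str) -> None:
    json.loads(text)


CHECKERS: Dict[str, Callable[[str], None]] = {".py": _check_python, ".json": _check_json}


def syntax_error_for(path: str, text: str) -> Optional[str]:
    """Return a one-line error message, or None if clean or unsupported."""
    checker = CHECKERS.get(Path(path).suffix.lower())
    if checker is None:
        return None  # no in-process parser for this extension: "skipped"
    try:
        checker(text)
    except SyntaxError as exc:
        return f"{path}:{exc.lineno}: {exc.msg}"
    except json.JSONDecodeError as exc:
        return f"{path}:{exc.lineno}: {exc.msg}"
    except ValueError as exc:  # e.g. null bytes in Python source on older versions
        return f"{path}: {exc}"
    return None
```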
+
+To disable a single language without disabling the whole layer:
+
+```yaml
+lsp:
+  servers:
+    rust-analyzer:
+      disabled: true
+```
+
+## Troubleshooting
+
+**`hermes lsp status` shows a server as "missing"**
+
+The binary isn't on PATH and isn't in `<HERMES_HOME>/lsp/bin/`. Run
+`hermes lsp install <server_id>` to attempt an auto-install, or
+install the binary manually through the language's normal toolchain.
+
+**Server starts but never returns diagnostics**
+
+Check `~/.hermes/logs/agent.log` for `[agent.lsp.client]` entries —
+both stderr from the language server and protocol errors land
+there. Some servers (rust-analyzer especially) need to finish a
+project-wide index before they emit per-file diagnostics; the first
+edit after server start may complete with no diagnostics, with
+subsequent edits picking them up.
+
+**Server crashed**
+
+A crashed server is added to the broken-set and won't be retried for
+the rest of the session. Run `hermes lsp restart` to clear the set;
+the next edit re-spawns.
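The broken-set behaviour can be sketched as follows. The sketch assumes failures are recorded per `(server_id, root)` pair, and that a restart clears both live clients and the broken set. `BrokenSetService` is a hypothetical name. Unlike the real `hermes lsp restart`, it only drops references and does not shut the servers down.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Set, Tuple

Key = Tuple[str, Path]


class BrokenSetService:
    def __init__(self, spawn: Callable[[str, Path], object]) -> None:
        self._spawn = spawn
        self._clients: Dict[Key, object] = {}
        self._broken: Set[Key] = set()

    def client_for(self, server_id: str, root: Path) -> Optional[object]:
        key = (server_id, root)
        if key in self._broken:
            return None  # failed earlier this session: don't retry
        if key not in self._clients:
            try:
                self._clients[key] = self._spawn(server_id, root)
            except Exception:
                self._broken.add(key)
                return None
        return self._clients[key]

    def mark_crashed(self, server_id: str, root: Path) -> None:
        # Called when a running client dies; it stays out until a restart.
        key = (server_id, root)
        self._clients.pop(key, None)
        self._broken.add(key)

    def restart(self) -> None:
        # Forget all clients and clear the broken set; the next edit re-spawns.
        self._clients.clear()
        self._broken.clear()
```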
+
+**Editing a file outside any git repo**
+
+By design, LSP only runs inside git worktrees. Run `git init` in the
+project, or accept the in-process syntax-only fallback.
--- a/website/sidebars.ts
+++ b/website/sidebars.ts
@ -49,6 +49,7 @@ const sidebars: SidebarsConfig = {
          items: [
            'user-guide/features/tools',
            'user-guide/features/skills',
+            'user-guide/features/lsp',
            'user-guide/features/curator',
            'user-guide/features/memory',
            'user-guide/features/memory-providers',