feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168)

* feat(lsp): semantic diagnostics from real language servers in write_file/patch Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server, clangd, bash-language-server, ...) into the post-write lint check used by write_file and patch. The model now sees type errors, undefined names, missing imports, and project-wide semantic issues introduced by its edits, not just syntax errors. LSP is gated on git workspace detection: when the agent's cwd or the file being edited is inside a git worktree, LSP runs against that workspace; otherwise the existing in-process syntax checks are the only tier. This keeps users on user-home cwds (Telegram/Discord gateway chats) from spawning daemons. The post-write check is layered: in-process syntax check first (microseconds), then LSP semantic diagnostics second when syntax is clean. Diagnostics are delta-filtered against a baseline captured at write start, so the agent only sees errors its edit introduced. A flaky/missing language server can never break a write -- every LSP failure path falls back silently to the syntax-only result. New module agent/lsp/ split into: - protocol.py: Content-Length JSON-RPC framer + envelope helpers - client.py: async LSPClient (spawn, initialize, didOpen/didChange, ContentModified retry, push/pull diagnostic stores) - workspace.py: git worktree walk-up + per-server NearestRoot resolver - servers.py: registry of 26 language servers (extension match, root resolver, spawn builder per language) - install.py: auto-install dispatch (npm install --prefix, go install with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/ - manager.py: LSPService (per-(server_id, root) client registry, lazy spawn, broken-set, in-flight dedupe, sync facade for tools layer) - reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file) - cli.py: hermes lsp {status,list,install,install-all,restart,which} Wired into tools/file_operations.py: - write_file/patch_replace now call _snapshot_lsp_baseline before write - _check_lint_delta gains a third tier: LSP semantic diagnostics when syntax is clean - All LSP code paths swallow exceptions; write_file's contract unchanged Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true), wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server overrides (disabled, command, env, initialization_options). Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and read_message round-trip, EOF/truncation/missing Content-Length), workspace gate (git walk-up, exclude markers, fallback to file location), reporter (severity filter, max-per-file cap, truncation), service-level delta filter, and an in-process mock LSP server that exercises the full client lifecycle including didChange version bumps, dedup, crash recovery, and idempotent teardown. Live E2E verified end-to-end through ShellFileOperations: pyright auto-installed via npm into HERMES_HOME, baseline captured, type error introduced, single delta diagnostic surfaced with correct line/column/code/ source, then patch fix removes the diagnostic from the output. Docs: new website/docs/user-guide/features/lsp.md page covering supported languages, configuration knobs, performance characteristics, and troubleshooting; cli-commands.md updated with the 'hermes lsp' reference; sidebar updated. * feat(lsp): structured logging, backend gate, defensive walk caps Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR. agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``. agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write. tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open. agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``. A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead. agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. Defensive against pathological inputs (symlink loops, encoding errors, permission failures mid-walk) — the lint hook is hot-path code and must never raise. Tests: - tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract. - tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment. - tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap. Validation: - 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap) - 198/198 pass when run alongside existing file_operations tests - Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call. * feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field Two improvements salvaged from #24414's plugin-form alternative, keeping our core-integrated design: 1. atexit cleanup of spawned language servers ---------------------------------------------------------------- ``agent/lsp/__init__.get_service`` now registers an ``atexit`` handler on first creation that tears down the LSPService on Python exit. Without this, every ``hermes chat`` exit was leaking pyright/gopls/etc. processes for a few seconds while their stdout buffers drained -- they got reaped by the kernel eventually but a watchful ``ps aux`` would catch them. The handler runs once per process (gated by ``_atexit_registered``); idempotent ``shutdown_service`` ensures double-fire is a no-op. Errors during shutdown are swallowed at debug level since by the time atexit fires the user has already seen the agent's final response. 2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult ---------------------------------------------------------------- Previously the LSP layer folded its diagnostic block into the ``lint.output`` string, conflating the syntax-check tier with the semantic tier. The agent (and any downstream parsers) now read syntax errors and semantic errors as independent signals: { "bytes_written": 42, "lint": {"status": "ok", "output": ""}, "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..." } ``_check_lint_delta`` returns to its original two-tier shape (syntax check + delta filter); ``write_file`` and ``patch_replace`` independently fetch LSP diagnostics via ``_maybe_lsp_diagnostics`` and pass them into the new field. ``patch_replace`` propagates the inner write_file's ``lsp_diagnostics`` so the outer PatchResult carries the patch's delta correctly. Tests: 19 new - tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration fires once and only once across N get_service calls; the registered callable is our internal shutdown wrapper; shutdown_service is idempotent and safe when never started; exceptions during shutdown are swallowed; inactive service is cached so we don't rebuild on every check. - tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult / PatchResult dataclass shape, to_dict include/omit semantics, channel separation (lint and lsp_diagnostics carry independent signals), write_file populates the field via _maybe_lsp_diagnostics only when the syntax tier is clean, patch_replace propagates the field forward from its internal write_file. Validation: - 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field) - 217/217 pass with file_operations + LSP combined - Live E2E reverified: clean writes -> both fields empty/none; type error introduced -> lint clean (parses), lsp_diagnostics carries the pyright reportReturnType block; patch fix -> both fields clean again. * fix(lsp): broken-set short-circuit so a wedged server isn't paid every write Discovered while auditing failure paths: a language server binary that hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY subsequent write to re-pay the 8s snapshot_baseline timeout. Five writes = ~64s of dead time. The bug: ``_get_or_spawn`` adds the (server_id, root) pair to ``_broken`` inside its inner exception handler, but when the OUTER ``_loop.run`` timeout fires, it cancels the inner task before that handler runs. The pair never makes it to broken-set, so the next write re-enters the spawn path and re-pays the timeout. Fix: - New ``_mark_broken_for_file`` helper at the service layer marks the (server_id, workspace_root) pair broken from the OUTSIDE when the outer timeout fires. Called from the except branches in ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError + generic Exception). Also kills any orphan client process that survived the cancelled future, fire-and-forget with a 1s ceiling. - ``enabled_for`` now consults the broken-set BEFORE returning True. Files in already-broken (server_id, root) pairs short-circuit to False, so the file_operations layer skips the LSP path entirely with no spawn cost. Until the service is restarted (``hermes lsp restart``) or the process exits. - A single eventlog WARNING is emitted on first mark-broken so the user knows which server gave up. Subsequent edits in the same project stay silent. Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the key shape (server_id, per_server_root), enabled_for short-circuit, sibling-file skip in same project, project isolation (broken in A doesn't affect B), graceful no-op for missing-server / no-workspace, and an end-to-end test that snapshots after a failure and verifies the next ``enabled_for`` returns False. Validation: - Live retest of the wedged-binary scenario: 5 sequential writes, first 8.88s (the one snapshot timeout), subsequent four ~0.84s (no LSP cost). Down from 5x12.85s = 64s before this fix. - 99/99 LSP tests pass (92 prior + 7 broken-set) - 224/224 pass with file_operations + LSP combined - Happy path E2E reverified — clean write, type error introduced, patch fix all behave correctly with the new broken-set logic. Note: the FIRST write to a wedged binary still pays 8s (the snapshot_baseline timeout). We could shorten that, but pyright/ tsserver normally take 2-3s and slow CI rust-analyzer can need 5+ seconds, so 8s is the conservative ceiling. Subsequent writes are instant.
2026-05-18 04:41:56 +00:00 · 2026-05-12 16:31:54 -07:00 · 2026-05-12 16:31:54 -07:00 · 83b93898c2
commit 83b93898c2
parent d89553c2d6
28 changed files with 6144 additions and 17 deletions
--- a/agent/lsp/init.py
+++ b/agent/lsp/init.py
@ -0,0 +1,106 @@
+"""Language Server Protocol (LSP) integration for Hermes Agent.
+
+Hermes runs full language servers (pyright, gopls, rust-analyzer,
+typescript-language-server, etc.) as subprocesses and pipes their
+``textDocument/publishDiagnostics`` output into the post-write lint
+delta filter used by ``write_file`` and ``patch``.
+
+LSP is **gated on git workspace detection** — if the agent's cwd is
+inside a git repository, LSP runs against that workspace; otherwise the
+file_operations layer falls back to its existing in-process syntax
+checks.  This keeps users on user-home cwd's (e.g. Telegram gateway
+chats) from spawning daemons they don't need.
+
+Public API:
+
+    from agent.lsp import get_service
+
+    svc = get_service()
+    if svc and svc.enabled_for(path):
+        await svc.touch_file(path)
+        diags = svc.diagnostics_for(path)
+
+The bulk of the wiring is internal — most callers only need the layer
+in :func:`tools.file_operations.FileOperations._check_lint_delta`,
+which is already wired (see that module).
+
+Architecture is documented in ``website/docs/user-guide/features/lsp.md``.
+"""
+from __future__ import annotations
+
+import atexit
+import logging
+import threading
+from typing import Optional
+
+from agent.lsp.manager import LSPService
+
+logger = logging.getLogger("agent.lsp")
+
+_service: Optional[LSPService] = None
+_atexit_registered = False
+_service_lock = threading.Lock()
+
+
+def get_service() -> Optional[LSPService]:
+    """Return the process-wide LSP service singleton, or None when disabled.
+
+    The service is created lazily on first call.  ``None`` is returned
+    when LSP is disabled in config, when no workspace can be detected,
+    or when the platform doesn't support subprocess-based LSP servers.
+
+    On first creation, registers an :mod:`atexit` handler that tears
+    down spawned language servers on Python exit so a long-running
+    CLI or gateway session doesn't leak pyright/gopls/etc. processes
+    when it terminates.
+    """
+    global _service, _atexit_registered
+    if _service is not None:
+        return _service if _service.is_active() else None
+    with _service_lock:
+        if _service is not None:
+            return _service if _service.is_active() else None
+        _service = LSPService.create_from_config()
+        if not _atexit_registered:
+            # ``atexit`` handlers run in LIFO order on normal Python
+            # exit and on SystemExit, but NOT on os._exit() or
+            # uncaught signals.  Language servers are stateless
+            # subprocesses — losing them on SIGKILL is fine; they'll
+            # be reaped by the kernel along with their parent.  We
+            # care about clean exits where Python flushes stdio
+            # before terminating; without this hook every
+            # ``hermes chat`` exit would leak pyright processes that
+            # outlive the parent for a few seconds while their
+            # stdout buffers drain.
+            atexit.register(_atexit_shutdown)
+            _atexit_registered = True
+    return _service if (_service is not None and _service.is_active()) else None
+
+
+def shutdown_service() -> None:
+    """Tear down the LSP service if one was started.
+
+    Safe to call multiple times; safe to call when no service was created.
+    """
+    global _service
+    with _service_lock:
+        svc = _service
+        _service = None
+    if svc is not None:
+        try:
+            svc.shutdown()
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP shutdown error: %s", e)
+
+
+def _atexit_shutdown() -> None:
+    """atexit-registered wrapper.  Logs at debug because by the time
+    atexit fires the user has already seen the agent's final output —
+    a noisy shutdown line on top of that is just clutter."""
+    try:
+        shutdown_service()
+    except Exception as e:  # noqa: BLE001
+        logger.debug("atexit LSP shutdown failed: %s", e)
+
+
+__all__ = ["get_service", "shutdown_service", "LSPService"]
--- a/agent/lsp/cli.py
+++ b/agent/lsp/cli.py
@ -0,0 +1,270 @@
+"""``hermes lsp`` CLI subcommand.
+
+Subcommands:
+
+- ``status`` — show service state, configured servers, install status.
+- ``install <server_id>`` — eagerly install one server's binary.
+- ``install-all`` — try to install every server with a known recipe.
+- ``restart`` — tear down running clients so the next edit re-spawns.
+- ``which <server_id>`` — print the resolved binary path for one server.
+- ``list`` — print the registry of supported servers.
+
+The handlers are kept here (rather than in
+``hermes_cli/main.py``) so the LSP module ships self-contained.
+"""
+from __future__ import annotations
+
+import argparse
+import sys
+from typing import Optional
+
+
+def register_subparser(subparsers: argparse._SubParsersAction) -> None:
+    """Wire the ``hermes lsp`` subcommand tree into the main argparse."""
+    parser = subparsers.add_parser(
+        "lsp",
+        help="Language Server Protocol management",
+        description=(
+            "Manage the LSP layer that powers post-write semantic "
+            "diagnostics in write_file/patch."
+        ),
+    )
+    sub = parser.add_subparsers(dest="lsp_command")
+
+    sub_status = sub.add_parser("status", help="Show LSP service status")
+    sub_status.add_argument(
+        "--json", action="store_true", help="Emit machine-readable JSON"
+    )
+
+    sub_list = sub.add_parser("list", help="List supported language servers")
+    sub_list.add_argument(
+        "--installed-only",
+        action="store_true",
+        help="Only show servers whose binary is currently available",
+    )
+
+    sub_install = sub.add_parser("install", help="Install a server binary")
+    sub_install.add_argument("server", help="Server id (e.g. pyright, gopls)")
+
+    sub_install_all = sub.add_parser(
+        "install-all",
+        help="Install every server with a known auto-install recipe",
+    )
+    sub_install_all.add_argument(
+        "--include-manual",
+        action="store_true",
+        help="Even attempt servers marked manual-install (best effort)",
+    )
+
+    sub_restart = sub.add_parser(
+        "restart",
+        help="Tear down running LSP clients (next edit re-spawns)",
+    )
+
+    sub_which = sub.add_parser("which", help="Print binary path for a server")
+    sub_which.add_argument("server", help="Server id")
+
+    parser.set_defaults(func=run_lsp_command)
+
+
+def run_lsp_command(args: argparse.Namespace) -> int:
+    """Top-level dispatcher for ``hermes lsp <subcommand>``."""
+    sub = getattr(args, "lsp_command", None) or "status"
+    try:
+        if sub == "status":
+            return _cmd_status(getattr(args, "json", False))
+        if sub == "list":
+            return _cmd_list(getattr(args, "installed_only", False))
+        if sub == "install":
+            return _cmd_install(args.server)
+        if sub == "install-all":
+            return _cmd_install_all(getattr(args, "include_manual", False))
+        if sub == "restart":
+            return _cmd_restart()
+        if sub == "which":
+            return _cmd_which(args.server)
+        sys.stderr.write(f"unknown lsp subcommand: {sub}\n")
+        return 2
+    except KeyboardInterrupt:
+        return 130
+
+
+def _cmd_status(emit_json: bool) -> int:
+    from agent.lsp import get_service
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import detect_status
+
+    svc = get_service()
+    service_active = svc is not None
+    info = svc.get_status() if svc is not None else {"enabled": False}
+
+    if emit_json:
+        import json
+        payload = {
+            "service": info,
+            "registry": [
+                {
+                    "server_id": s.server_id,
+                    "extensions": list(s.extensions),
+                    "description": s.description,
+                    "binary_status": detect_status(_recipe_pkg_for(s.server_id)),
+                }
+                for s in SERVERS
+            ],
+        }
+        sys.stdout.write(json.dumps(payload, indent=2) + "\n")
+        return 0
+
+    out = []
+    out.append("LSP Service")
+    out.append("===========")
+    out.append(f"  enabled:         {info.get('enabled', False)}")
+    if service_active:
+        out.append(f"  wait_mode:       {info.get('wait_mode')}")
+        out.append(f"  wait_timeout:    {info.get('wait_timeout')}s")
+        out.append(f"  install_strategy:{info.get('install_strategy')}")
+        clients = info.get("clients") or []
+        if clients:
+            out.append(f"  active clients:  {len(clients)}")
+            for c in clients:
+                out.append(
+                    f"    - {c['server_id']:20s} state={c['state']:10s} root={c['workspace_root']}"
+                )
+        else:
+            out.append("  active clients:  none")
+        broken = info.get("broken") or []
+        if broken:
+            out.append(f"  broken pairs:    {len(broken)}")
+            for b in broken:
+                out.append(f"    - {b}")
+        disabled = info.get("disabled_servers") or []
+        if disabled:
+            out.append(f"  disabled in cfg: {', '.join(disabled)}")
+    out.append("")
+    out.append("Registered Servers")
+    out.append("==================")
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        status = detect_status(pkg)
+        marker = {
+            "installed": "✓",
+            "missing": "·",
+            "manual-only": "?",
+        }.get(status, " ")
+        ext_summary = ", ".join(list(s.extensions)[:5])
+        if len(s.extensions) > 5:
+            ext_summary += f", … (+{len(s.extensions) - 5})"
+        out.append(
+            f"  {marker} {s.server_id:24s} [{status:11s}] {ext_summary}"
+        )
+        if s.description:
+            out.append(f"      {s.description}")
+    sys.stdout.write("\n".join(out) + "\n")
+    return 0
+
+
+def _cmd_list(installed_only: bool) -> int:
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import detect_status
+
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        status = detect_status(pkg)
+        if installed_only and status != "installed":
+            continue
+        sys.stdout.write(
+            f"{s.server_id:24s} [{status:11s}] {','.join(s.extensions)}\n"
+        )
+    return 0
+
+
+def _cmd_install(server_id: str) -> int:
+    from agent.lsp.install import try_install, INSTALL_RECIPES, detect_status
+    pkg = _recipe_pkg_for(server_id)
+    pre_status = detect_status(pkg)
+    if pre_status == "installed":
+        sys.stdout.write(f"{server_id} already installed\n")
+        return 0
+    sys.stdout.write(f"installing {server_id} (pkg={pkg}) ...\n")
+    sys.stdout.flush()
+    bin_path = try_install(pkg, "auto")
+    if bin_path is None:
+        recipe = INSTALL_RECIPES.get(pkg)
+        if recipe and recipe.get("strategy") == "manual":
+            sys.stderr.write(
+                f"{server_id}: this server requires a manual install. "
+                f"See documentation.\n"
+            )
+        else:
+            sys.stderr.write(f"{server_id}: install failed (see logs).\n")
+        return 1
+    sys.stdout.write(f"installed: {bin_path}\n")
+    return 0
+
+
+def _cmd_install_all(include_manual: bool) -> int:
+    from agent.lsp.servers import SERVERS
+    from agent.lsp.install import try_install, INSTALL_RECIPES, detect_status
+
+    rc = 0
+    for s in SERVERS:
+        pkg = _recipe_pkg_for(s.server_id)
+        recipe = INSTALL_RECIPES.get(pkg)
+        if recipe is None:
+            continue
+        if recipe.get("strategy") == "manual" and not include_manual:
+            continue
+        if detect_status(pkg) == "installed":
+            sys.stdout.write(f"  {s.server_id:24s} already installed\n")
+            continue
+        sys.stdout.write(f"  installing {s.server_id} (pkg={pkg}) ... ")
+        sys.stdout.flush()
+        path = try_install(pkg, "auto")
+        if path:
+            sys.stdout.write(f"ok ({path})\n")
+        else:
+            sys.stdout.write("FAILED\n")
+            rc = 1
+    return rc
+
+
+def _cmd_restart() -> int:
+    from agent.lsp import shutdown_service
+
+    shutdown_service()
+    sys.stdout.write("LSP service shut down. Next edit will respawn clients.\n")
+    return 0
+
+
+def _cmd_which(server_id: str) -> int:
+    from agent.lsp.install import INSTALL_RECIPES, hermes_lsp_bin_dir
+    import os
+    import shutil as _shutil
+
+    recipe = INSTALL_RECIPES.get(server_id)
+    bin_name = (recipe or {}).get("bin", server_id)
+    staged = hermes_lsp_bin_dir() / bin_name
+    if staged.exists():
+        sys.stdout.write(str(staged) + "\n")
+        return 0
+    on_path = _shutil.which(bin_name)
+    if on_path:
+        sys.stdout.write(on_path + "\n")
+        return 0
+    sys.stderr.write(f"{server_id}: not installed\n")
+    return 1
+
+
+def _recipe_pkg_for(server_id: str) -> str:
+    """Map a registry ``server_id`` to its install-recipe package key."""
+    # The mapping lives here (not in install.py) because it's a CLI
+    # convenience layer.  Most server_ids are also their own recipe
+    # key, but a few differ (e.g. ``vue-language-server`` →
+    # ``@vue/language-server``).
+    aliases = {
+        "vue-language-server": "@vue/language-server",
+        "astro-language-server": "@astrojs/language-server",
+        "dockerfile-ls": "dockerfile-language-server-nodejs",
+        "typescript": "typescript-language-server",
+    }
+    return aliases.get(server_id, server_id)
--- a/agent/lsp/client.py
+++ b/agent/lsp/client.py
@ -0,0 +1,930 @@
+"""Async LSP client over stdin/stdout.
+
+One :class:`LSPClient` corresponds to one ``(language_server, workspace_root)``
+pair — exactly what OpenCode keys clients on, and the same shape Claude
+Code uses.  The client owns a child process, drives the JSON-RPC
+exchange, and exposes:
+
+- :meth:`open_file` / :meth:`change_file` — text document sync
+- :meth:`wait_for_diagnostics` — block until the server emits fresh
+  diagnostics for a specific file (or a timeout fires)
+- :meth:`diagnostics_for` — read the current per-file diagnostic store
+- :meth:`shutdown` — graceful close + SIGTERM/SIGKILL fallback
+
+The class is designed for async use from a single asyncio event loop.
+The :class:`agent.lsp.manager.LSPService` runs an event loop in a
+background thread so the synchronous file_operations layer can call
+into it via :func:`agent.lsp.manager.LSPService.touch_file`.
+
+Implementation notes:
+
+- Push diagnostics are stored per-URI in :attr:`_push_diagnostics` from
+  ``textDocument/publishDiagnostics`` notifications.  Pull diagnostics
+  go in :attr:`_pull_diagnostics`.  The merged view dedupes by content.
+
+- Whole-document sync.  Even when the server advertises incremental
+  sync, we send a single ``contentChanges`` entry replacing the
+  entire document.  Pretending to be incremental while sending a
+  full replacement is well-tolerated by every major server and saves
+  range bookkeeping.  See OpenCode's ``client.ts:584-659`` for the
+  same trick.
+
+- The "touch-file dance": every ``open_file`` call also fires a
+  ``workspace/didChangeWatchedFiles`` notification (CREATED on the
+  first open, CHANGED thereafter).  Some servers (clangd, eslint)
+  only re-scan when this notification fires, even though the LSP spec
+  doesn't strictly require it.
+
+- ``ContentModified`` (-32801) errors get retried with exponential
+  backoff up to 3 times.  This matches Claude Code's
+  ``LSPServerInstance.sendRequest``.
+"""
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+from pathlib import Path
+from typing import Any, Awaitable, Callable, Dict, List, Optional, Set
+from urllib.parse import quote, unquote
+
+from agent.lsp.protocol import (
+    ERROR_CONTENT_MODIFIED,
+    ERROR_METHOD_NOT_FOUND,
+    LSPProtocolError,
+    LSPRequestError,
+    classify_message,
+    encode_message,
+    make_error_response,
+    make_notification,
+    make_request,
+    make_response,
+    read_message,
+)
+
+logger = logging.getLogger("agent.lsp.client")
+
+# Timeouts (seconds) — mirror OpenCode's constants, scaled to seconds.
+INITIALIZE_TIMEOUT = 45.0
+DIAGNOSTICS_DOCUMENT_WAIT = 5.0
+DIAGNOSTICS_FULL_WAIT = 10.0
+DIAGNOSTICS_REQUEST_TIMEOUT = 3.0
+PUSH_DEBOUNCE = 0.15
+SHUTDOWN_GRACE = 1.0  # seconds between SIGTERM and SIGKILL
+
+# Retry policy for transient ContentModified errors.
+MAX_CONTENT_MODIFIED_RETRIES = 3
+RETRY_BASE_DELAY = 0.5  # 0.5, 1.0, 2.0 — exponential
+
+
+def file_uri(path: str) -> str:
+    """Return ``file://`` URI for an absolute filesystem path.
+
+    Mirrors Node's ``pathToFileURL`` — handles spaces, unicode, and
+    Windows drive letters (``C:\\foo`` → ``file:///C:/foo``).
+    """
+    abs_path = os.path.abspath(path)
+    if os.name == "nt":
+        # Windows: backslash → forward slash, prepend extra slash so
+        # the drive letter shows up as part of the path component.
+        abs_path = abs_path.replace("\\", "/")
+        if not abs_path.startswith("/"):
+            abs_path = "/" + abs_path
+    return "file://" + quote(abs_path, safe="/:")
+
+
+def uri_to_path(uri: str) -> str:
+    """Inverse of :func:`file_uri`."""
+    if not uri.startswith("file://"):
+        return uri
+    raw = uri[len("file://"):]
+    if os.name == "nt" and raw.startswith("/") and len(raw) > 2 and raw[2] == ":":
+        raw = raw[1:]  # strip leading slash before drive letter
+    return os.path.normpath(unquote(raw))
+
+
+def _end_position(text: str) -> Dict[str, int]:
+    """Return the LSP Position at the end of ``text``.
+
+    Used to construct a single-range "replace whole document" change
+    for ``textDocument/didChange`` regardless of the server's declared
+    sync mode.
+    """
+    if not text:
+        return {"line": 0, "character": 0}
+    lines = text.splitlines(keepends=False)
+    last_line = len(lines) - 1
+    last_col = len(lines[-1]) if lines else 0
+    # If the text ends with a trailing newline, ``splitlines`` won't
+    # represent it.  The end position is then the start of the next
+    # (empty) line — line index is len(lines), column 0.
+    if text.endswith(("\n", "\r")):
+        return {"line": last_line + 1, "character": 0}
+    return {"line": last_line, "character": last_col}
+
+
+class LSPClient:
+    """Async LSP client tied to one server process and one workspace root.
+
+    Lifecycle:
+
+        c = LSPClient(server_id, workspace_root, command, args, init_options)
+        await c.start()       # spawn + initialize
+        ver = await c.open_file("/path/to/foo.py")
+        await c.wait_for_diagnostics("/path/to/foo.py", ver)
+        diags = c.diagnostics_for("/path/to/foo.py")
+        await c.shutdown()
+    """
+
+    # ------------------------------------------------------------------
+    # construction + lifecycle
+    # ------------------------------------------------------------------
+
+    def __init__(
+        self,
+        *,
+        server_id: str,
+        workspace_root: str,
+        command: List[str],
+        env: Optional[Dict[str, str]] = None,
+        cwd: Optional[str] = None,
+        initialization_options: Optional[Dict[str, Any]] = None,
+        seed_diagnostics_on_first_push: bool = False,
+    ) -> None:
+        self.server_id = server_id
+        self.workspace_root = workspace_root
+        self._command = list(command)
+        self._env = env
+        self._cwd = cwd or workspace_root
+        self._init_options = initialization_options or {}
+        self._seed_first_push = seed_diagnostics_on_first_push
+
+        # Process + streams
+        self._proc: Optional[asyncio.subprocess.Process] = None
+        self._stderr_task: Optional[asyncio.Task] = None
+        self._reader_task: Optional[asyncio.Task] = None
+
+        # Request/response correlation
+        self._next_id: int = 0
+        self._pending: Dict[int, asyncio.Future] = {}
+
+        # Server-side request handlers (server → client requests).
+        # Kept small and explicit; everything else returns method-not-found.
+        self._request_handlers: Dict[str, Callable[[Any], Awaitable[Any]]] = {
+            "window/workDoneProgress/create": self._handle_work_done_create,
+            "workspace/configuration": self._handle_workspace_configuration,
+            "client/registerCapability": self._handle_register_capability,
+            "client/unregisterCapability": self._handle_unregister_capability,
+            "workspace/workspaceFolders": self._handle_workspace_folders,
+            "workspace/diagnostic/refresh": self._handle_diagnostic_refresh,
+        }
+        # Notifications (server → client) we care about.
+        self._notification_handlers: Dict[str, Callable[[Any], None]] = {
+            "textDocument/publishDiagnostics": self._handle_publish_diagnostics,
+            # Everything else (window/showMessage, $/progress, etc.)
+            # is silently dropped by default.
+        }
+
+        # Tracked file state — required for didChange version bumps.
+        self._files: Dict[str, Dict[str, Any]] = {}
+        # Diagnostic stores, keyed by file path (NOT URI).
+        self._push_diagnostics: Dict[str, List[Dict[str, Any]]] = {}
+        self._pull_diagnostics: Dict[str, List[Dict[str, Any]]] = {}
+        # Per-path "last published" time so wait-for-fresh logic works.
+        self._published: Dict[str, float] = {}
+        # Per-path version of the latest push (matches our didChange
+        # version when the server respects it).
+        self._published_version: Dict[str, int] = {}
+        # First-push seen flag, for typescript-style seed-on-first-push.
+        self._first_push_seen: Set[str] = set()
+        # Capability registrations — only diagnostic ones are tracked.
+        self._diagnostic_registrations: Dict[str, Dict[str, Any]] = {}
+
+        # State machine
+        self._state: str = "stopped"
+        self._initialize_result: Optional[Dict[str, Any]] = None
+        self._sync_kind: int = 1  # 1=Full, 2=Incremental
+        self._stopping: bool = False
+
+        # Push event for waiters.
+        self._push_event = asyncio.Event()
+        # Monotonic counter incremented on every publishDiagnostics push.
+        # Waiters snapshot it on entry and treat any increase as
+        # "something happened, recheck the predicate".  Avoids the
+        # asyncio.Event sticky-state trap.
+        self._push_counter = 0
+        # Registration change event so wait_for_diagnostics can re-loop
+        # when the server announces a new dynamic provider.
+        self._registration_event = asyncio.Event()
+
+    @property
+    def is_running(self) -> bool:
+        return self._state == "running" and self._proc is not None and self._proc.returncode is None
+
+    @property
+    def state(self) -> str:
+        return self._state
+
+    async def start(self) -> None:
+        """Spawn the server and complete the initialize handshake.
+
+        Raises any exception encountered during spawn/init.  On failure
+        the process is killed and the client is left in state
+        ``"error"`` — re-call ``start()`` to retry.
+        """
+        if self._state in ("running", "starting"):
+            return
+        self._state = "starting"
+        try:
+            await self._spawn()
+            await self._initialize()
+            self._state = "running"
+        except Exception:
+            self._state = "error"
+            await self._cleanup_process()
+            raise
+
+    async def _spawn(self) -> None:
+        env = dict(os.environ)
+        if self._env:
+            env.update(self._env)
+
+        try:
+            self._proc = await asyncio.create_subprocess_exec(
+                self._command[0],
+                *self._command[1:],
+                stdin=asyncio.subprocess.PIPE,
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.PIPE,
+                env=env,
+                cwd=self._cwd,
+            )
+        except FileNotFoundError as e:
+            raise LSPProtocolError(
+                f"LSP server binary not found: {self._command[0]} ({e})"
+            ) from e
+
+        # Drain stderr at debug level — if we don't, the pipe buffer
+        # fills and the server hangs.
+        self._stderr_task = asyncio.create_task(self._drain_stderr())
+        # Start the reader loop.
+        self._reader_task = asyncio.create_task(self._reader_loop())
+
+    async def _drain_stderr(self) -> None:
+        if self._proc is None or self._proc.stderr is None:
+            return
+        try:
+            while True:
+                line = await self._proc.stderr.readline()
+                if not line:
+                    break
+                text = line.decode("utf-8", errors="replace").rstrip()
+                if text:
+                    logger.debug("[%s] stderr: %s", self.server_id, text[:1000])
+        except (asyncio.CancelledError, OSError):
+            pass
+
+    async def _reader_loop(self) -> None:
+        if self._proc is None or self._proc.stdout is None:
+            return
+        try:
+            while True:
+                msg = await read_message(self._proc.stdout)
+                if msg is None:
+                    logger.debug("[%s] server closed stdout cleanly", self.server_id)
+                    break
+                kind, key = classify_message(msg)
+                if kind == "response":
+                    self._dispatch_response(key, msg)
+                elif kind == "request":
+                    asyncio.create_task(self._dispatch_request(key, msg))
+                elif kind == "notification":
+                    self._dispatch_notification(key, msg)
+                else:
+                    logger.warning("[%s] dropping invalid message: %r", self.server_id, msg)
+        except LSPProtocolError as e:
+            logger.warning("[%s] protocol error in reader loop: %s", self.server_id, e)
+        except (asyncio.CancelledError, OSError):
+            pass
+        finally:
+            # Wake up any pending requests so they can fail fast.
+            for fut in list(self._pending.values()):
+                if not fut.done():
+                    fut.set_exception(LSPProtocolError("server connection closed"))
+            self._pending.clear()
+
+    async def _initialize(self) -> None:
+        params = {
+            "rootUri": file_uri(self.workspace_root),
+            "rootPath": self.workspace_root,
+            "processId": os.getpid(),
+            "workspaceFolders": [
+                {"name": "workspace", "uri": file_uri(self.workspace_root)}
+            ],
+            "initializationOptions": self._init_options,
+            "capabilities": {
+                "window": {"workDoneProgress": True},
+                "workspace": {
+                    "configuration": True,
+                    "workspaceFolders": True,
+                    "didChangeWatchedFiles": {"dynamicRegistration": True},
+                    "diagnostics": {"refreshSupport": False},
+                },
+                "textDocument": {
+                    "synchronization": {
+                        "dynamicRegistration": False,
+                        "didOpen": True,
+                        "didChange": True,
+                        "didSave": True,
+                        "willSave": False,
+                        "willSaveWaitUntil": False,
+                    },
+                    "diagnostic": {
+                        "dynamicRegistration": True,
+                        "relatedDocumentSupport": True,
+                    },
+                    "publishDiagnostics": {
+                        "relatedInformation": True,
+                        "tagSupport": {"valueSet": [1, 2]},
+                        "versionSupport": True,
+                        "codeDescriptionSupport": True,
+                        "dataSupport": False,
+                    },
+                    "hover": {"contentFormat": ["markdown", "plaintext"]},
+                    "definition": {"linkSupport": True},
+                    "references": {},
+                    "documentSymbol": {"hierarchicalDocumentSymbolSupport": True},
+                },
+                "general": {"positionEncodings": ["utf-16"]},
+            },
+        }
+
+        result = await asyncio.wait_for(
+            self._send_request("initialize", params),
+            timeout=INITIALIZE_TIMEOUT,
+        )
+        self._initialize_result = result
+        self._sync_kind = self._extract_sync_kind(result.get("capabilities") or {})
+
+        await self._send_notification("initialized", {})
+        if self._init_options:
+            # Some servers (vtsls, eslint) want config pushed via
+            # didChangeConfiguration even if it was sent in
+            # initializationOptions.
+            await self._send_notification(
+                "workspace/didChangeConfiguration",
+                {"settings": self._init_options},
+            )
+
+    @staticmethod
+    def _extract_sync_kind(capabilities: dict) -> int:
+        sync = capabilities.get("textDocumentSync")
+        if isinstance(sync, int):
+            return sync
+        if isinstance(sync, dict):
+            change = sync.get("change")
+            if isinstance(change, int):
+                return change
+        return 1  # default to Full
+
+    async def shutdown(self) -> None:
+        """Best-effort graceful shutdown.
+
+        Sends ``shutdown`` + ``exit``, then SIGTERMs/SIGKILLs the
+        process if it doesn't exit cleanly.  Idempotent.
+        """
+        if self._stopping:
+            return
+        self._stopping = True
+        try:
+            if self.is_running:
+                try:
+                    await asyncio.wait_for(self._send_request("shutdown", None), timeout=2.0)
+                except (asyncio.TimeoutError, LSPRequestError, LSPProtocolError):
+                    pass
+                try:
+                    await self._send_notification("exit", None)
+                except Exception:
+                    pass
+        finally:
+            self._state = "stopped"
+            await self._cleanup_process()
+
+    async def _cleanup_process(self) -> None:
+        if self._reader_task is not None and not self._reader_task.done():
+            self._reader_task.cancel()
+            try:
+                await self._reader_task
+            except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                pass
+        if self._stderr_task is not None and not self._stderr_task.done():
+            self._stderr_task.cancel()
+            try:
+                await self._stderr_task
+            except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                pass
+        proc = self._proc
+        self._proc = None
+        if proc is None:
+            return
+        if proc.returncode is None:
+            try:
+                proc.terminate()
+                try:
+                    await asyncio.wait_for(proc.wait(), timeout=SHUTDOWN_GRACE)
+                except asyncio.TimeoutError:
+                    try:
+                        proc.kill()
+                        await proc.wait()
+                    except ProcessLookupError:
+                        pass
+            except ProcessLookupError:
+                pass
+
+    # ------------------------------------------------------------------
+    # request / notification plumbing
+    # ------------------------------------------------------------------
+
+    async def _send_request(self, method: str, params: Any) -> Any:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            raise LSPProtocolError(f"cannot send {method!r}: stdin closed")
+        loop = asyncio.get_running_loop()
+        req_id = self._next_id
+        self._next_id += 1
+        fut: asyncio.Future = loop.create_future()
+        self._pending[req_id] = fut
+        try:
+            self._proc.stdin.write(encode_message(make_request(req_id, method, params)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError) as e:
+            self._pending.pop(req_id, None)
+            raise LSPProtocolError(f"send failed for {method!r}: {e}") from e
+        try:
+            return await fut
+        finally:
+            self._pending.pop(req_id, None)
+
+    async def _send_request_with_retry(self, method: str, params: Any, *, timeout: float) -> Any:
+        """Send a request, retrying on ``ContentModified`` (-32801).
+
+        Other errors propagate.  The retry policy matches Claude Code's
+        ``LSPServerInstance.sendRequest`` — 3 attempts with delays
+        0.5s, 1.0s, 2.0s.
+        """
+        for attempt in range(MAX_CONTENT_MODIFIED_RETRIES + 1):
+            try:
+                return await asyncio.wait_for(self._send_request(method, params), timeout=timeout)
+            except LSPRequestError as e:
+                if e.code == ERROR_CONTENT_MODIFIED and attempt < MAX_CONTENT_MODIFIED_RETRIES:
+                    await asyncio.sleep(RETRY_BASE_DELAY * (2 ** attempt))
+                    continue
+                raise
+
+    async def _send_notification(self, method: str, params: Any) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_notification(method, params)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError) as e:
+            logger.debug("[%s] notify %s failed: %s", self.server_id, method, e)
+
+    async def _send_response(self, req_id: Any, result: Any) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_response(req_id, result)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError):
+            pass
+
+    async def _send_error_response(self, req_id: Any, code: int, message: str) -> None:
+        if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
+            return
+        try:
+            self._proc.stdin.write(encode_message(make_error_response(req_id, code, message)))
+            await self._proc.stdin.drain()
+        except (BrokenPipeError, ConnectionResetError, OSError):
+            pass
+
+    def _dispatch_response(self, req_id: int, msg: dict) -> None:
+        fut = self._pending.get(req_id)
+        if fut is None or fut.done():
+            return
+        if "error" in msg:
+            err = msg["error"] or {}
+            fut.set_exception(
+                LSPRequestError(
+                    code=int(err.get("code", -32000)),
+                    message=str(err.get("message", "unknown")),
+                    data=err.get("data"),
+                )
+            )
+        else:
+            fut.set_result(msg.get("result"))
+
+    async def _dispatch_request(self, req_id: Any, msg: dict) -> None:
+        method = msg.get("method", "")
+        params = msg.get("params")
+        handler = self._request_handlers.get(method)
+        if handler is None:
+            await self._send_error_response(req_id, ERROR_METHOD_NOT_FOUND, f"method not found: {method}")
+            return
+        try:
+            result = await handler(params)
+        except Exception as e:  # noqa: BLE001 — protocol must not blow up
+            logger.warning("[%s] request handler %s failed: %s", self.server_id, method, e)
+            await self._send_error_response(req_id, -32000, f"handler failed: {e}")
+            return
+        await self._send_response(req_id, result)
+
+    def _dispatch_notification(self, method: str, msg: dict) -> None:
+        handler = self._notification_handlers.get(method)
+        if handler is None:
+            return
+        try:
+            handler(msg.get("params"))
+        except Exception as e:  # noqa: BLE001
+            logger.debug("[%s] notification handler %s failed: %s", self.server_id, method, e)
+
+    # ------------------------------------------------------------------
+    # built-in server-→-client request handlers
+    # ------------------------------------------------------------------
+
+    async def _handle_work_done_create(self, params: Any) -> Any:
+        # Acknowledge progress tokens — required by some servers.
+        return None
+
+    async def _handle_workspace_configuration(self, params: Any) -> Any:
+        # Walk dotted sections through initializationOptions.  Mirrors
+        # OpenCode's `client.ts:198-220` — return null when missing.
+        if not isinstance(params, dict):
+            return [None]
+        items = params.get("items") or []
+        out: List[Any] = []
+        for item in items:
+            if not isinstance(item, dict):
+                out.append(None)
+                continue
+            section = item.get("section")
+            if not section or not self._init_options:
+                out.append(self._init_options or None)
+                continue
+            cur: Any = self._init_options
+            for part in str(section).split("."):
+                if isinstance(cur, dict) and part in cur:
+                    cur = cur[part]
+                else:
+                    cur = None
+                    break
+            out.append(cur)
+        return out
+
+    async def _handle_register_capability(self, params: Any) -> Any:
+        if not isinstance(params, dict):
+            return None
+        for reg in params.get("registrations") or []:
+            if not isinstance(reg, dict):
+                continue
+            method = reg.get("method")
+            reg_id = reg.get("id")
+            if method == "textDocument/diagnostic" and reg_id:
+                self._diagnostic_registrations[str(reg_id)] = reg
+                self._registration_event.set()
+        return None
+
+    async def _handle_unregister_capability(self, params: Any) -> Any:
+        if not isinstance(params, dict):
+            return None
+        for unreg in params.get("unregisterations") or []:
+            if not isinstance(unreg, dict):
+                continue
+            reg_id = unreg.get("id")
+            if reg_id:
+                self._diagnostic_registrations.pop(str(reg_id), None)
+        return None
+
+    async def _handle_workspace_folders(self, params: Any) -> Any:
+        return [{"name": "workspace", "uri": file_uri(self.workspace_root)}]
+
+    async def _handle_diagnostic_refresh(self, params: Any) -> Any:
+        # We don't honour refresh — we re-pull on every touchFile.
+        return None
+
+    # ------------------------------------------------------------------
+    # publishDiagnostics handler
+    # ------------------------------------------------------------------
+
+    def _handle_publish_diagnostics(self, params: Any) -> None:
+        if not isinstance(params, dict):
+            return
+        uri = params.get("uri")
+        if not isinstance(uri, str):
+            return
+        path = uri_to_path(uri)
+        diagnostics = params.get("diagnostics") or []
+        if not isinstance(diagnostics, list):
+            diagnostics = []
+        version = params.get("version")
+        loop_time = asyncio.get_event_loop().time()
+
+        if self._seed_first_push and path not in self._first_push_seen:
+            # First push: seed without firing the event so a waiter
+            # doesn't resolve on the very first push (which arrives
+            # before the user-triggered didChange could've produced
+            # fresh diagnostics).
+            self._first_push_seen.add(path)
+            self._push_diagnostics[path] = diagnostics
+            self._published[path] = loop_time
+            if isinstance(version, int):
+                self._published_version[path] = version
+            return
+
+        self._push_diagnostics[path] = diagnostics
+        self._published[path] = loop_time
+        if isinstance(version, int):
+            self._published_version[path] = version
+        self._first_push_seen.add(path)
+        # Bump the monotonic push counter and wake every waiter.  We
+        # keep the Event sticky-set so any wait already in progress
+        # resolves; waiters re-check their predicate after waking and
+        # decide whether to keep waiting.  ``_push_counter`` is what
+        # they actually compare against to detect a fresh event.
+        self._push_counter += 1
+        self._push_event.set()
+
+    # ------------------------------------------------------------------
+    # public file-sync API
+    # ------------------------------------------------------------------
+
+    async def open_file(self, path: str, *, language_id: str = "plaintext") -> int:
+        """Send didOpen (first time) or didChange (subsequent) for ``path``.
+
+        Returns the new document version number that the agent's
+        ``wait_for_diagnostics`` should match against.
+        """
+        if not self.is_running:
+            raise LSPProtocolError("client not running")
+
+        abs_path = os.path.abspath(path)
+        try:
+            text = Path(abs_path).read_text(encoding="utf-8", errors="replace")
+        except OSError as e:
+            raise LSPProtocolError(f"cannot read {abs_path}: {e}") from e
+
+        uri = file_uri(abs_path)
+        existing = self._files.get(abs_path)
+
+        if existing is not None:
+            # Re-open: bump version, fire didChangeWatchedFiles + didChange.
+            await self._send_notification(
+                "workspace/didChangeWatchedFiles",
+                {"changes": [{"uri": uri, "type": 2}]},  # 2 = CHANGED
+            )
+            new_version = existing["version"] + 1
+            old_text = existing["text"]
+            content_changes: List[Dict[str, Any]]
+            if self._sync_kind == 2:
+                content_changes = [
+                    {
+                        "range": {
+                            "start": {"line": 0, "character": 0},
+                            "end": _end_position(old_text),
+                        },
+                        "text": text,
+                    }
+                ]
+            else:
+                content_changes = [{"text": text}]
+            await self._send_notification(
+                "textDocument/didChange",
+                {
+                    "textDocument": {"uri": uri, "version": new_version},
+                    "contentChanges": content_changes,
+                },
+            )
+            self._files[abs_path] = {"version": new_version, "text": text}
+            return new_version
+
+        # First open: didChangeWatchedFiles CREATED + didOpen.
+        await self._send_notification(
+            "workspace/didChangeWatchedFiles",
+            {"changes": [{"uri": uri, "type": 1}]},  # 1 = CREATED
+        )
+        # Clear any stale push/pull entries — fresh open should start
+        # from scratch.
+        self._push_diagnostics.pop(abs_path, None)
+        self._pull_diagnostics.pop(abs_path, None)
+        self._published.pop(abs_path, None)
+        self._published_version.pop(abs_path, None)
+        await self._send_notification(
+            "textDocument/didOpen",
+            {
+                "textDocument": {
+                    "uri": uri,
+                    "languageId": language_id,
+                    "version": 0,
+                    "text": text,
+                }
+            },
+        )
+        self._files[abs_path] = {"version": 0, "text": text}
+        return 0
+
+    async def save_file(self, path: str) -> None:
+        """Send didSave for ``path``.  Some linters re-scan only on save."""
+        if not self.is_running:
+            return
+        abs_path = os.path.abspath(path)
+        await self._send_notification(
+            "textDocument/didSave",
+            {"textDocument": {"uri": file_uri(abs_path)}},
+        )
+
+    # ------------------------------------------------------------------
+    # diagnostics: pull + wait
+    # ------------------------------------------------------------------
+
+    async def _pull_document_diagnostics(self, path: str) -> None:
+        """Send ``textDocument/diagnostic`` for one file.
+
+        Stores results into :attr:`_pull_diagnostics`.  Silently
+        no-ops on errors (server may not support the pull endpoint).
+        """
+        try:
+            params: Dict[str, Any] = {
+                "textDocument": {"uri": file_uri(os.path.abspath(path))}
+            }
+            result = await self._send_request_with_retry(
+                "textDocument/diagnostic",
+                params,
+                timeout=DIAGNOSTICS_REQUEST_TIMEOUT,
+            )
+        except (LSPRequestError, LSPProtocolError, asyncio.TimeoutError) as e:
+            logger.debug("[%s] document diagnostic pull failed: %s", self.server_id, e)
+            return
+        if not isinstance(result, dict):
+            return
+        items = result.get("items")
+        if isinstance(items, list):
+            self._pull_diagnostics[os.path.abspath(path)] = items
+        related = result.get("relatedDocuments")
+        if isinstance(related, dict):
+            for uri, sub in related.items():
+                if not isinstance(sub, dict):
+                    continue
+                sub_items = sub.get("items")
+                if isinstance(sub_items, list):
+                    self._pull_diagnostics[uri_to_path(uri)] = sub_items
+
+    async def wait_for_diagnostics(
+        self,
+        path: str,
+        version: int,
+        *,
+        mode: str = "document",
+    ) -> None:
+        """Wait for the server to publish diagnostics for ``path`` at ``version``.
+
+        ``mode`` is ``"document"`` (5s budget, document pulls) or
+        ``"full"`` (10s budget, also workspace pulls).  Best-effort —
+        returns silently on timeout.  Does NOT throw if the server
+        doesn't support pull diagnostics; we still get the push side.
+        """
+        budget = DIAGNOSTICS_FULL_WAIT if mode == "full" else DIAGNOSTICS_DOCUMENT_WAIT
+        deadline = asyncio.get_event_loop().time() + budget
+        abs_path = os.path.abspath(path)
+
+        while True:
+            remaining = deadline - asyncio.get_event_loop().time()
+            if remaining <= 0:
+                return
+
+            # Concurrent: document pull + push wait.
+            pull_task = asyncio.create_task(self._pull_document_diagnostics(abs_path))
+            push_task = asyncio.create_task(self._wait_for_fresh_push(abs_path, version, remaining))
+            done, pending = await asyncio.wait(
+                {pull_task, push_task},
+                timeout=remaining,
+                return_when=asyncio.FIRST_COMPLETED,
+            )
+            for t in pending:
+                t.cancel()
+            for t in pending:
+                try:
+                    await t
+                except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                    pass
+
+            # If we got a fresh push for our version, we're done.
+            current_v = self._published_version.get(abs_path)
+            if abs_path in self._published and (
+                current_v is None or current_v >= version
+            ):
+                return
+
+            # Pull may have populated _pull_diagnostics — that's also
+            # success.
+            if abs_path in self._pull_diagnostics:
+                return
+
+            # Loop until budget runs out.
+
+    async def _wait_for_fresh_push(self, path: str, version: int, timeout: float) -> None:
+        """Wait until a publishDiagnostics arrives for ``path`` at ``version``+."""
+        deadline = asyncio.get_event_loop().time() + timeout
+        baseline = self._push_counter
+        while True:
+            current_v = self._published_version.get(path)
+            if path in self._published and (current_v is None or current_v >= version):
+                # Debounce — wait a tick in case more diagnostics arrive
+                # immediately after.  TS often emits in pairs.  We
+                # snapshot the counter so we wake on a *new* push, not
+                # on the one that satisfied us a moment ago.
+                debounce_baseline = self._push_counter
+                debounce_deadline = asyncio.get_event_loop().time() + PUSH_DEBOUNCE
+                while self._push_counter == debounce_baseline:
+                    remaining = debounce_deadline - asyncio.get_event_loop().time()
+                    if remaining <= 0:
+                        break
+                    self._push_event.clear()
+                    try:
+                        await asyncio.wait_for(self._push_event.wait(), timeout=remaining)
+                    except asyncio.TimeoutError:
+                        break
+                return
+            remaining = deadline - asyncio.get_event_loop().time()
+            if remaining <= 0:
+                return
+            if self._push_counter > baseline:
+                # New event arrived but predicate still false — re-check
+                # immediately without waiting again.
+                baseline = self._push_counter
+                continue
+            self._push_event.clear()
+            try:
+                await asyncio.wait_for(self._push_event.wait(), timeout=min(remaining, 0.5))
+            except asyncio.TimeoutError:
+                continue
+
+    def diagnostics_for(self, path: str) -> List[Dict[str, Any]]:
+        """Return current merged + deduped diagnostics for one file.
+
+        Diagnostics from push and pull stores are concatenated and
+        deduplicated by ``(severity, code, message, range)`` content
+        key.  Empty list if the server hasn't published anything.
+        """
+        abs_path = os.path.abspath(path)
+        push = self._push_diagnostics.get(abs_path) or []
+        pull = self._pull_diagnostics.get(abs_path) or []
+        return _dedupe(push, pull)
+
+
+def _dedupe(*lists: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
+    seen: Set[str] = set()
+    out: List[Dict[str, Any]] = []
+    for lst in lists:
+        for d in lst:
+            if not isinstance(d, dict):
+                continue
+            key = _diagnostic_key(d)
+            if key in seen:
+                continue
+            seen.add(key)
+            out.append(d)
+    return out
+
+
+def _diagnostic_key(d: Dict[str, Any]) -> str:
+    """Content-equality key for a diagnostic.
+
+    Matches the structural-equality used in claude-code's
+    ``areDiagnosticsEqual`` — message + severity + source + code +
+    range coords.  The range is reduced to a tuple to keep the key
+    stable across dict orderings.
+    """
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    end = rng.get("end") or {}
+    code = d.get("code")
+    if code is not None and not isinstance(code, str):
+        code = str(code)
+    return "\x00".join(
+        [
+            str(d.get("severity") or 1),
+            str(code or ""),
+            str(d.get("source") or ""),
+            str(d.get("message") or "").strip(),
+            f"{start.get('line', 0)}:{start.get('character', 0)}-{end.get('line', 0)}:{end.get('character', 0)}",
+        ]
+    )
+
+
+__all__ = [
+    "LSPClient",
+    "file_uri",
+    "uri_to_path",
+    "INITIALIZE_TIMEOUT",
+    "DIAGNOSTICS_DOCUMENT_WAIT",
+    "DIAGNOSTICS_FULL_WAIT",
+]
--- a/agent/lsp/eventlog.py
+++ b/agent/lsp/eventlog.py
@ -0,0 +1,213 @@
+"""Structured logging with steady-state silence for the LSP layer.
+
+The LSP layer fires on every write_file/patch.  In a busy session
+that's hundreds of events.  We want users to be able to ``rg`` the
+log for "did LSP fire on that edit?" without drowning in noise.
+
+The level model:
+
+- ``DEBUG`` for steady-state events that have no novel signal:
+  ``clean``, ``feature off``, ``extension not mapped``, ``no project
+  root for already-announced file``, ``server unavailable for
+  already-announced binary``.  These never reach ``agent.log`` at the
+  default INFO threshold.
+
+- ``INFO`` for state transitions worth surfacing exactly once per
+  session: ``active for <root>`` the first time a (server_id,
+  workspace_root) client starts, ``no project root for <path>``
+  the first time we see that file.  Plus every diagnostic event
+  (those are inherently rare and per-edit, exactly what users grep
+  for).
+
+- ``WARNING`` for action-required failures: ``server unavailable``
+  (binary not on PATH) the first time per (server_id, binary),
+  ``no server configured`` once per language.  Per-call WARNING for
+  timeouts and unexpected bridge exceptions.
+
+The dedup is in-process module-level sets.  Each set grows at most by
+the number of distinct (server_id, root) and (server_id, binary)
+pairs touched in one Python process — bytes of memory in even an
+aggressive monorepo session.  Bounded LRU was rejected: evicting an
+entry would risk re-firing the WARNING/INFO line we explicitly want
+to suppress.
+
+Grep recipe::
+
+    tail -f ~/.hermes/logs/agent.log | rg 'lsp\\['
+"""
+from __future__ import annotations
+
+import logging
+import os
+import threading
+from typing import Tuple
+
+# Dedicated logger name so the documented grep recipe survives a
+# ``logging.getLogger(__name__)`` rename of any internal module.
+event_log = logging.getLogger("hermes.lint.lsp")
+
+# ---------------------------------------------------------------------------
+# Once-per-X dedup sets
+# ---------------------------------------------------------------------------
+
+_announce_lock = threading.Lock()
+_announced_active: set = set()        # keys: (server_id, workspace_root)
+_announced_unavailable: set = set()   # keys: (server_id, binary_path_or_name)
+_announced_no_root: set = set()       # keys: (server_id, file_path)
+_announced_no_server: set = set()     # keys: (server_id,)
+
+
+def _short_path(file_path: str) -> str:
+    """Render *file_path* relative to the cwd when sensible, else absolute.
+
+    Keeps log lines readable for the common case (the user is inside
+    the project they're editing) without emitting brittle ``../../..``
+    chains for the cross-tree case.
+    """
+    if not file_path:
+        return file_path
+    try:
+        rel = os.path.relpath(file_path)
+    except ValueError:
+        return file_path
+    if rel.startswith(".." + os.sep) or rel == "..":
+        return file_path
+    return rel
+
+
+def _emit(server_id: str, level: int, message: str) -> None:
+    event_log.log(level, "lsp[%s] %s", server_id, message)
+
+
+def _announce_once(bucket: set, key: Tuple) -> bool:
+    """Return True if *key* has not been announced for *bucket* yet.
+
+    Atomically marks the key as announced so concurrent callers
+    cannot both win the race and double-log.
+    """
+    with _announce_lock:
+        if key in bucket:
+            return False
+        bucket.add(key)
+        return True
+
+
+# ---------------------------------------------------------------------------
+# Public event helpers — call these from the LSP layer.
+# ---------------------------------------------------------------------------
+
+
+def log_clean(server_id: str, file_path: str) -> None:
+    """No diagnostics emitted for *file_path*.  DEBUG (silent at default)."""
+    _emit(server_id, logging.DEBUG, f"clean ({_short_path(file_path)})")
+
+
+def log_disabled(server_id: str, file_path: str, reason: str) -> None:
+    """LSP intentionally skipped for this file (feature off, ext unmapped,
+    backend not local, etc.).  DEBUG."""
+    _emit(server_id, logging.DEBUG, f"skipped: {reason} ({_short_path(file_path)})")
+
+
+def log_active(server_id: str, workspace_root: str) -> None:
+    """A new LSP client started for (server_id, workspace_root).
+
+    INFO once per (server_id, workspace_root); DEBUG thereafter.
+    Lets users verify "is LSP actually running?" with a single grep.
+    """
+    key = (server_id, workspace_root)
+    if _announce_once(_announced_active, key):
+        _emit(server_id, logging.INFO, f"active for {workspace_root}")
+    else:
+        _emit(server_id, logging.DEBUG, f"reused client for {workspace_root}")
+
+
+def log_diagnostics(server_id: str, file_path: str, count: int) -> None:
+    """Diagnostics arrived for a file.  INFO every time — these are the
+    failure signals users actually want to grep for, and they are
+    inherently rare per edit."""
+    _emit(server_id, logging.INFO, f"{count} diags ({_short_path(file_path)})")
+
+
+def log_no_project_root(server_id: str, file_path: str) -> None:
+    """File had no recognised project marker.  INFO once per file,
+    DEBUG thereafter."""
+    key = (server_id, file_path)
+    if _announce_once(_announced_no_root, key):
+        _emit(server_id, logging.INFO, f"no project root for {_short_path(file_path)}")
+    else:
+        _emit(server_id, logging.DEBUG, f"no project root for {_short_path(file_path)}")
+
+
+def log_server_unavailable(server_id: str, binary_or_pkg: str) -> None:
+    """The server binary couldn't be resolved.  WARNING once per
+    (server_id, binary), DEBUG thereafter so a hundred subsequent
+    .py edits don't spam the log."""
+    key = (server_id, binary_or_pkg)
+    if _announce_once(_announced_unavailable, key):
+        _emit(
+            server_id,
+            logging.WARNING,
+            f"server unavailable: {binary_or_pkg} not found "
+            "(install via `hermes lsp install <id>` or set lsp.servers.<id>.command)",
+        )
+    else:
+        _emit(server_id, logging.DEBUG, f"server still unavailable: {binary_or_pkg}")
+
+
+def log_no_server_configured(server_id: str) -> None:
+    """No spawn recipe for this language.  WARNING once."""
+    if _announce_once(_announced_no_server, (server_id,)):
+        _emit(server_id, logging.WARNING, "no server configured")
+
+
+def log_timeout(server_id: str, file_path: str, kind: str = "diagnostics") -> None:
+    """A request to the server timed out.  WARNING every time — these are
+    inherently novel events worth surfacing on each occurrence."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"{kind} timed out for {_short_path(file_path)}",
+    )
+
+
+def log_server_error(server_id: str, file_path: str, exc: BaseException) -> None:
+    """An unexpected exception bubbled out of the LSP layer.  WARNING."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"unexpected error for {_short_path(file_path)}: {type(exc).__name__}: {exc}",
+    )
+
+
+def log_spawn_failed(server_id: str, workspace_root: str, exc: BaseException) -> None:
+    """The LSP server failed to spawn or initialize.  WARNING."""
+    _emit(
+        server_id,
+        logging.WARNING,
+        f"spawn/initialize failed for {workspace_root}: {type(exc).__name__}: {exc}",
+    )
+
+
+def reset_announce_caches() -> None:
+    """Test-only: clear the dedup caches.  Production code never calls this."""
+    with _announce_lock:
+        _announced_active.clear()
+        _announced_unavailable.clear()
+        _announced_no_root.clear()
+        _announced_no_server.clear()
+
+
+__all__ = [
+    "event_log",
+    "log_clean",
+    "log_disabled",
+    "log_active",
+    "log_diagnostics",
+    "log_no_project_root",
+    "log_server_unavailable",
+    "log_no_server_configured",
+    "log_timeout",
+    "log_server_error",
+    "log_spawn_failed",
+    "reset_announce_caches",
+]
--- a/agent/lsp/install.py
+++ b/agent/lsp/install.py
@ -0,0 +1,347 @@
+"""Auto-installation of LSP server binaries.
+
+Tries to install missing servers using whatever package manager is
+appropriate.  All installs go to a Hermes-owned bin staging dir,
+``<HERMES_HOME>/lsp/bin/``, so we don't pollute the user's global
+toolchain.
+
+Strategies:
+
+- ``auto`` — attempt to install with the best available package
+  manager.  This is the default.
+- ``manual`` — never install; if a binary is missing, the server is
+  silently skipped and the user is told about it via ``hermes lsp
+  status``.
+- ``off`` — same as ``manual`` for now (kept distinct so we can
+  evolve behavior later, e.g. logging differently).
+
+The actual installs happen synchronously the first time a server is
+needed and concurrent calls to :func:`try_install` for the same
+package are deduplicated via a per-package lock.
+
+Failure modes are non-fatal: every install path is wrapped in
+try/except and returns ``None`` on failure.  The tool layer then
+falls back to its in-process syntax checker, exactly as if the user
+hadn't enabled LSP at all.
+"""
+from __future__ import annotations
+
+import logging
+import os
+import shutil
+import subprocess
+import sys
+import threading
+from pathlib import Path
+from typing import Dict, Optional
+
+logger = logging.getLogger("agent.lsp.install")
+
+# Package-name → install-strategy hint registry.  Each entry is a
+# tuple of strategy name + package name + executable name.  When the
+# install completes, we look for the executable in
+# ``<HERMES_HOME>/lsp/bin/`` first, then on PATH.
+INSTALL_RECIPES: Dict[str, Dict[str, str]] = {
+    # Python
+    "pyright": {"strategy": "npm", "pkg": "pyright", "bin": "pyright-langserver"},
+    # JS/TS family
+    "typescript-language-server": {
+        "strategy": "npm",
+        "pkg": "typescript-language-server",
+        "bin": "typescript-language-server",
+    },
+    "@vue/language-server": {
+        "strategy": "npm",
+        "pkg": "@vue/language-server",
+        "bin": "vue-language-server",
+    },
+    "svelte-language-server": {
+        "strategy": "npm",
+        "pkg": "svelte-language-server",
+        "bin": "svelteserver",
+    },
+    "@astrojs/language-server": {
+        "strategy": "npm",
+        "pkg": "@astrojs/language-server",
+        "bin": "astro-ls",
+    },
+    "yaml-language-server": {
+        "strategy": "npm",
+        "pkg": "yaml-language-server",
+        "bin": "yaml-language-server",
+    },
+    "bash-language-server": {
+        "strategy": "npm",
+        "pkg": "bash-language-server",
+        "bin": "bash-language-server",
+    },
+    "intelephense": {"strategy": "npm", "pkg": "intelephense", "bin": "intelephense"},
+    "dockerfile-language-server-nodejs": {
+        "strategy": "npm",
+        "pkg": "dockerfile-language-server-nodejs",
+        "bin": "docker-langserver",
+    },
+    # Go
+    "gopls": {"strategy": "go", "pkg": "golang.org/x/tools/gopls@latest", "bin": "gopls"},
+    # Rust — too heavy (hundreds of MB to bootstrap).  We do NOT
+    # auto-install rust-analyzer; users install via rustup.
+    "rust-analyzer": {"strategy": "manual", "pkg": "", "bin": "rust-analyzer"},
+    # C/C++ — manual (clangd ships with LLVM, very heavy)
+    "clangd": {"strategy": "manual", "pkg": "", "bin": "clangd"},
+    # Lua — manual (LuaLS is platform-specific binaries from GitHub
+    # releases; complex enough that we punt to the user)
+    "lua-language-server": {"strategy": "manual", "pkg": "", "bin": "lua-language-server"},
+}
+
+
+_install_locks: Dict[str, threading.Lock] = {}
+_install_results: Dict[str, Optional[str]] = {}
+_install_lock_meta = threading.Lock()
+
+
+def hermes_lsp_bin_dir() -> Path:
+    """Return the Hermes-owned bin staging dir for LSP servers."""
+    home = os.environ.get("HERMES_HOME")
+    if home is None:
+        home = os.path.join(os.path.expanduser("~"), ".hermes")
+    p = Path(home) / "lsp" / "bin"
+    p.mkdir(parents=True, exist_ok=True)
+    return p
+
+
+def _existing_binary(name: str) -> Optional[str]:
+    """Probe the staging dir + PATH for a binary named ``name``."""
+    staged = hermes_lsp_bin_dir() / name
+    if staged.exists() and os.access(staged, os.X_OK):
+        return str(staged)
+    on_path = shutil.which(name)
+    if on_path:
+        return on_path
+    return None
+
+
+def _get_lock(pkg: str) -> threading.Lock:
+    with _install_lock_meta:
+        lock = _install_locks.get(pkg)
+        if lock is None:
+            lock = threading.Lock()
+            _install_locks[pkg] = lock
+        return lock
+
+
+def try_install(pkg: str, strategy: str = "auto") -> Optional[str]:
+    """Try to install ``pkg`` and return the binary path if successful.
+
+    ``strategy`` is ``"auto"``, ``"manual"``, or ``"off"``.  In
+    ``manual``/``off`` mode, this function only probes for an
+    existing binary and returns ``None`` if not found.
+
+    The install is cached per-package — a second call returns the
+    same path (or ``None``) without reinstalling.  Concurrent calls
+    are serialized.
+    """
+    if strategy not in ("auto",):
+        # Only ``auto`` triggers an actual install.  In manual/off,
+        # we still check whether the binary already exists.
+        recipe = INSTALL_RECIPES.get(pkg, {})
+        bin_name = recipe.get("bin", pkg)
+        return _existing_binary(bin_name)
+
+    if pkg in _install_results:
+        return _install_results[pkg]
+
+    lock = _get_lock(pkg)
+    with lock:
+        # Double-check after acquiring lock.
+        if pkg in _install_results:
+            return _install_results[pkg]
+        result = _do_install(pkg)
+        _install_results[pkg] = result
+        return result
+
+
+def _do_install(pkg: str) -> Optional[str]:
+    recipe = INSTALL_RECIPES.get(pkg)
+    if recipe is None:
+        # Not in our registry — best-effort: just probe PATH.
+        return shutil.which(pkg)
+
+    strategy = recipe.get("strategy", "manual")
+    bin_name = recipe.get("bin", pkg)
+
+    # Check if already present (shutil.which or staging dir)
+    existing = _existing_binary(bin_name)
+    if existing:
+        return existing
+
+    if strategy == "manual":
+        logger.debug("[install] %s requires manual install (recipe=%s)", pkg, recipe)
+        return None
+
+    if strategy == "npm":
+        return _install_npm(recipe.get("pkg", pkg), bin_name)
+    if strategy == "go":
+        return _install_go(recipe.get("pkg", pkg), bin_name)
+    if strategy == "pip":
+        return _install_pip(recipe.get("pkg", pkg), bin_name)
+
+    logger.warning("[install] unknown strategy %r for %s", strategy, pkg)
+    return None
+
+
+def _install_npm(pkg: str, bin_name: str) -> Optional[str]:
+    """Install an npm package into our staging dir.
+
+    Uses ``npm install --prefix`` so the binaries land in
+    ``<staging>/node_modules/.bin/<bin_name>`` and we symlink them up
+    one level for direct PATH-style access.
+    """
+    npm = shutil.which("npm")
+    if npm is None:
+        logger.info("[install] cannot install %s: npm not on PATH", pkg)
+        return None
+    staging = hermes_lsp_bin_dir().parent  # <HERMES_HOME>/lsp/
+    try:
+        logger.info("[install] npm install --prefix %s %s", staging, pkg)
+        proc = subprocess.run(
+            [npm, "install", "--prefix", str(staging), "--silent", "--no-fund", "--no-audit", pkg],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=300,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] npm install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] npm install errored for %s: %s", pkg, e)
+        return None
+
+    # Find the bin
+    nm_bin = staging / "node_modules" / ".bin" / bin_name
+    if os.name == "nt":
+        # On Windows npm sometimes drops `.cmd` shims
+        candidates = [nm_bin, nm_bin.with_suffix(".cmd")]
+    else:
+        candidates = [nm_bin]
+    for c in candidates:
+        if c.exists():
+            # Symlink into our `lsp/bin/` for stable PATH access.
+            link = hermes_lsp_bin_dir() / c.name
+            if not link.exists():
+                try:
+                    link.symlink_to(c)
+                except (OSError, NotImplementedError):
+                    # Symlinks fail on some Windows setups — copy instead.
+                    try:
+                        shutil.copy2(c, link)
+                    except OSError:
+                        return str(c)
+            return str(link if link.exists() else c)
+    logger.warning("[install] npm install for %s succeeded but bin %s not found", pkg, bin_name)
+    return None
+
+
+def _install_go(pkg: str, bin_name: str) -> Optional[str]:
+    """Install a Go module to GOBIN=<staging>."""
+    go = shutil.which("go")
+    if go is None:
+        logger.info("[install] cannot install %s: go not on PATH", pkg)
+        return None
+    staging = hermes_lsp_bin_dir()
+    env = dict(os.environ)
+    env["GOBIN"] = str(staging)
+    try:
+        logger.info("[install] go install %s (GOBIN=%s)", pkg, staging)
+        proc = subprocess.run(
+            [go, "install", pkg],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=600,
+            env=env,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] go install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] go install errored for %s: %s", pkg, e)
+        return None
+    bin_path = staging / bin_name
+    if os.name == "nt":
+        bin_path = bin_path.with_suffix(".exe")
+    if bin_path.exists():
+        return str(bin_path)
+    logger.warning("[install] go install for %s succeeded but bin %s not found", pkg, bin_name)
+    return None
+
+
+def _install_pip(pkg: str, bin_name: str) -> Optional[str]:
+    """Install a Python package into a hermes-owned target dir.
+
+    We avoid polluting the user's site-packages by using
+    ``pip install --target``.  Bins go into
+    ``<staging>/python-packages/bin/`` which we symlink into
+    ``<staging>/bin``.  Note: this only works for packages that ship a
+    console script.
+    """
+    pip_target = hermes_lsp_bin_dir().parent / "python-packages"
+    pip_target.mkdir(parents=True, exist_ok=True)
+    try:
+        logger.info("[install] pip install --target %s %s", pip_target, pkg)
+        proc = subprocess.run(
+            [sys.executable, "-m", "pip", "install", "--target", str(pip_target), "--quiet", pkg],
+            check=False,
+            capture_output=True,
+            text=True,
+            timeout=300,
+        )
+        if proc.returncode != 0:
+            logger.warning(
+                "[install] pip install failed for %s: %s", pkg, proc.stderr.strip()[:500]
+            )
+            return None
+    except (subprocess.TimeoutExpired, OSError) as e:
+        logger.warning("[install] pip install errored for %s: %s", pkg, e)
+        return None
+    # Look for the script
+    bin_path = pip_target / "bin" / bin_name
+    if bin_path.exists():
+        link = hermes_lsp_bin_dir() / bin_name
+        if not link.exists():
+            try:
+                link.symlink_to(bin_path)
+            except (OSError, NotImplementedError):
+                try:
+                    shutil.copy2(bin_path, link)
+                except OSError:
+                    return str(bin_path)
+        return str(link if link.exists() else bin_path)
+    return None
+
+
+def detect_status(pkg: str) -> str:
+    """Return ``installed``, ``missing``, or ``manual-only`` for a package.
+
+    Used by the ``hermes lsp status`` CLI to give users a quick
+    overview of what's available without spawning anything.
+    """
+    recipe = INSTALL_RECIPES.get(pkg)
+    bin_name = recipe.get("bin", pkg) if recipe else pkg
+    if _existing_binary(bin_name):
+        return "installed"
+    if recipe and recipe.get("strategy") == "manual":
+        return "manual-only"
+    return "missing"
+
+
+__all__ = [
+    "INSTALL_RECIPES",
+    "try_install",
+    "detect_status",
+    "hermes_lsp_bin_dir",
+]
--- a/agent/lsp/manager.py
+++ b/agent/lsp/manager.py
@ -0,0 +1,607 @@
+"""Service-level orchestration for LSP clients.
+
+The :class:`LSPService` is the bridge between the synchronous
+file_operations layer and the async :class:`agent.lsp.client.LSPClient`.
+
+Design choices:
+
+- A **single asyncio event loop** runs in a background thread.  All
+  client work happens on that loop.  Synchronous callers from
+  ``tools/file_operations.py`` use :meth:`get_diagnostics_sync` to
+  open + wait + drain in one blocking call.
+
+- One client per ``(server_id, workspace_root)`` key.  Lazy spawn:
+  the first request for a key spawns the client; subsequent requests
+  re-use it.
+
+- A **broken-set** records ``(server_id, workspace_root)`` pairs that
+  failed to spawn or initialize.  These are never retried for the
+  life of the service.  Mirrors OpenCode's design.
+
+- A **delta baseline** map keeps "diagnostics-as-of-the-last-snapshot"
+  per file.  ``snapshot_baseline()`` is called BEFORE a write; the
+  next ``get_diagnostics_sync()`` returns only diagnostics that
+  weren't in the baseline.  This is the lift from Claude Code's
+  ``beforeFileEdited`` / ``getNewDiagnostics`` pattern, except wired
+  to the local LSP layer instead of MCP IDE RPC.
+
+The service is **off by default** — call :meth:`is_active` to check
+whether it's actually doing anything.  When LSP is disabled in
+config, when no git workspace can be detected, when all configured
+servers are missing binaries and auto-install is off, ``is_active``
+returns False and the file_operations layer falls through to the
+in-process syntax check.
+"""
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+import threading
+import time
+from concurrent.futures import Future as ConcurrentFuture
+from typing import Any, Dict, List, Optional, Tuple
+
+from agent.lsp import eventlog
+from agent.lsp.client import (
+    DIAGNOSTICS_DOCUMENT_WAIT,
+    LSPClient,
+    file_uri,
+)
+from agent.lsp.servers import (
+    ServerContext,
+    ServerDef,
+    SpawnSpec,
+    find_server_for_file,
+    language_id_for,
+)
+from agent.lsp.workspace import (
+    clear_cache,
+    is_inside_workspace,
+    resolve_workspace_for_file,
+)
+
+logger = logging.getLogger("agent.lsp.manager")
+
+DEFAULT_IDLE_TIMEOUT = 600  # seconds; servers idle for >10min get reaped
+
+
+class _BackgroundLoop:
+    """A daemon thread that owns one asyncio event loop.
+
+    Provides :meth:`run` for synchronous callers — submits a coroutine
+    to the loop and blocks until it finishes (or a timeout fires).
+    """
+
+    def __init__(self) -> None:
+        self._loop: Optional[asyncio.AbstractEventLoop] = None
+        self._thread: Optional[threading.Thread] = None
+        self._ready = threading.Event()
+
+    def start(self) -> None:
+        if self._thread is not None:
+            return
+        self._thread = threading.Thread(
+            target=self._run_forever,
+            name="hermes-lsp-loop",
+            daemon=True,
+        )
+        self._thread.start()
+        self._ready.wait(timeout=5.0)
+
+    def _run_forever(self) -> None:
+        loop = asyncio.new_event_loop()
+        self._loop = loop
+        asyncio.set_event_loop(loop)
+        self._ready.set()
+        try:
+            loop.run_forever()
+        finally:
+            try:
+                loop.close()
+            except Exception:  # noqa: BLE001
+                pass
+
+    def run(self, coro, *, timeout: Optional[float] = None) -> Any:
+        """Submit a coroutine to the loop and block until done.
+
+        Returns the coroutine's result, or raises its exception.
+        """
+        if self._loop is None:
+            raise RuntimeError("background loop not started")
+        fut: ConcurrentFuture = asyncio.run_coroutine_threadsafe(coro, self._loop)
+        try:
+            return fut.result(timeout=timeout)
+        except Exception:
+            fut.cancel()
+            raise
+
+    def stop(self) -> None:
+        loop = self._loop
+        if loop is None:
+            return
+        try:
+            loop.call_soon_threadsafe(loop.stop)
+        except RuntimeError:
+            pass
+        if self._thread is not None:
+            self._thread.join(timeout=2.0)
+        self._loop = None
+        self._thread = None
+
+
+class LSPService:
+    """The process-wide LSP service.
+
+    Created once via :meth:`create_from_config`; the
+    :func:`agent.lsp.get_service` accessor manages the singleton.
+    Most callers should use that accessor rather than constructing
+    :class:`LSPService` directly.
+    """
+
+    # ------------------------------------------------------------------
+    # construction + factory
+    # ------------------------------------------------------------------
+
+    def __init__(
+        self,
+        *,
+        enabled: bool,
+        wait_mode: str,
+        wait_timeout: float,
+        install_strategy: str,
+        binary_overrides: Optional[Dict[str, List[str]]] = None,
+        env_overrides: Optional[Dict[str, Dict[str, str]]] = None,
+        init_overrides: Optional[Dict[str, Dict[str, Any]]] = None,
+        disabled_servers: Optional[List[str]] = None,
+        idle_timeout: float = DEFAULT_IDLE_TIMEOUT,
+    ) -> None:
+        self._enabled = enabled
+        self._wait_mode = wait_mode if wait_mode in ("document", "full") else "document"
+        self._wait_timeout = wait_timeout
+        self._install_strategy = install_strategy
+        self._binary_overrides = binary_overrides or {}
+        self._env_overrides = env_overrides or {}
+        self._init_overrides = init_overrides or {}
+        self._disabled_servers = set(disabled_servers or [])
+        self._idle_timeout = idle_timeout
+
+        self._loop = _BackgroundLoop()
+        if self._enabled:
+            self._loop.start()
+
+        # Per-(server_id, workspace_root) state
+        self._clients: Dict[Tuple[str, str], LSPClient] = {}
+        self._broken: set = set()
+        self._spawning: Dict[Tuple[str, str], asyncio.Future] = {}
+        self._last_used: Dict[Tuple[str, str], float] = {}
+        self._state_lock = threading.Lock()
+
+        # Delta baseline: file path → snapshot of diagnostics taken
+        # immediately before a write.  ``get_diagnostics_sync`` filters
+        # out anything in the baseline so the agent only sees errors
+        # introduced by the current edit.
+        self._delta_baseline: Dict[str, List[Dict[str, Any]]] = {}
+
+    @classmethod
+    def create_from_config(cls) -> Optional["LSPService"]:
+        """Build a service from ``hermes_cli.config`` settings.
+
+        Returns ``None`` if the config can't be loaded.  The service
+        itself returns ``is_active()`` False when LSP is disabled.
+        """
+        try:
+            from hermes_cli.config import load_config
+            cfg = load_config()
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP config load failed: %s", e)
+            return None
+
+        lsp_cfg = (cfg.get("lsp") or {}) if isinstance(cfg, dict) else {}
+        if not isinstance(lsp_cfg, dict):
+            lsp_cfg = {}
+
+        enabled = bool(lsp_cfg.get("enabled", True))
+        wait_mode = lsp_cfg.get("wait_mode", "document")
+        wait_timeout = float(lsp_cfg.get("wait_timeout", DIAGNOSTICS_DOCUMENT_WAIT))
+        install_strategy = lsp_cfg.get("install_strategy", "auto")
+        servers_cfg = lsp_cfg.get("servers") or {}
+        disabled = []
+        binary_overrides: Dict[str, List[str]] = {}
+        env_overrides: Dict[str, Dict[str, str]] = {}
+        init_overrides: Dict[str, Dict[str, Any]] = {}
+        if isinstance(servers_cfg, dict):
+            for name, sub in servers_cfg.items():
+                if not isinstance(sub, dict):
+                    continue
+                if sub.get("disabled"):
+                    disabled.append(name)
+                cmd = sub.get("command")
+                if isinstance(cmd, list) and cmd:
+                    binary_overrides[name] = cmd
+                env = sub.get("env")
+                if isinstance(env, dict):
+                    env_overrides[name] = {k: str(v) for k, v in env.items()}
+                init = sub.get("initialization_options")
+                if isinstance(init, dict):
+                    init_overrides[name] = init
+
+        return cls(
+            enabled=enabled,
+            wait_mode=wait_mode,
+            wait_timeout=wait_timeout,
+            install_strategy=install_strategy,
+            binary_overrides=binary_overrides,
+            env_overrides=env_overrides,
+            init_overrides=init_overrides,
+            disabled_servers=disabled,
+        )
+
+    # ------------------------------------------------------------------
+    # public API
+    # ------------------------------------------------------------------
+
+    def is_active(self) -> bool:
+        """Return True iff this service should be consulted at all."""
+        return self._enabled
+
+    def enabled_for(self, file_path: str) -> bool:
+        """Return True iff LSP should run for this specific file.
+
+        Gates on workspace detection (file or cwd inside a git worktree),
+        on whether any registered server matches the extension, and
+        on whether the (server_id, workspace_root) pair is in the
+        broken-set from a previous spawn failure.
+
+        Files in already-broken pairs return False so the file_operations
+        layer skips the LSP path entirely — no spawn attempts, no
+        timeout cost — until the service is restarted (``hermes lsp
+        restart``) or the process exits.
+        """
+        if not self._enabled:
+            return False
+        srv = find_server_for_file(file_path)
+        if srv is None or srv.server_id in self._disabled_servers:
+            return False
+        ws_root, gated_in = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated_in):
+            return False
+        # Broken-set short-circuit.  Use the per-server root if we can
+        # compute one cheaply; otherwise fall back to the workspace
+        # root as the broken key (which is what _get_or_spawn would
+        # have used anyway when it failed).
+        try:
+            per_server_root = srv.resolve_root(file_path, ws_root) or ws_root
+        except Exception:  # noqa: BLE001
+            per_server_root = ws_root
+        if (srv.server_id, per_server_root) in self._broken:
+            return False
+        return True
+
+    def snapshot_baseline(self, file_path: str) -> None:
+        """Snapshot current diagnostics for ``file_path`` as the delta baseline.
+
+        Called BEFORE a write so the next ``get_diagnostics_sync()``
+        can filter out pre-existing errors.  Best-effort — failures
+        are silently swallowed so a flaky server can't break a write.
+
+        Outer timeouts (e.g. server hangs during initialize) mark the
+        (server_id, workspace_root) pair as broken so subsequent edits
+        skip it instantly instead of re-paying the timeout cost.
+        """
+        if not self.enabled_for(file_path):
+            return
+        try:
+            diags = self._loop.run(self._snapshot_async(file_path), timeout=8.0)
+            self._delta_baseline[os.path.abspath(file_path)] = diags or []
+        except Exception as e:  # noqa: BLE001
+            logger.debug("baseline snapshot failed for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            self._delta_baseline[os.path.abspath(file_path)] = []
+
+    def get_diagnostics_sync(
+        self,
+        file_path: str,
+        *,
+        delta: bool = True,
+        timeout: Optional[float] = None,
+    ) -> List[Dict[str, Any]]:
+        """Synchronously open ``file_path`` in the right server, wait for
+        diagnostics, return them.
+
+        If ``delta`` is True (default), the result is filtered against
+        any baseline previously captured via :meth:`snapshot_baseline`.
+        Diagnostics present in the baseline are removed so the caller
+        only sees errors introduced by the current edit.
+
+        Returns an empty list when LSP is disabled, when no workspace
+        can be detected, when no server matches, or when the server
+        can't be spawned.  Never raises.
+        """
+        if not self.enabled_for(file_path):
+            return []
+
+        # Resolve server_id eagerly so we can emit structured logs even
+        # when the request errors out below.
+        srv = find_server_for_file(file_path)
+        server_id = srv.server_id if srv else "?"
+
+        try:
+            t = timeout if timeout is not None else self._wait_timeout + 2.0
+            diags = self._loop.run(self._open_and_wait_async(file_path), timeout=t) or []
+        except asyncio.TimeoutError as e:
+            eventlog.log_timeout(server_id, file_path)
+            logger.debug("LSP diagnostics timeout for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            return []
+        except Exception as e:  # noqa: BLE001
+            eventlog.log_server_error(server_id, file_path, e)
+            logger.debug("LSP diagnostics fetch failed for %s: %s", file_path, e)
+            self._mark_broken_for_file(file_path, e)
+            return []
+
+        abs_path = os.path.abspath(file_path)
+        if delta:
+            baseline = self._delta_baseline.get(abs_path) or []
+            if baseline:
+                seen = {_diag_key(d) for d in baseline}
+                diags = [d for d in diags if _diag_key(d) not in seen]
+            # Roll baseline forward — next call returns deltas relative
+            # to the just-emitted state, mirroring claude-code's
+            # diagnosticTracking.
+            try:
+                fresh = self._loop.run(self._current_diags_async(file_path), timeout=2.0) or []
+            except Exception:  # noqa: BLE001
+                fresh = []
+            if fresh:
+                self._delta_baseline[abs_path] = fresh
+
+        if diags:
+            eventlog.log_diagnostics(server_id, file_path, len(diags))
+        else:
+            eventlog.log_clean(server_id, file_path)
+        return diags
+
+    def _mark_broken_for_file(self, file_path: str, exc: BaseException) -> None:
+        """Mark the (server_id, workspace_root) pair as broken so subsequent
+        edits skip it instantly instead of re-paying timeout cost.
+
+        Called when the outer ``_loop.run`` timeout cancels an in-flight
+        spawn/initialize that the inner ``_get_or_spawn`` task was still
+        holding open.  Without this, every subsequent write would re-enter
+        the spawn path and re-pay the full ``snapshot_baseline``
+        timeout (8s) until the binary is fixed.
+
+        Also kills any orphan client process that survived the cancelled
+        future, and emits a single eventlog WARNING so the user knows
+        which server gave up.
+
+        ``exc`` is whatever exception the outer wrapper caught — used
+        only for logging, never re-raised.
+        """
+        srv = find_server_for_file(file_path)
+        if srv is None:
+            return
+        ws_root, gated = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated):
+            return
+        try:
+            per_server_root = srv.resolve_root(file_path, ws_root) or ws_root
+        except Exception:  # noqa: BLE001
+            per_server_root = ws_root
+        key = (srv.server_id, per_server_root)
+        already_broken = key in self._broken
+        self._broken.add(key)
+
+        # Kill any client we managed to spawn before the timeout.  The
+        # cancelled future never reached the broken-set add inside
+        # ``_get_or_spawn`` so the client may still be hanging in
+        # ``_clients`` with a half-initialized state.
+        with self._state_lock:
+            client = self._clients.pop(key, None)
+        if client is not None:
+            try:
+                # Fire-and-forget shutdown — give it a second to cleanup,
+                # but don't block.  We're already on a slow path.
+                self._loop.run(client.shutdown(), timeout=1.0)
+            except Exception:  # noqa: BLE001
+                pass
+
+        if not already_broken:
+            eventlog.log_spawn_failed(srv.server_id, per_server_root, exc)
+
+    def shutdown(self) -> None:
+        """Tear down all clients and stop the background loop."""
+        if not self._enabled:
+            return
+        try:
+            self._loop.run(self._shutdown_async(), timeout=10.0)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("LSP shutdown error: %s", e)
+        self._loop.stop()
+        clear_cache()
+
+    # ------------------------------------------------------------------
+    # async internals
+    # ------------------------------------------------------------------
+
+    async def _snapshot_async(self, file_path: str) -> List[Dict[str, Any]]:
+        client = await self._get_or_spawn(file_path)
+        if client is None:
+            return []
+        try:
+            version = await client.open_file(file_path, language_id=language_id_for(file_path))
+            await client.wait_for_diagnostics(file_path, version, mode=self._wait_mode)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("snapshot open/wait failed: %s", e)
+            return []
+        self._last_used[(client.server_id, client.workspace_root)] = time.time()
+        return list(client.diagnostics_for(file_path))
+
+    async def _open_and_wait_async(self, file_path: str) -> List[Dict[str, Any]]:
+        client = await self._get_or_spawn(file_path)
+        if client is None:
+            return []
+        try:
+            version = await client.open_file(file_path, language_id=language_id_for(file_path))
+            await client.save_file(file_path)
+            await client.wait_for_diagnostics(file_path, version, mode=self._wait_mode)
+        except Exception as e:  # noqa: BLE001
+            logger.debug("open/wait failed for %s: %s", file_path, e)
+            return []
+        self._last_used[(client.server_id, client.workspace_root)] = time.time()
+        return list(client.diagnostics_for(file_path))
+
+    async def _current_diags_async(self, file_path: str) -> List[Dict[str, Any]]:
+        ws, gated = resolve_workspace_for_file(file_path)
+        srv = find_server_for_file(file_path)
+        if not (ws and gated and srv):
+            return []
+        with self._state_lock:
+            client = self._clients.get((srv.server_id, ws))
+        if client is None:
+            return []
+        return list(client.diagnostics_for(file_path))
+
+    async def _get_or_spawn(self, file_path: str) -> Optional[LSPClient]:
+        srv = find_server_for_file(file_path)
+        if srv is None:
+            return None
+        if srv.server_id in self._disabled_servers:
+            eventlog.log_disabled(srv.server_id, file_path, "disabled in config")
+            return None
+        ws_root, gated = resolve_workspace_for_file(file_path)
+        if not (ws_root and gated):
+            eventlog.log_no_project_root(srv.server_id, file_path)
+            return None
+        per_server_root = srv.resolve_root(file_path, ws_root)
+        if per_server_root is None:
+            eventlog.log_disabled(
+                srv.server_id, file_path, "exclude marker hit (server gated off)"
+            )
+            return None  # exclude marker hit, server gated off
+
+        key = (srv.server_id, per_server_root)
+        if key in self._broken:
+            return None
+        with self._state_lock:
+            client = self._clients.get(key)
+            if client is not None and client.is_running:
+                eventlog.log_active(srv.server_id, per_server_root)
+                return client
+            spawning = self._spawning.get(key)
+        if spawning is not None:
+            try:
+                return await spawning
+            except Exception:  # noqa: BLE001
+                return None
+
+        # Begin spawn
+        loop = asyncio.get_running_loop()
+        spawn_future: asyncio.Future = loop.create_future()
+        with self._state_lock:
+            self._spawning[key] = spawn_future
+        try:
+            ctx = ServerContext(
+                workspace_root=per_server_root,
+                install_strategy=self._install_strategy,
+                binary_overrides=self._binary_overrides,
+                env_overrides=self._env_overrides,
+                init_overrides=self._init_overrides,
+            )
+            spec = srv.build_spawn(per_server_root, ctx)
+            if spec is None:
+                # ``build_spawn`` returns None when the binary can't be
+                # located (auto-install disabled, manual-only server,
+                # or install attempt failed).  Surface this once via
+                # the structured logger so the user can act on it.
+                eventlog.log_server_unavailable(srv.server_id, srv.server_id)
+                self._broken.add(key)
+                spawn_future.set_result(None)
+                return None
+            client = LSPClient(
+                server_id=srv.server_id,
+                workspace_root=spec.workspace_root,
+                command=spec.command,
+                env=spec.env,
+                cwd=spec.cwd,
+                initialization_options=spec.initialization_options,
+                seed_diagnostics_on_first_push=spec.seed_diagnostics_on_first_push or srv.seed_first_push,
+            )
+            try:
+                await client.start()
+            except Exception as e:  # noqa: BLE001
+                eventlog.log_spawn_failed(srv.server_id, per_server_root, e)
+                self._broken.add(key)
+                spawn_future.set_result(None)
+                return None
+            with self._state_lock:
+                self._clients[key] = client
+            self._last_used[key] = time.time()
+            eventlog.log_active(srv.server_id, per_server_root)
+            spawn_future.set_result(client)
+            return client
+        finally:
+            with self._state_lock:
+                self._spawning.pop(key, None)
+
+    async def _shutdown_async(self) -> None:
+        with self._state_lock:
+            clients = list(self._clients.values())
+            self._clients.clear()
+            self._broken.clear()
+            self._last_used.clear()
+        await asyncio.gather(
+            *(c.shutdown() for c in clients),
+            return_exceptions=True,
+        )
+
+    # ------------------------------------------------------------------
+    # status / introspection (used by ``hermes lsp status``)
+    # ------------------------------------------------------------------
+
+    def get_status(self) -> Dict[str, Any]:
+        """Return a snapshot of the service for the CLI status command."""
+        with self._state_lock:
+            clients = [
+                {
+                    "server_id": k[0],
+                    "workspace_root": k[1],
+                    "state": c.state,
+                    "running": c.is_running,
+                }
+                for k, c in self._clients.items()
+            ]
+            broken = list(self._broken)
+        return {
+            "enabled": self._enabled,
+            "wait_mode": self._wait_mode,
+            "wait_timeout": self._wait_timeout,
+            "install_strategy": self._install_strategy,
+            "clients": clients,
+            "broken": broken,
+            "disabled_servers": sorted(self._disabled_servers),
+        }
+
+
+def _diag_key(d: Dict[str, Any]) -> str:
+    """Content equality key used for delta filtering.  Mirrors
+    :func:`agent.lsp.client._diagnostic_key`."""
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    end = rng.get("end") or {}
+    code = d.get("code")
+    if code is not None and not isinstance(code, str):
+        code = str(code)
+    return "\x00".join(
+        [
+            str(d.get("severity") or 1),
+            str(code or ""),
+            str(d.get("source") or ""),
+            str(d.get("message") or "").strip(),
+            f"{start.get('line', 0)}:{start.get('character', 0)}-{end.get('line', 0)}:{end.get('character', 0)}",
+        ]
+    )
+
+
+__all__ = ["LSPService"]
--- a/agent/lsp/protocol.py
+++ b/agent/lsp/protocol.py
@ -0,0 +1,196 @@
+"""Minimal LSP JSON-RPC 2.0 framer over async streams.
+
+LSP wire format:
+
+    Content-Length: <bytes>\\r\\n
+    \\r\\n
+    <utf-8 JSON body>
+
+The body is a JSON-RPC 2.0 envelope: request, response, or notification.
+
+This module replaces what ``vscode-jsonrpc/node`` would do in a
+TypeScript implementation.  We keep it deliberately small — just the
+framer + envelope helpers — so :class:`agent.lsp.client.LSPClient` can
+focus on protocol semantics.
+"""
+from __future__ import annotations
+
+import asyncio
+import json
+import logging
+from typing import Any, Optional, Tuple
+
+logger = logging.getLogger("agent.lsp.protocol")
+
+# LSP error codes we care about.  Full list in
+# https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#errorCodes
+ERROR_CONTENT_MODIFIED = -32801
+ERROR_REQUEST_CANCELLED = -32800
+ERROR_METHOD_NOT_FOUND = -32601
+
+
+class LSPProtocolError(Exception):
+    """Raised when the wire protocol is violated.
+
+    Distinct from :class:`LSPRequestError` which represents a server
+    returning a JSON-RPC error response — that's protocol-conformant.
+    This exception means the framing or envelope itself is broken.
+    """
+
+
+class LSPRequestError(Exception):
+    """Raised when an LSP request returns an error response.
+
+    Carries the JSON-RPC ``code``, ``message``, and optional ``data``.
+    """
+
+    def __init__(self, code: int, message: str, data: Any = None) -> None:
+        super().__init__(f"LSP error {code}: {message}")
+        self.code = code
+        self.message = message
+        self.data = data
+
+
+def encode_message(obj: dict) -> bytes:
+    """Encode a JSON-RPC envelope as a Content-Length framed byte string.
+
+    The body is encoded as compact UTF-8 JSON (no spaces between
+    separators) — matches what ``vscode-jsonrpc`` emits and keeps the
+    Content-Length count exact.
+    """
+    body = json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
+    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
+    return header + body
+
+
+async def read_message(reader: asyncio.StreamReader) -> Optional[dict]:
+    """Read one Content-Length framed JSON-RPC message from the stream.
+
+    Returns ``None`` on clean EOF (server closed stdout cleanly between
+    messages — typical shutdown).  Raises :class:`LSPProtocolError` on
+    malformed framing.
+
+    The reader is advanced to just past the JSON body on success.
+    """
+    headers: dict = {}
+    header_bytes = 0
+    while True:
+        try:
+            line = await reader.readuntil(b"\r\n")
+        except asyncio.IncompleteReadError as e:
+            # EOF while reading headers.  If we hadn't started a header
+            # block, treat as clean EOF; otherwise the framing is bad.
+            if not e.partial and not headers:
+                return None
+            raise LSPProtocolError(
+                f"unexpected EOF while reading LSP headers (partial={e.partial!r})"
+            ) from e
+        # Defensive cap against a server streaming headers without ever
+        # emitting CRLF-CRLF.  Caps total header bytes at 8 KiB — a
+        # well-behaved server fits in well under 200 bytes.
+        header_bytes += len(line)
+        if header_bytes > 8192:
+            raise LSPProtocolError(
+                f"LSP header block exceeded 8 KiB without terminator"
+            )
+        line = line[:-2]  # strip CRLF
+        if not line:
+            break  # blank line ends header block
+        try:
+            key, _, value = line.decode("ascii").partition(":")
+        except UnicodeDecodeError as e:
+            raise LSPProtocolError(f"non-ASCII LSP header: {line!r}") from e
+        if not key:
+            raise LSPProtocolError(f"malformed LSP header line: {line!r}")
+        headers[key.strip().lower()] = value.strip()
+
+    cl = headers.get("content-length")
+    if cl is None:
+        raise LSPProtocolError(f"LSP message missing Content-Length: {headers!r}")
+    try:
+        n = int(cl)
+    except ValueError as e:
+        raise LSPProtocolError(f"non-integer Content-Length: {cl!r}") from e
+    if n < 0 or n > 64 * 1024 * 1024:  # 64 MiB sanity cap
+        raise LSPProtocolError(f"unreasonable Content-Length: {n}")
+
+    try:
+        body = await reader.readexactly(n)
+    except asyncio.IncompleteReadError as e:
+        raise LSPProtocolError(
+            f"truncated LSP body: expected {n} bytes, got {len(e.partial)}"
+        ) from e
+
+    try:
+        return json.loads(body.decode("utf-8"))
+    except json.JSONDecodeError as e:
+        raise LSPProtocolError(f"invalid JSON in LSP body: {e}") from e
+    except UnicodeDecodeError as e:
+        raise LSPProtocolError(f"non-UTF-8 LSP body: {e}") from e
+
+
+def make_request(req_id: int, method: str, params: Any) -> dict:
+    """Build a JSON-RPC 2.0 request envelope."""
+    msg: dict = {"jsonrpc": "2.0", "id": req_id, "method": method}
+    if params is not None:
+        msg["params"] = params
+    return msg
+
+
+def make_notification(method: str, params: Any) -> dict:
+    """Build a JSON-RPC 2.0 notification envelope (no ``id``)."""
+    msg: dict = {"jsonrpc": "2.0", "method": method}
+    if params is not None:
+        msg["params"] = params
+    return msg
+
+
+def make_response(req_id: Any, result: Any) -> dict:
+    """Build a JSON-RPC 2.0 success response envelope."""
+    return {"jsonrpc": "2.0", "id": req_id, "result": result}
+
+
+def make_error_response(req_id: Any, code: int, message: str, data: Any = None) -> dict:
+    """Build a JSON-RPC 2.0 error response envelope."""
+    err: dict = {"code": code, "message": message}
+    if data is not None:
+        err["data"] = data
+    return {"jsonrpc": "2.0", "id": req_id, "error": err}
+
+
+def classify_message(msg: dict) -> Tuple[str, Any]:
+    """Return ``(kind, key)`` where kind is one of ``request``,
+    ``response``, ``notification``, ``invalid``.
+
+    The key is the request id for request/response, the method name
+    for notifications, and ``None`` for invalid messages.
+    """
+    if not isinstance(msg, dict):
+        return "invalid", None
+    if msg.get("jsonrpc") != "2.0":
+        return "invalid", None
+    has_id = "id" in msg
+    has_method = "method" in msg
+    if has_id and has_method:
+        return "request", msg["id"]
+    if has_id and ("result" in msg or "error" in msg):
+        return "response", msg["id"]
+    if has_method and not has_id:
+        return "notification", msg["method"]
+    return "invalid", None
+
+
+__all__ = [
+    "ERROR_CONTENT_MODIFIED",
+    "ERROR_REQUEST_CANCELLED",
+    "ERROR_METHOD_NOT_FOUND",
+    "LSPProtocolError",
+    "LSPRequestError",
+    "encode_message",
+    "read_message",
+    "make_request",
+    "make_notification",
+    "make_response",
+    "make_error_response",
+    "classify_message",
+]
--- a/agent/lsp/reporter.py
+++ b/agent/lsp/reporter.py
@ -0,0 +1,78 @@
+"""Format LSP diagnostics for inclusion in tool output.
+
+The model sees a compact, severity-filtered, line-bounded summary of
+diagnostics introduced by the latest edit.  Format matches what
+OpenCode's ``lsp/diagnostic.ts`` and Claude Code's
+``formatDiagnosticsSummary`` produce — ``<diagnostics>`` blocks with
+1-indexed line/column, capped at ``MAX_PER_FILE`` errors.
+"""
+from __future__ import annotations
+
+from typing import Any, Dict, List
+
+# Severity-1 only by default — warnings/info/hints would flood the
+# agent.  Lift this in config under ``lsp.severities`` if needed.
+SEVERITY_NAMES = {1: "ERROR", 2: "WARN", 3: "INFO", 4: "HINT"}
+DEFAULT_SEVERITIES = frozenset({1})  # ERROR only
+
+MAX_PER_FILE = 20
+MAX_TOTAL_CHARS = 4000
+
+
+def format_diagnostic(d: Dict[str, Any]) -> str:
+    """One-line representation of a single diagnostic."""
+    sev = SEVERITY_NAMES.get(d.get("severity") or 1, "ERROR")
+    rng = d.get("range") or {}
+    start = rng.get("start") or {}
+    line = int(start.get("line", 0)) + 1
+    col = int(start.get("character", 0)) + 1
+    msg = str(d.get("message") or "").rstrip()
+    code = d.get("code")
+    code_part = f" [{code}]" if code not in (None, "") else ""
+    source = d.get("source")
+    source_part = f" ({source})" if source else ""
+    return f"{sev} [{line}:{col}] {msg}{code_part}{source_part}"
+
+
+def report_for_file(
+    file_path: str,
+    diagnostics: List[Dict[str, Any]],
+    *,
+    severities: frozenset = DEFAULT_SEVERITIES,
+    max_per_file: int = MAX_PER_FILE,
+) -> str:
+    """Build a ``<diagnostics file=...>`` block for one file.
+
+    Returns an empty string when no diagnostics pass the severity
+    filter, so callers can do ``if block:`` to skip empty cases.
+    """
+    if not diagnostics:
+        return ""
+    filtered = [d for d in diagnostics if (d.get("severity") or 1) in severities]
+    if not filtered:
+        return ""
+    limited = filtered[:max_per_file]
+    extra = len(filtered) - len(limited)
+    lines = [format_diagnostic(d) for d in limited]
+    body = "\n".join(lines)
+    if extra > 0:
+        body += f"\n... and {extra} more"
+    return f"<diagnostics file=\"{file_path}\">\n{body}\n</diagnostics>"
+
+
+def truncate(s: str, *, limit: int = MAX_TOTAL_CHARS) -> str:
+    """Hard-cap a formatted summary string."""
+    if len(s) <= limit:
+        return s
+    marker = "\n…[truncated]"
+    return s[: limit - len(marker)] + marker
+
+
+__all__ = [
+    "SEVERITY_NAMES",
+    "DEFAULT_SEVERITIES",
+    "MAX_PER_FILE",
+    "format_diagnostic",
+    "report_for_file",
+    "truncate",
+]
--- a/agent/lsp/servers.py
+++ b/agent/lsp/servers.py
--- a/agent/lsp/workspace.py
+++ b/agent/lsp/workspace.py
@ -0,0 +1,223 @@
+"""Workspace and project-root resolution for LSP.
+
+Two concerns live here:
+
+1. **Workspace gate** — the upper-level "is this directory a project?"
+   check.  Hermes only runs LSP when the cwd (or the file being edited)
+   sits inside a git worktree.  Files outside any git root never
+   trigger LSP, even if a server is configured.  This keeps Telegram
+   gateway users on user-home cwd's from spawning daemons.
+
+2. **NearestRoot** — the per-server project-root walk.  Each language
+   server cares about a different marker (``pyproject.toml`` for
+   Python, ``Cargo.toml`` for Rust, ``go.mod`` for Go, etc.) and
+   wants the directory containing that marker.  ``nearest_root()``
+   walks up from a starting path looking for any of a list of marker
+   files, optionally bailing if an exclude marker shows up first.
+"""
+from __future__ import annotations
+
+import logging
+import os
+from pathlib import Path
+from typing import Iterable, Optional, Tuple
+
+logger = logging.getLogger("agent.lsp.workspace")
+
+# Cache: cwd → (worktree_root, is_git) so repeated calls don't re-stat.
+# Cleared on shutdown.  Keyed by absolute resolved path so symlink
+# folds collapse to one entry.
+_workspace_cache: dict = {}
+
+
+def normalize_path(path: str) -> str:
+    """Normalize a path for use as a stable map key.
+
+    Resolves ``~``, makes absolute, and collapses ``.``/``..``.  We do
+    NOT resolve symlinks here — symlink stability matters for some
+    LSP servers (rust-analyzer cares about Cargo workspace identity)
+    and we want the canonical path the user typed when possible.
+    """
+    return os.path.abspath(os.path.expanduser(path))
+
+
+def find_git_worktree(start: str) -> Optional[str]:
+    """Walk up from ``start`` looking for a ``.git`` entry (file or dir).
+
+    Returns the directory containing ``.git``, or ``None`` if no git
+    root is found before hitting the filesystem root.
+
+    A ``.git`` *file* (not directory) means we're inside a git
+    worktree set up via ``git worktree add`` — both forms count.
+    """
+    try:
+        start_path = Path(normalize_path(start))
+        if start_path.is_file():
+            start_path = start_path.parent
+    except (OSError, RuntimeError, ValueError):
+        # Pathological input (loop in symlinks, encoding error, etc.) —
+        # bail out rather than crash the lint hook.
+        return None
+
+    # Cache check
+    cached = _workspace_cache.get(str(start_path))
+    if cached is not None:
+        root, _is_git = cached
+        return root
+
+    cur = start_path
+    # Defensive cap: the deepest reasonable monorepo is well under 64
+    # levels.  Caps the walk so a pathological cwd or a symlink cycle
+    # we somehow traverse can't keep us looping.
+    for _ in range(64):
+        git_marker = cur / ".git"
+        try:
+            if git_marker.exists():
+                resolved = str(cur)
+                _workspace_cache[str(start_path)] = (resolved, True)
+                return resolved
+        except OSError:
+            # Permission error on a parent dir — bail out cleanly.
+            break
+        parent = cur.parent
+        if parent == cur:
+            break
+        cur = parent
+
+    _workspace_cache[str(start_path)] = (None, False)
+    return None
+
+
+def is_inside_workspace(path: str, workspace_root: str) -> bool:
+    """Return True iff ``path`` is inside (or equal to) ``workspace_root``.
+
+    Uses absolute paths but does not resolve symlinks — a file accessed
+    via a symlink that points outside the workspace still counts as
+    outside.  This is the conservative interpretation; matches LSP
+    behaviour where servers reject didOpen for unrelated files.
+    """
+    p = normalize_path(path)
+    root = normalize_path(workspace_root)
+    if p == root:
+        return True
+    # Use os.path.commonpath to handle case-insensitive filesystems
+    # correctly on macOS/Windows.
+    try:
+        common = os.path.commonpath([p, root])
+    except ValueError:
+        # Different drives on Windows.
+        return False
+    return common == root
+
+
+def nearest_root(
+    start: str,
+    markers: Iterable[str],
+    *,
+    excludes: Optional[Iterable[str]] = None,
+    ceiling: Optional[str] = None,
+) -> Optional[str]:
+    """Walk up from ``start`` looking for any of the given marker files.
+
+    Returns the **directory containing** the first matched marker, or
+    ``None`` if no marker is found before hitting ``ceiling`` (or the
+    filesystem root if no ceiling).
+
+    If ``excludes`` is provided and an exclude marker matches *first*
+    in the upward walk, returns ``None`` — the server is gated off
+    for that file.  Mirrors OpenCode's NearestRoot exclude semantics
+    (e.g. typescript skips deno projects when ``deno.json`` is found
+    before ``package.json``).
+    """
+    start_path = Path(normalize_path(start))
+    try:
+        if start_path.is_file():
+            start_path = start_path.parent
+    except (OSError, RuntimeError, ValueError):
+        return None
+    ceiling_path = Path(normalize_path(ceiling)) if ceiling else None
+
+    markers_list = list(markers)
+    excludes_list = list(excludes) if excludes else []
+
+    cur = start_path
+    # Defensive cap matching ``find_git_worktree``.  Bounded walk
+    # protects against pathological inputs even though the
+    # parent-equality stop normally terminates within ~10 steps.
+    for _ in range(64):
+        # Check excludes first — if an exclude is found at this level,
+        # the server is gated off for this file.
+        for exc in excludes_list:
+            try:
+                if (cur / exc).exists():
+                    return None
+            except OSError:
+                continue
+        # Then check markers.
+        for marker in markers_list:
+            try:
+                if (cur / marker).exists():
+                    return str(cur)
+            except OSError:
+                continue
+        # Stop conditions.
+        if ceiling_path is not None and cur == ceiling_path:
+            return None
+        parent = cur.parent
+        if parent == cur:
+            return None
+        cur = parent
+    return None
+
+
+def resolve_workspace_for_file(
+    file_path: str,
+    *,
+    cwd: Optional[str] = None,
+) -> Tuple[Optional[str], bool]:
+    """Resolve the workspace root for a file.
+
+    Returns ``(workspace_root, gated_in)`` where ``gated_in`` is True
+    iff LSP should run for this file at all.  Currently the gate is
+    "file is inside a git worktree found by walking up from cwd OR
+    from the file itself".
+
+    The cwd path takes precedence — if the agent was launched in a
+    git project, that worktree is the workspace, and any edit inside
+    it (regardless of where the file lives) is in-scope.  If the cwd
+    isn't in a git worktree, we try the file's own location as a
+    fallback.
+
+    Returns ``(None, False)`` when neither path is in a git worktree.
+    """
+    cwd = cwd or os.getcwd()
+    cwd_root = find_git_worktree(cwd)
+    if cwd_root is not None:
+        if is_inside_workspace(file_path, cwd_root):
+            return cwd_root, True
+        # File is outside the cwd's worktree — try the file's own
+        # location as a secondary anchor.  Useful for monorepos where
+        # the user opens an unrelated checkout.
+    file_root = find_git_worktree(file_path)
+    if file_root is not None:
+        return file_root, True
+    return None, False
+
+
+def clear_cache() -> None:
+    """Clear the workspace-resolution cache.
+
+    Called on service shutdown so a subsequent re-init doesn't pick
+    up stale results from a previous session.
+    """
+    _workspace_cache.clear()
+
+
+__all__ = [
+    "find_git_worktree",
+    "is_inside_workspace",
+    "nearest_root",
+    "normalize_path",
+    "resolve_workspace_for_file",
+    "clear_cache",
+]