hermes-agent/tests/agent/lsp/test_broken_set.py
Teknium 83b93898c2
feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168)
* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks
  the (server_id, workspace_root) pair broken from the OUTSIDE when
  the outer timeout fires. Called from the except branches in
  ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
  + generic Exception). Also kills any orphan client process that
  survived the cancelled future, fire-and-forget with a 1s ceiling.

- ``enabled_for`` now consults the broken-set BEFORE returning True.
  Files in already-broken (server_id, root) pairs short-circuit to
  False, so the file_operations layer skips the LSP path entirely
  with no spawn cost. Until the service is restarted (``hermes lsp
  restart``) or the process exits.

- A single eventlog WARNING is emitted on first mark-broken so the
  user knows which server gave up. Subsequent edits in the same
  project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.

Validation:

- Live retest of the wedged-binary scenario: 5 sequential writes,
  first 8.88s (the one snapshot timeout), subsequent four ~0.84s
  (no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
  patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
2026-05-12 16:31:54 -07:00

213 lines
6.7 KiB
Python

"""Tests for the broken-set short-circuit added to handle outer-timeout failures.
When ``snapshot_baseline`` or ``get_diagnostics_sync`` time out from the
service layer (because a language server hangs during initialize, or
the binary is wedged), the inner spawn task is cancelled — but the
inner exception handler that adds to ``_broken`` never runs. Without
the service-layer fallback added in this module, every subsequent
edit re-pays the full timeout cost until the process exits.
This module verifies:
- ``_mark_broken_for_file`` adds the right key
- ``enabled_for`` short-circuits on broken keys
- a missing binary is broken-set'd after one snapshot attempt
"""
from __future__ import annotations
import os
import sys
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
from agent.lsp.manager import LSPService
from agent.lsp.servers import SERVERS, ServerContext, ServerDef, SpawnSpec
from agent.lsp.workspace import clear_cache
@pytest.fixture(autouse=True)
def _clear_workspace_cache():
clear_cache()
yield
clear_cache()
def _make_git_workspace(tmp_path: Path) -> Path:
"""Build a minimal git repo with a pyproject so pyright's root resolver fires."""
repo = tmp_path / "repo"
repo.mkdir()
(repo / ".git").mkdir()
(repo / "pyproject.toml").write_text("[project]\nname='t'\n")
return repo
def test_mark_broken_for_file_adds_correct_key(tmp_path, monkeypatch):
"""``_mark_broken_for_file`` keys the broken-set on
(server_id, per_server_root) so subsequent ``enabled_for`` calls
for files in the same project skip immediately."""
repo = _make_git_workspace(tmp_path)
monkeypatch.chdir(str(repo))
src = repo / "x.py"
src.write_text("")
svc = LSPService(
enabled=True,
wait_mode="document",
wait_timeout=2.0,
install_strategy="manual",
)
try:
svc._mark_broken_for_file(str(src), RuntimeError("simulated"))
# The pyright server resolves to the repo root via pyproject.toml.
assert ("pyright", str(repo)) in svc._broken
finally:
svc.shutdown()
def test_enabled_for_returns_false_after_broken(tmp_path, monkeypatch):
"""Once a (server_id, root) pair is in the broken-set,
``enabled_for`` returns False so the file_operations layer skips
the LSP path entirely."""
repo = _make_git_workspace(tmp_path)
monkeypatch.chdir(str(repo))
src = repo / "x.py"
src.write_text("")
svc = LSPService(
enabled=True,
wait_mode="document",
wait_timeout=2.0,
install_strategy="manual",
)
try:
# Initially enabled.
assert svc.enabled_for(str(src)) is True
# Mark broken.
svc._mark_broken_for_file(str(src), RuntimeError("simulated"))
# Now disabled — the broken-set short-circuits.
assert svc.enabled_for(str(src)) is False
finally:
svc.shutdown()
def test_enabled_for_other_file_in_same_project_also_skipped(tmp_path, monkeypatch):
"""The broken key is (server_id, root), so ALL files routed through
the same server in the same project are skipped — not just the one
that triggered the failure."""
repo = _make_git_workspace(tmp_path)
monkeypatch.chdir(str(repo))
a = repo / "a.py"
a.write_text("")
b = repo / "b.py"
b.write_text("")
svc = LSPService(
enabled=True,
wait_mode="document",
wait_timeout=2.0,
install_strategy="manual",
)
try:
svc._mark_broken_for_file(str(a), RuntimeError("simulated"))
# Both files in the same project skip pyright now.
assert svc.enabled_for(str(a)) is False
assert svc.enabled_for(str(b)) is False
finally:
svc.shutdown()
def test_unrelated_project_not_affected_by_broken(tmp_path, monkeypatch):
"""Marking pyright broken for project A must NOT affect project B."""
repo_a = _make_git_workspace(tmp_path)
repo_b = tmp_path / "repo-b"
repo_b.mkdir()
(repo_b / ".git").mkdir()
(repo_b / "pyproject.toml").write_text("[project]\nname='b'\n")
a_src = repo_a / "x.py"
a_src.write_text("")
b_src = repo_b / "x.py"
b_src.write_text("")
monkeypatch.chdir(str(repo_a))
svc = LSPService(
enabled=True,
wait_mode="document",
wait_timeout=2.0,
install_strategy="manual",
)
try:
svc._mark_broken_for_file(str(a_src), RuntimeError("simulated"))
# Project A skipped.
assert svc.enabled_for(str(a_src)) is False
# Project B still enabled — the broken key is per-project.
monkeypatch.chdir(str(repo_b))
assert svc.enabled_for(str(b_src)) is True
finally:
svc.shutdown()
def test_mark_broken_handles_missing_server_silently(tmp_path):
"""If the file extension doesn't match any registered server,
``_mark_broken_for_file`` no-ops — nothing to mark."""
svc = LSPService(
enabled=True,
wait_mode="document",
wait_timeout=2.0,
install_strategy="manual",
)
try:
# No registered server for .xyz; must not raise.
svc._mark_broken_for_file(str(tmp_path / "weird.xyz"), RuntimeError("x"))
assert len(svc._broken) == 0
finally:
svc.shutdown()
def test_mark_broken_handles_no_workspace_silently(tmp_path):
"""File outside any git worktree → no workspace → no key to add."""
src = tmp_path / "orphan.py"
src.write_text("")
svc = LSPService(
enabled=True,
wait_mode="document",
wait_timeout=2.0,
install_strategy="manual",
)
try:
svc._mark_broken_for_file(str(src), RuntimeError("x"))
assert len(svc._broken) == 0
finally:
svc.shutdown()
def test_snapshot_failure_marks_broken_via_outer_timeout(tmp_path, monkeypatch):
"""End-to-end: ``snapshot_baseline``'s outer ``_loop.run`` timeout
triggers ``_mark_broken_for_file``, so a second call to
``enabled_for`` returns False."""
repo = _make_git_workspace(tmp_path)
monkeypatch.chdir(str(repo))
src = repo / "x.py"
src.write_text("")
svc = LSPService(
enabled=True,
wait_mode="document",
wait_timeout=2.0,
install_strategy="manual",
)
try:
# Force the inner snapshot coroutine to raise.
async def boom(_path):
raise RuntimeError("outer-timeout simulated")
with patch.object(svc, "_snapshot_async", boom):
assert svc.enabled_for(str(src)) is True
svc.snapshot_baseline(str(src))
# After the failure, the file's pair is in the broken-set and
# ``enabled_for`` skips it.
assert ("pyright", str(repo)) in svc._broken
assert svc.enabled_for(str(src)) is False
finally:
svc.shutdown()