hermes-agent/tests/agent/lsp/test_protocol.py
Teknium 83b93898c2
feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168)
* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks
  the (server_id, workspace_root) pair broken from the OUTSIDE when
  the outer timeout fires. Called from the except branches in
  ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
  + generic Exception). Also kills any orphan client process that
  survived the cancelled future, fire-and-forget with a 1s ceiling.

- ``enabled_for`` now consults the broken-set BEFORE returning True.
  Files in already-broken (server_id, root) pairs short-circuit to
  False, so the file_operations layer skips the LSP path entirely
  with no spawn cost. Until the service is restarted (``hermes lsp
  restart``) or the process exits.

- A single eventlog WARNING is emitted on first mark-broken so the
  user knows which server gave up. Subsequent edits in the same
  project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.

Validation:

- Live retest of the wedged-binary scenario: 5 sequential writes,
  first 8.88s (the one snapshot timeout), subsequent four ~0.84s
  (no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
  patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
2026-05-12 16:31:54 -07:00

197 lines
6.4 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

"""Tests for the LSP protocol framing layer.
The framer is small but load-bearing — Content-Length parsing is the
single most common reason for hand-rolled LSP clients to silently
deadlock. These tests exercise:
- exact wire format of outgoing messages (encode_message)
- partial-read tolerance + EOF handling (read_message)
- envelope helpers (request, response, notification, error)
- message classification
"""
from __future__ import annotations
import asyncio
import json
import pytest
from agent.lsp.protocol import (
ERROR_CONTENT_MODIFIED,
ERROR_METHOD_NOT_FOUND,
LSPProtocolError,
LSPRequestError,
classify_message,
encode_message,
make_error_response,
make_notification,
make_request,
make_response,
read_message,
)
# ---------------------------------------------------------------------------
# encode_message
# ---------------------------------------------------------------------------
def test_encode_message_uses_compact_separators_and_utf8():
msg = {"jsonrpc": "2.0", "id": 1, "method": "x", "params": {"k": "ä"}}
out = encode_message(msg)
# Header is plain ASCII Content-Length CRLF CRLF
header_end = out.index(b"\r\n\r\n") + 4
header = out[:header_end].decode("ascii")
body = out[header_end:]
assert "Content-Length:" in header
declared = int(header.split("Content-Length:")[1].split("\r\n")[0].strip())
# Declared length must equal actual body bytes.
assert declared == len(body)
# Body parses as JSON and round-trips.
parsed = json.loads(body.decode("utf-8"))
assert parsed == msg
# Body uses compact separators (no spaces between kv).
assert b'"id":1' in body
def test_encode_message_handles_unicode_in_strings():
msg = {"jsonrpc": "2.0", "method": "log", "params": {"text": "🚀 ünıcödé"}}
out = encode_message(msg)
header_end = out.index(b"\r\n\r\n") + 4
declared = int(out[: out.index(b"\r\n")].split(b": ")[1])
assert declared == len(out[header_end:])
assert json.loads(out[header_end:].decode("utf-8")) == msg
# ---------------------------------------------------------------------------
# read_message
# ---------------------------------------------------------------------------
async def _stream_from_bytes(data: bytes) -> asyncio.StreamReader:
"""Build an asyncio.StreamReader pre-populated with ``data``."""
reader = asyncio.StreamReader()
reader.feed_data(data)
reader.feed_eof()
return reader
@pytest.mark.asyncio
async def test_read_message_round_trip():
msg = {"jsonrpc": "2.0", "method": "ping"}
reader = await _stream_from_bytes(encode_message(msg))
parsed = await read_message(reader)
assert parsed == msg
@pytest.mark.asyncio
async def test_read_message_clean_eof_returns_none():
reader = await _stream_from_bytes(b"")
assert await read_message(reader) is None
@pytest.mark.asyncio
async def test_read_message_truncated_body_raises():
msg = encode_message({"jsonrpc": "2.0", "method": "x"})
truncated = msg[: -3] # cut the body
reader = await _stream_from_bytes(truncated)
with pytest.raises(LSPProtocolError):
await read_message(reader)
@pytest.mark.asyncio
async def test_read_message_missing_content_length_raises():
bad = b"X-Other: 5\r\n\r\n12345"
reader = await _stream_from_bytes(bad)
with pytest.raises(LSPProtocolError):
await read_message(reader)
@pytest.mark.asyncio
async def test_read_message_two_messages_back_to_back():
a = encode_message({"jsonrpc": "2.0", "method": "a"})
b = encode_message({"jsonrpc": "2.0", "method": "b"})
reader = await _stream_from_bytes(a + b)
assert (await read_message(reader))["method"] == "a"
assert (await read_message(reader))["method"] == "b"
@pytest.mark.asyncio
async def test_read_message_rejects_runaway_header():
"""A pathological server that streams headers without ever emitting
the CRLF-CRLF terminator must not loop forever — the 8 KiB cap kicks
in and surfaces a protocol error."""
flood = (b"X-Junk: " + b"A" * 200 + b"\r\n") * 60 # ~12 KiB worth
reader = await _stream_from_bytes(flood)
with pytest.raises(LSPProtocolError) as exc:
await read_message(reader)
assert "8 KiB" in str(exc.value)
# ---------------------------------------------------------------------------
# envelope helpers
# ---------------------------------------------------------------------------
def test_make_request_includes_id_and_method():
msg = make_request(7, "ping", {"v": 1})
assert msg == {"jsonrpc": "2.0", "id": 7, "method": "ping", "params": {"v": 1}}
def test_make_request_omits_params_when_none():
msg = make_request(7, "ping", None)
assert "params" not in msg
def test_make_notification_omits_id():
msg = make_notification("log", {"line": "hi"})
assert "id" not in msg
assert msg["method"] == "log"
def test_make_response_carries_result():
msg = make_response(7, {"ok": True})
assert msg["id"] == 7 and msg["result"] == {"ok": True}
def test_make_error_response_shape():
msg = make_error_response(7, ERROR_CONTENT_MODIFIED, "stale", {"hint": "retry"})
assert msg["error"]["code"] == ERROR_CONTENT_MODIFIED
assert msg["error"]["message"] == "stale"
assert msg["error"]["data"] == {"hint": "retry"}
# ---------------------------------------------------------------------------
# classify_message
# ---------------------------------------------------------------------------
def test_classify_message_request():
msg = {"jsonrpc": "2.0", "id": 1, "method": "x"}
assert classify_message(msg) == ("request", 1)
def test_classify_message_response():
msg = {"jsonrpc": "2.0", "id": 1, "result": None}
assert classify_message(msg) == ("response", 1)
def test_classify_message_notification():
msg = {"jsonrpc": "2.0", "method": "log"}
assert classify_message(msg) == ("notification", "log")
def test_classify_message_invalid():
assert classify_message({"id": 1})[0] == "invalid"
assert classify_message({"jsonrpc": "1.0", "method": "x"})[0] == "invalid"
# ---------------------------------------------------------------------------
# LSPRequestError
# ---------------------------------------------------------------------------
def test_lsp_request_error_carries_code_and_data():
e = LSPRequestError(ERROR_METHOD_NOT_FOUND, "no", {"x": 1})
assert e.code == ERROR_METHOD_NOT_FOUND
assert e.message == "no"
assert e.data == {"x": 1}