mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168 )

* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks
  the (server_id, workspace_root) pair broken from the OUTSIDE when
  the outer timeout fires. Called from the except branches in
  ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
  + generic Exception). Also kills any orphan client process that
  survived the cancelled future, fire-and-forget with a 1s ceiling.

- ``enabled_for`` now consults the broken-set BEFORE returning True.
  Files in already-broken (server_id, root) pairs short-circuit to
  False, so the file_operations layer skips the LSP path entirely
  with no spawn cost. Until the service is restarted (``hermes lsp
  restart``) or the process exits.

- A single eventlog WARNING is emitted on first mark-broken so the
  user knows which server gave up. Subsequent edits in the same
  project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.

Validation:

- Live retest of the wedged-binary scenario: 5 sequential writes,
  first 8.88s (the one snapshot timeout), subsequent four ~0.84s
  (no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
  patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.

2026-05-12 16:31:54 -07:00

8.4 KiB

Raw Blame History

sidebar_position	title	description
16	LSP — Semantic Diagnostics	Real language servers (pyright, gopls, rust-analyzer, …) wired into the post-write lint check used by write_file and patch.

Language Server Protocol (LSP)

Hermes runs full language servers — pyright, gopls, rust-analyzer, typescript-language-server, clangd, and ~20 more — as background subprocesses and feeds their semantic diagnostics into the post-write lint check used by write_file and patch. When the agent edits a file, it sees exactly the errors that edit introduced — not just syntax errors, but type errors, undefined names, missing imports, and project-wide semantic issues the language server detects.

This is the same architecture top-tier coding agents use. Hermes ships it self-contained: no editor host required, no plugins to install, no separate daemon to manage.

When LSP runs

LSP is gated on git workspace detection. When the agent's working directory (or the file being edited) is inside a git worktree, LSP runs against that workspace. When neither is in a git repo, LSP stays dormant — useful for messaging gateways where the cwd is the user's home directory and there's no project to diagnose.

The check is layered: in-process syntax check first (microseconds), then LSP diagnostics second when syntax is clean. A flaky or missing language server can never break a write — every LSP failure path falls back silently to the syntax-only result.

Concretely, on every successful write_file or patch:

Hermes captures a baseline of current diagnostics for the file.
Performs the write.
Re-queries the language server, filters out diagnostics that were already in the baseline, and surfaces only the new ones.

The agent sees output like:

{
  "bytes_written": 42,
  "dirs_created": false,
  "lint": {"status": "ok", "output": ""},
  "lsp_diagnostics": "LSP diagnostics introduced by this edit:\n<diagnostics file=\"/path/to/foo.py\">\nERROR [42:5] Cannot find name 'foo' [reportUndefinedVariable] (Pyright)\nERROR [50:1] Argument of type \"str\" is not assignable to \"int\" [reportArgumentType] (Pyright)\n</diagnostics>"
}

The lint field carries the syntax-check result (microsecond in-process parse via ast.parse, json.loads, etc.); the lsp_diagnostics field carries the semantic diagnostics from the real language server. Two channels, independent signals — the agent sees a syntax-clean file with semantic problems as lint: ok plus a populated lsp_diagnostics.

Supported languages

Language	Server	Auto-install
Python	`pyright-langserver`	npm
TypeScript / JavaScript / JSX / TSX	`typescript-language-server`	npm
Vue	`@vue/language-server`	npm
Svelte	`svelte-language-server`	npm
Astro	`@astrojs/language-server`	npm
Go	`gopls`	`go install`
Rust	`rust-analyzer`	manual (rustup)
C / C++	`clangd`	manual (LLVM)
Bash / Zsh	`bash-language-server`	npm
YAML	`yaml-language-server`	npm
Lua	`lua-language-server`	manual (GitHub releases)
PHP	`intelephense`	npm
OCaml	`ocaml-lsp`	manual (opam)
Dockerfile	`dockerfile-language-server-nodejs`	npm
Terraform	`terraform-ls`	manual
Dart	`dart language-server`	manual (dart sdk)
Haskell	`haskell-language-server`	manual (ghcup)
Julia	`julia` + LanguageServer.jl	manual
Clojure	`clojure-lsp`	manual
Nix	`nixd`	manual
Zig	`zls`	manual
Gleam	`gleam lsp`	manual (gleam install)
Elixir	`elixir-ls`	manual
Prisma	`prisma language-server`	manual
Kotlin	`kotlin-language-server`	manual
Java	`jdtls`	manual

For "manual" entries, install the server through whatever toolchain manager makes sense for that language (rustup, ghcup, opam, brew, …). Hermes auto-detects the binary on PATH or in <HERMES_HOME>/lsp/bin/.

CLI

hermes lsp status          # service state + per-server install status
hermes lsp list            # registry, optionally --installed-only
hermes lsp install <id>    # eagerly install one server
hermes lsp install-all     # try every server with a known recipe
hermes lsp restart         # tear down running clients
hermes lsp which <id>      # print resolved binary path

hermes lsp status is the best starting point — it shows which languages will get semantic diagnostics today and which need a binary installed.

Configuration

The defaults work for typical setups; nothing to set if the binaries are on PATH.

# config.yaml
lsp:
  # Master toggle. Disabling skips the entire subsystem — no servers
  # spawn, no background event loop runs.
  enabled: true

  # How long to wait for diagnostics after each write.
  wait_mode: document      # "document" or "full"
  wait_timeout: 5.0

  # How to handle missing server binaries.
  #   auto    — install via npm/pip/go install into <HERMES_HOME>/lsp/bin
  #   manual  — only use binaries already on PATH
  install_strategy: auto

  # Per-server overrides (all optional).
  servers:
    pyright:
      disabled: false
      command: ["/abs/path/to/pyright-langserver", "--stdio"]
      env: { PYRIGHT_LOG_LEVEL: "info" }
      initialization_options:
        python:
          analysis:
            typeCheckingMode: "strict"
    typescript:
      disabled: true       # skip TS even when its extensions match

Per-server keys

disabled: true — skip this server entirely even when its extensions match a file.
command: [bin, ...args] — pin a custom binary path. Bypasses auto-install.
env: {KEY: value} — extra env vars passed to the spawned process.
initialization_options: {...} — merged into the LSP initializationOptions payload sent in the initialize handshake. Server-specific; consult the language server's docs.

Installation locations

When install_strategy: auto, Hermes installs binaries into <HERMES_HOME>/lsp/bin/. NPM packages land in <HERMES_HOME>/lsp/node_modules/ with bin symlinks one level up. Go binaries come from go install with GOBIN pointed at the staging dir.

Nothing is ever installed to /usr/local/, ~/.local/, or any other shared location — the staging dir is fully Hermes-owned and is removed when you reset the profile.

Performance characteristics

LSP servers are lazy-spawned on first use. Editing a Python file in a project that's never seen .py traffic spawns pyright; the spawn takes 1-3 seconds for most servers (rust-analyzer can take 10+ on a cold project). Subsequent edits in the same workspace re-use the running server.

The LSP layer adds a few milliseconds to clean writes when no diagnostics are emitted. When diagnostics are emitted, the wait budget is wait_timeout seconds — typically the server responds in tens of milliseconds for pyright/tsserver and a few seconds for rust-analyzer mid-indexing.

Servers are kept alive for the life of the Hermes process. There's no idle-timeout reaper — the cost of restarting the server's index on every write would be far higher than holding the daemon.

Disabling

Set lsp.enabled: false in config.yaml to disable the entire subsystem. The post-write check falls back to the in-process syntax check (ast.parse for Python, json.loads for JSON, etc.) which ships unchanged from earlier versions.

To disable a single language without disabling the whole layer:

lsp:
  servers:
    rust-analyzer:
      disabled: true

Troubleshooting

hermes lsp status shows a server as "missing"

The binary isn't on PATH and isn't in <HERMES_HOME>/lsp/bin/. Run hermes lsp install <server_id> to attempt an auto-install, or install the binary manually through the language's normal toolchain.

Server starts but never returns diagnostics

Check ~/.hermes/logs/agent.log for [agent.lsp.client] entries — both stderr from the language server and protocol errors land there. Some servers (rust-analyzer especially) need to finish a project-wide index before they emit per-file diagnostics; the first edit after server start may complete with no diagnostics, with subsequent edits picking them up.

Server crashed

A crashed server is added to the broken-set and won't be retried for the rest of the session. Run hermes lsp restart to clear the set; the next edit re-spawns.

Editing a file outside any git repo

By design, LSP only runs inside git worktrees. Run git init in the project, or accept the in-process syntax-only fallback.

8.4 KiB Raw Blame History