mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-18 04:41:56 +00:00
* feat(lsp): semantic diagnostics from real language servers in write_file/patch
Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.
LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps sessions whose
cwd is the user's home directory (Telegram/Discord gateway chats) from spawning daemons.
The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.
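The delta filter described above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/lsp/manager.py: the ``Diagnostic`` tuple and the choice to match on (severity, code, message) while ignoring position are assumptions made for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Diagnostic(NamedTuple):
    # Hypothetical shape for illustration; the real diagnostic store may differ.
    line: int
    column: int
    severity: int  # LSP severity, 1 = error
    code: str
    message: str


def delta_filter(baseline: List[Diagnostic], current: List[Diagnostic]) -> List[Diagnostic]:
    """Return the diagnostics in *current* that were not present in *baseline*.

    Position is ignored when matching (an assumption), so an edit that only
    shifts a pre-existing error down a line does not re-report it.
    """
    remaining: Dict[Tuple[int, str, str], int] = {}
    for d in baseline:
        key = (d.severity, d.code, d.message)
        remaining[key] = remaining.get(key, 0) + 1

    introduced: List[Diagnostic] = []
    for d in current:
        key = (d.severity, d.code, d.message)
        if remaining.get(key, 0) > 0:
            remaining[key] -= 1  # matched a pre-existing diagnostic
        else:
            introduced.append(d)
    return introduced
```

Counting occurrences rather than using a plain set means a second copy of an already-known error still surfaces as new.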
New module agent/lsp/ split into:
- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}
Wired into tools/file_operations.py:
- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged
Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).
Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.
Live E2E verified through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.
Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.
* feat(lsp): structured logging, backend gate, defensive walk caps
Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.
agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\['``.
agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.
tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.
agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.
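A minimal sketch of Content-Length framing with a header cap, for illustration only. The real ``read_message`` lives in an async client; this version reads synchronously from a binary stream, and the signatures, the ``MAX_HEADER_BYTES`` name, and the error messages are assumptions rather than the module's actual API.

```python
import io
import json
from typing import Optional

MAX_HEADER_BYTES = 8 * 1024  # the 8 KiB cap described above


class LSPProtocolError(Exception):
    """Raised when the byte stream is not valid LSP framing."""


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with a Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body


def read_message(stream: io.BufferedIOBase) -> Optional[dict]:
    """Read one framed message; return None on clean EOF before any bytes."""
    header = bytearray()
    while not header.endswith(b"\r\n\r\n"):
        chunk = stream.read(1)
        if not chunk:
            if not header:
                return None
            raise LSPProtocolError("EOF inside header block")
        header += chunk
        if len(header) > MAX_HEADER_BYTES:
            # A server that never emits CRLF-CRLF cannot stall the reader.
            raise LSPProtocolError("header block exceeds 8 KiB")

    length = None
    for line in header.decode("ascii", errors="replace").split("\r\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-length":
            try:
                length = int(value.strip())
            except ValueError:
                raise LSPProtocolError("malformed Content-Length") from None
    if length is None:
        raise LSPProtocolError("missing Content-Length header")

    body = stream.read(length)
    if len(body) != length:
        raise LSPProtocolError("truncated message body")
    return json.loads(body)
```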
agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.
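The bounded upward walk can be sketched like this. It is an illustration under assumptions: the real ``find_git_worktree`` signature, its exclude markers, and its exact exception list are not shown in this message, and ``MAX_WALK_STEPS`` is a name chosen for the example.

```python
from pathlib import Path
from typing import Optional

MAX_WALK_STEPS = 64  # the step cap described above


def find_git_worktree(start: str) -> Optional[Path]:
    """Walk upward from *start* looking for a ``.git`` entry.

    Returns the directory that contains it, or None. Never raises.
    """
    try:
        current = Path(start).resolve()
    except (OSError, RuntimeError, ValueError):
        return None  # symlink loop, bad encoding, permission failure

    for _ in range(MAX_WALK_STEPS):
        try:
            # ``.git`` is a directory in a normal clone and a file in a
            # linked worktree; exists() covers both.
            if (current / ".git").exists():
                return current
        except (OSError, ValueError):
            return None
        parent = current.parent
        if parent == current:  # reached the filesystem root
            return None
        current = parent
    return None  # cap reached without finding a worktree
```

Returning None on every failure path lets the caller treat "no workspace" and "could not tell" the same way: fall back to the syntax-only tier.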
Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
silence (clean writes stay DEBUG), state-transition INFO-once
semantics (active for, no project root), action-required
WARNING-once (server unavailable), per-call WARNING (timeouts,
spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
_lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
the LSP layer for non-local backends and route correctly for
LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
exercising the 8 KiB cap.
Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
... reportReturnType (Pyright)" through the full path, then patch
fix removes it on the next call.
* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field
Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:
1. atexit cleanup of spawned language servers
----------------------------------------------------------------
``agent/lsp/__init__.get_service`` now registers an ``atexit``
handler on first creation that tears down the LSPService on
Python exit. Without this, every ``hermes chat`` exit was
leaking pyright/gopls/etc. processes for a few seconds while
their stdout buffers drained -- they got reaped by the kernel
eventually but a watchful ``ps aux`` would catch them.
The handler runs once per process (gated by
``_atexit_registered``); idempotent ``shutdown_service``
ensures double-fire is a no-op. Errors during shutdown are
swallowed at debug level since by the time atexit fires the
user has already seen the agent's final response.
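A rough sketch of the register-once pattern, for illustration. The ``LSPService`` stand-in and the injectable ``register`` parameter are assumptions added so the example is self-contained and testable; the real ``get_service`` may be shaped differently.

```python
import atexit
import threading
from typing import Callable, Optional


class LSPService:
    """Stand-in for the real service; it only tracks shutdown state."""

    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


_service: Optional[LSPService] = None
_atexit_registered = False
_lock = threading.Lock()


def shutdown_service() -> None:
    """Idempotent teardown: safe to call twice, or when never started."""
    global _service
    with _lock:
        service, _service = _service, None
    if service is None:
        return
    try:
        service.shutdown()
    except Exception:
        pass  # swallowed; the description above logs this at debug level


def get_service(register: Callable = atexit.register) -> LSPService:
    """Return the shared service, registering the exit hook exactly once."""
    global _service, _atexit_registered
    with _lock:
        if _service is None:
            _service = LSPService()
        if not _atexit_registered:
            register(shutdown_service)
            _atexit_registered = True
        return _service
```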
2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
----------------------------------------------------------------
Previously the LSP layer folded its diagnostic block into the
``lint.output`` string, conflating the syntax-check tier with
the semantic tier. The agent (and any downstream parsers) now
read syntax errors and semantic errors as independent signals:
    {
      "bytes_written": 42,
      "lint": {"status": "ok", "output": ""},
      "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
    }
``_check_lint_delta`` returns to its original two-tier shape
(syntax check + delta filter); ``write_file`` and
``patch_replace`` independently fetch LSP diagnostics via
``_maybe_lsp_diagnostics`` and pass them into the new field.
``patch_replace`` propagates the inner write_file's
``lsp_diagnostics`` so the outer PatchResult carries the patch's
delta correctly.
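One way the separate field could look on the result type, as a hedged sketch. The field names come from the JSON above, but the dataclass layout and the rule that ``to_dict`` omits ``lsp_diagnostics`` when empty are assumptions, and the diagnostic string in the usage lines is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


def _clean_lint() -> Dict[str, str]:
    return {"status": "ok", "output": ""}


@dataclass
class WriteResult:
    bytes_written: int
    lint: Dict[str, str] = field(default_factory=_clean_lint)
    lsp_diagnostics: Optional[str] = None  # semantic tier, independent of lint

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {
            "bytes_written": self.bytes_written,
            "lint": dict(self.lint),
        }
        if self.lsp_diagnostics:
            # Assumed omit rule: the key is dropped when there is nothing to report.
            out["lsp_diagnostics"] = self.lsp_diagnostics
        return out


clean = WriteResult(bytes_written=42)
flagged = WriteResult(bytes_written=42, lsp_diagnostics="ERROR [2:12] example message")
```

Here ``clean.to_dict()`` has no ``lsp_diagnostics`` key, while ``flagged.to_dict()`` carries the block alongside a lint result that is still ``ok``.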
Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
fires once and only once across N get_service calls; the
registered callable is our internal shutdown wrapper;
shutdown_service is idempotent and safe when never started;
exceptions during shutdown are swallowed; inactive service is
cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
/ PatchResult dataclass shape, to_dict include/omit semantics,
channel separation (lint and lsp_diagnostics carry independent
signals), write_file populates the field via
_maybe_lsp_diagnostics only when the syntax tier is clean,
patch_replace propagates the field forward from its internal
write_file.
Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
error introduced -> lint clean (parses), lsp_diagnostics carries
the pyright reportReturnType block; patch fix -> both fields
clean again.
* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write
Discovered while auditing failure paths: a language server binary that
hangs (sleeps forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout, about 12.85s
per write end to end in the live retest. Five writes = ~64s of dead time.
The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.
Fix:
- New ``_mark_broken_for_file`` helper at the service layer marks
the (server_id, workspace_root) pair broken from the OUTSIDE when
the outer timeout fires. Called from the except branches
(asyncio.TimeoutError + generic Exception) in ``snapshot_baseline`` and
``get_diagnostics_sync``. Also kills any orphan client process that
survived the cancelled future, fire-and-forget with a 1s ceiling.
- ``enabled_for`` now consults the broken-set BEFORE returning True.
Files in already-broken (server_id, root) pairs short-circuit to
False, so the file_operations layer skips the LSP path entirely
with no spawn cost until the service is restarted (``hermes lsp
restart``) or the process exits.
- A single eventlog WARNING is emitted on first mark-broken so the
user knows which server gave up. Subsequent edits in the same
project stay silent.
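The failure mode and the fix can be reproduced in miniature. This is a sketch under assumptions, not the manager's code: the class below only borrows the method names, takes (server_id, root) directly instead of a file path, and simulates the wedged binary with a long sleep.

```python
import asyncio
from typing import Set, Tuple


class LSPService:
    def __init__(self, timeout: float = 8.0) -> None:
        self._timeout = timeout
        self._broken: Set[Tuple[str, str]] = set()
        self.spawn_attempts = 0

    def enabled_for(self, server_id: str, root: str) -> bool:
        # Consult the broken-set first so wedged pairs cost nothing.
        return (server_id, root) not in self._broken

    async def _get_or_spawn(self, server_id: str, root: str) -> None:
        try:
            self.spawn_attempts += 1
            await asyncio.sleep(3600)  # simulate a binary that never answers
        except Exception:
            # Not reached on cancellation: CancelledError derives from
            # BaseException (Python 3.8+), which is the original bug.
            self._broken.add((server_id, root))
            raise

    def snapshot_baseline(self, server_id: str, root: str) -> bool:
        if not self.enabled_for(server_id, root):
            return False
        try:
            asyncio.run(
                asyncio.wait_for(self._get_or_spawn(server_id, root), self._timeout)
            )
            return True
        except Exception:  # includes asyncio.TimeoutError
            # Mark broken from the OUTSIDE, after the inner task was cancelled.
            self._broken.add((server_id, root))
            return False
```

With a short timeout, the first ``snapshot_baseline`` call pays the timeout once and every later call for the same pair returns immediately without another spawn attempt.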
Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.
Validation:
- Live retest of the wedged-binary scenario: 5 sequential writes,
first 8.88s (the one snapshot timeout), subsequent four ~0.84s
(no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
patch fix all behave correctly with the new broken-set logic.
Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
skip the LSP path and pay no LSP cost (~0.84s each in the retest above).
213 lines
7.6 KiB
Python
"""Structured logging with steady-state silence for the LSP layer.

The LSP layer fires on every write_file/patch. In a busy session
that's hundreds of events. We want users to be able to ``rg`` the
log for "did LSP fire on that edit?" without drowning in noise.

The level model:

- ``DEBUG`` for steady-state events that have no novel signal:
  ``clean``, ``feature off``, ``extension not mapped``, ``no project
  root for already-announced file``, ``server unavailable for
  already-announced binary``. These never reach ``agent.log`` at the
  default INFO threshold.

- ``INFO`` for state transitions worth surfacing exactly once per
  session: ``active for <root>`` the first time a (server_id,
  workspace_root) client starts, ``no project root for <path>``
  the first time we see that file. Plus every diagnostic event
  (those are inherently rare and per-edit, exactly what users grep
  for).

- ``WARNING`` for action-required failures: ``server unavailable``
  (binary not on PATH) the first time per (server_id, binary),
  ``no server configured`` once per language. Per-call WARNING for
  timeouts and unexpected bridge exceptions.

The dedup is in-process module-level sets. Each set grows at most by
the number of distinct (server_id, root) and (server_id, binary)
pairs touched in one Python process -- bytes of memory in even an
aggressive monorepo session. Bounded LRU was rejected: evicting an
entry would risk re-firing the WARNING/INFO line we explicitly want
to suppress.

Grep recipe::

    tail -f ~/.hermes/logs/agent.log | rg 'lsp\\['
"""
from __future__ import annotations

import logging
import os
import threading
from typing import Tuple

# Dedicated logger name so the documented grep recipe survives a
# ``logging.getLogger(__name__)`` rename of any internal module.
event_log = logging.getLogger("hermes.lint.lsp")

# ---------------------------------------------------------------------------
# Once-per-X dedup sets
# ---------------------------------------------------------------------------

_announce_lock = threading.Lock()
_announced_active: set = set()  # keys: (server_id, workspace_root)
_announced_unavailable: set = set()  # keys: (server_id, binary_path_or_name)
_announced_no_root: set = set()  # keys: (server_id, file_path)
_announced_no_server: set = set()  # keys: (server_id,)


def _short_path(file_path: str) -> str:
    """Render *file_path* relative to the cwd when sensible, else absolute.

    Keeps log lines readable for the common case (the user is inside
    the project they're editing) without emitting brittle ``../../..``
    chains for the cross-tree case.
    """
    if not file_path:
        return file_path
    try:
        rel = os.path.relpath(file_path)
    except ValueError:
        return file_path
    if rel.startswith(".." + os.sep) or rel == "..":
        return file_path
    return rel


def _emit(server_id: str, level: int, message: str) -> None:
    event_log.log(level, "lsp[%s] %s", server_id, message)


def _announce_once(bucket: set, key: Tuple) -> bool:
    """Return True if *key* has not been announced for *bucket* yet.

    Atomically marks the key as announced so concurrent callers
    cannot both win the race and double-log.
    """
    with _announce_lock:
        if key in bucket:
            return False
        bucket.add(key)
        return True


# ---------------------------------------------------------------------------
# Public event helpers: call these from the LSP layer.
# ---------------------------------------------------------------------------


def log_clean(server_id: str, file_path: str) -> None:
    """No diagnostics emitted for *file_path*. DEBUG (silent at default)."""
    _emit(server_id, logging.DEBUG, f"clean ({_short_path(file_path)})")


def log_disabled(server_id: str, file_path: str, reason: str) -> None:
    """LSP intentionally skipped for this file (feature off, ext unmapped,
    backend not local, etc.). DEBUG."""
    _emit(server_id, logging.DEBUG, f"skipped: {reason} ({_short_path(file_path)})")


def log_active(server_id: str, workspace_root: str) -> None:
    """A new LSP client started for (server_id, workspace_root).

    INFO once per (server_id, workspace_root); DEBUG thereafter.
    Lets users verify "is LSP actually running?" with a single grep.
    """
    key = (server_id, workspace_root)
    if _announce_once(_announced_active, key):
        _emit(server_id, logging.INFO, f"active for {workspace_root}")
    else:
        _emit(server_id, logging.DEBUG, f"reused client for {workspace_root}")


def log_diagnostics(server_id: str, file_path: str, count: int) -> None:
    """Diagnostics arrived for a file. INFO every time -- these are the
    failure signals users actually want to grep for, and they are
    inherently rare per edit."""
    _emit(server_id, logging.INFO, f"{count} diags ({_short_path(file_path)})")


def log_no_project_root(server_id: str, file_path: str) -> None:
    """File had no recognised project marker. INFO once per file,
    DEBUG thereafter."""
    key = (server_id, file_path)
    if _announce_once(_announced_no_root, key):
        _emit(server_id, logging.INFO, f"no project root for {_short_path(file_path)}")
    else:
        _emit(server_id, logging.DEBUG, f"no project root for {_short_path(file_path)}")


def log_server_unavailable(server_id: str, binary_or_pkg: str) -> None:
    """The server binary couldn't be resolved. WARNING once per
    (server_id, binary), DEBUG thereafter so a hundred subsequent
    .py edits don't spam the log."""
    key = (server_id, binary_or_pkg)
    if _announce_once(_announced_unavailable, key):
        _emit(
            server_id,
            logging.WARNING,
            f"server unavailable: {binary_or_pkg} not found "
            "(install via `hermes lsp install <id>` or set lsp.servers.<id>.command)",
        )
    else:
        _emit(server_id, logging.DEBUG, f"server still unavailable: {binary_or_pkg}")


def log_no_server_configured(server_id: str) -> None:
    """No spawn recipe for this language. WARNING once."""
    if _announce_once(_announced_no_server, (server_id,)):
        _emit(server_id, logging.WARNING, "no server configured")


def log_timeout(server_id: str, file_path: str, kind: str = "diagnostics") -> None:
    """A request to the server timed out. WARNING every time -- these are
    inherently novel events worth surfacing on each occurrence."""
    _emit(
        server_id,
        logging.WARNING,
        f"{kind} timed out for {_short_path(file_path)}",
    )


def log_server_error(server_id: str, file_path: str, exc: BaseException) -> None:
    """An unexpected exception bubbled out of the LSP layer. WARNING."""
    _emit(
        server_id,
        logging.WARNING,
        f"unexpected error for {_short_path(file_path)}: {type(exc).__name__}: {exc}",
    )


def log_spawn_failed(server_id: str, workspace_root: str, exc: BaseException) -> None:
    """The LSP server failed to spawn or initialize. WARNING."""
    _emit(
        server_id,
        logging.WARNING,
        f"spawn/initialize failed for {workspace_root}: {type(exc).__name__}: {exc}",
    )


def reset_announce_caches() -> None:
    """Test-only: clear the dedup caches. Production code never calls this."""
    with _announce_lock:
        _announced_active.clear()
        _announced_unavailable.clear()
        _announced_no_root.clear()
        _announced_no_server.clear()


__all__ = [
    "event_log",
    "log_clean",
    "log_disabled",
    "log_active",
    "log_diagnostics",
    "log_no_project_root",
    "log_server_unavailable",
    "log_no_server_configured",
    "log_timeout",
    "log_server_error",
    "log_spawn_failed",
    "reset_announce_caches",
]