hermes-agent/tests/agent
Teknium 83b93898c2
feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168)
* feat(lsp): semantic diagnostics from real language servers in write_file/patch

Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server,
clangd, bash-language-server, ...) into the post-write lint check used by write_file
and patch. The model now sees type errors, undefined names, missing imports, and
project-wide semantic issues introduced by its edits, not just syntax errors.

LSP is gated on git workspace detection: when the agent's cwd or the file being
edited is inside a git worktree, LSP runs against that workspace; otherwise the
existing in-process syntax checks are the only tier. This keeps users on
user-home cwds (Telegram/Discord gateway chats) from spawning daemons.

The post-write check is layered: in-process syntax check first (microseconds),
then LSP semantic diagnostics second when syntax is clean. Diagnostics are
delta-filtered against a baseline captured at write start, so the agent only
sees errors its edit introduced. A flaky/missing language server can never
break a write -- every LSP failure path falls back silently to the syntax-only
result.

New module agent/lsp/ split into:

- protocol.py: Content-Length JSON-RPC framer + envelope helpers
- client.py: async LSPClient (spawn, initialize, didOpen/didChange,
  ContentModified retry, push/pull diagnostic stores)
- workspace.py: git worktree walk-up + per-server NearestRoot resolver
- servers.py: registry of 26 language servers (extension match,
  root resolver, spawn builder per language)
- install.py: auto-install dispatch (npm install --prefix, go install
  with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/
- manager.py: LSPService (per-(server_id, root) client registry, lazy
  spawn, broken-set, in-flight dedupe, sync facade for tools layer)
- reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file)
- cli.py: hermes lsp {status,list,install,install-all,restart,which}

Wired into tools/file_operations.py:

- write_file/patch_replace now call _snapshot_lsp_baseline before write
- _check_lint_delta gains a third tier: LSP semantic diagnostics when
  syntax is clean
- All LSP code paths swallow exceptions; write_file's contract unchanged

Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true),
wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server
overrides (disabled, command, env, initialization_options).

Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and
read_message round-trip, EOF/truncation/missing Content-Length), workspace
gate (git walk-up, exclude markers, fallback to file location), reporter
(severity filter, max-per-file cap, truncation), service-level delta filter,
and an in-process mock LSP server that exercises the full client lifecycle
including didChange version bumps, dedup, crash recovery, and idempotent
teardown.

Live E2E verified end-to-end through ShellFileOperations: pyright
auto-installed via npm into HERMES_HOME, baseline captured, type error
introduced, single delta diagnostic surfaced with correct line/column/code/
source, then patch fix removes the diagnostic from the output.

Docs: new website/docs/user-guide/features/lsp.md page covering supported
languages, configuration knobs, performance characteristics, and
troubleshooting; cli-commands.md updated with the 'hermes lsp' reference;
sidebar updated.

* feat(lsp): structured logging, backend gate, defensive walk caps

Cherry-picks the substantive ideas from #24155 (different scope, same
problem space) onto our PR.

agent/lsp/eventlog.py (new): dedicated structured logger
``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets
keep a 1000-write session at exactly ONE INFO line ("active for
<root>") at the default INFO threshold; clean writes log at DEBUG so
they never reach agent.log under normal config. State transitions
(server starts, no project root for a file, server unavailable) fire
at INFO/WARNING once per (server_id, key); novel events (timeouts,
unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``.

agent/lsp/manager.py: wire the eventlog into _get_or_spawn and
get_diagnostics_sync so users can answer "did LSP fire on this edit?"
with a single grep, plus surface "binary not on PATH" warnings once
instead of silently retrying every write.

tools/file_operations.py: backend-type gate. ``_lsp_local_only()``
returns False for non-local backends (Docker / Modal / SSH /
Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics``
now skip entirely on remote envs. The host-side language server
can't see files inside a sandbox, so this prevents pretending to
lint a file the host process can't open.

agent/lsp/protocol.py: 8 KiB cap on the header block in
``read_message``. A pathological server that streams headers
without ever emitting CRLF-CRLF would have looped forever consuming
bytes; now raises ``LSPProtocolError`` instead.

agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and
``nearest_root`` upward walks, plus try/except containment around
``Path(...).resolve()`` and child ``.exists()`` calls. Defensive
against pathological inputs (symlink loops, encoding errors,
permission failures mid-walk) — the lint hook is hot-path code and
must never raise.

Tests:
- tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state
  silence (clean writes stay DEBUG), state-transition INFO-once
  semantics (active for, no project root), action-required
  WARNING-once (server unavailable), per-call WARNING (timeouts,
  spawn failures), and the "1000 clean writes => 1 INFO" contract.
- tests/agent/lsp/test_backend_gate.py: 5 tests verifying
  _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip
  the LSP layer for non-local backends and route correctly for
  LocalEnvironment.
- tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header
  exercising the 8 KiB cap.

Validation:
- 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap)
- 198/198 pass when run alongside existing file_operations tests
- Live E2E re-run with pyright still surfaces "ERROR [2:12] Type
  ... reportReturnType (Pyright)" through the full path, then patch
  fix removes it on the next call.

* feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field

Two improvements salvaged from #24414's plugin-form alternative,
keeping our core-integrated design:

1. atexit cleanup of spawned language servers
   ----------------------------------------------------------------
   ``agent/lsp/__init__.get_service`` now registers an ``atexit``
   handler on first creation that tears down the LSPService on
   Python exit.  Without this, every ``hermes chat`` exit was
   leaking pyright/gopls/etc. processes for a few seconds while
   their stdout buffers drained -- they got reaped by the kernel
   eventually but a watchful ``ps aux`` would catch them.

   The handler runs once per process (gated by
   ``_atexit_registered``); idempotent ``shutdown_service``
   ensures double-fire is a no-op.  Errors during shutdown are
   swallowed at debug level since by the time atexit fires the
   user has already seen the agent's final response.

2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult
   ----------------------------------------------------------------
   Previously the LSP layer folded its diagnostic block into the
   ``lint.output`` string, conflating the syntax-check tier with
   the semantic tier.  The agent (and any downstream parsers) now
   read syntax errors and semantic errors as independent signals:

       {
         "bytes_written": 42,
         "lint": {"status": "ok", "output": ""},
         "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..."
       }

   ``_check_lint_delta`` returns to its original two-tier shape
   (syntax check + delta filter); ``write_file`` and
   ``patch_replace`` independently fetch LSP diagnostics via
   ``_maybe_lsp_diagnostics`` and pass them into the new field.
   ``patch_replace`` propagates the inner write_file's
   ``lsp_diagnostics`` so the outer PatchResult carries the patch's
   delta correctly.

Tests: 19 new
- tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration
  fires once and only once across N get_service calls; the
  registered callable is our internal shutdown wrapper;
  shutdown_service is idempotent and safe when never started;
  exceptions during shutdown are swallowed; inactive service is
  cached so we don't rebuild on every check.
- tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult
  / PatchResult dataclass shape, to_dict include/omit semantics,
  channel separation (lint and lsp_diagnostics carry independent
  signals), write_file populates the field via
  _maybe_lsp_diagnostics only when the syntax tier is clean,
  patch_replace propagates the field forward from its internal
  write_file.

Validation:
- 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field)
- 217/217 pass with file_operations + LSP combined
- Live E2E reverified: clean writes -> both fields empty/none; type
  error introduced -> lint clean (parses), lsp_diagnostics carries
  the pyright reportReturnType block; patch fix -> both fields
  clean again.

* fix(lsp): broken-set short-circuit so a wedged server isn't paid every write

Discovered while auditing failure paths: a language server binary that
hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY
subsequent write to re-pay the 8s snapshot_baseline timeout. Five
writes = ~64s of dead time.

The bug: ``_get_or_spawn`` adds the (server_id, root) pair to
``_broken`` inside its inner exception handler, but when the OUTER
``_loop.run`` timeout fires, it cancels the inner task before that
handler runs. The pair never makes it to broken-set, so the next
write re-enters the spawn path and re-pays the timeout.

Fix:

- New ``_mark_broken_for_file`` helper at the service layer marks
  the (server_id, workspace_root) pair broken from the OUTSIDE when
  the outer timeout fires. Called from the except branches in
  ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError
  + generic Exception). Also kills any orphan client process that
  survived the cancelled future, fire-and-forget with a 1s ceiling.

- ``enabled_for`` now consults the broken-set BEFORE returning True.
  Files in already-broken (server_id, root) pairs short-circuit to
  False, so the file_operations layer skips the LSP path entirely
  with no spawn cost. Until the service is restarted (``hermes lsp
  restart``) or the process exits.

- A single eventlog WARNING is emitted on first mark-broken so the
  user knows which server gave up. Subsequent edits in the same
  project stay silent.

Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the
key shape (server_id, per_server_root), enabled_for short-circuit,
sibling-file skip in same project, project isolation (broken in
A doesn't affect B), graceful no-op for missing-server / no-workspace,
and an end-to-end test that snapshots after a failure and verifies
the next ``enabled_for`` returns False.

Validation:

- Live retest of the wedged-binary scenario: 5 sequential writes,
  first 8.88s (the one snapshot timeout), subsequent four ~0.84s
  (no LSP cost). Down from 5x12.85s = 64s before this fix.
- 99/99 LSP tests pass (92 prior + 7 broken-set)
- 224/224 pass with file_operations + LSP combined
- Happy path E2E reverified — clean write, type error introduced,
  patch fix all behave correctly with the new broken-set logic.

Note: the FIRST write to a wedged binary still pays 8s (the
snapshot_baseline timeout). We could shorten that, but pyright/
tsserver normally take 2-3s and slow CI rust-analyzer can need
5+ seconds, so 8s is the conservative ceiling. Subsequent writes
are instant.
2026-05-12 16:31:54 -07:00
..
lsp feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168) 2026-05-12 16:31:54 -07:00
transports fix(xai): omit reasoning.effort for grok models that reject it (#23435) 2026-05-10 15:21:30 -07:00
__init__.py test: add unit tests for 8 modules (batch 2) 2026-02-26 13:54:20 +03:00
test_anthropic_adapter.py fix: avoid unsupported anthropic context beta by default 2026-05-07 05:43:20 -07:00
test_anthropic_keychain.py fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain 2026-04-24 07:14:00 -07:00
test_arcee_trinity_overrides.py test(arcee): cover Trinity Large Thinking temperature + compression overrides 2026-05-05 17:23:45 -07:00
test_auxiliary_client.py fix(dashboard): UI polish — modals, layout, consistency, test fixes 2026-05-12 13:59:22 -04:00
test_auxiliary_client_anthropic_custom.py fix(anthropic): complete third-party Anthropic-compatible provider support (#12846) 2026-04-19 22:43:09 -07:00
test_auxiliary_config_bridge.py fix(tests): pin UTF-8 encoding when reading source files on Windows 2026-05-09 02:47:28 -07:00
test_auxiliary_main_first.py fix(copilot): send vision header for Copilot vision requests 2026-04-27 08:35:50 -07:00
test_auxiliary_named_custom_providers.py fix(fallback): let custom_providers shadow built-in aliases 2026-04-30 20:18:44 -07:00
test_auxiliary_transport_autodetect.py fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027) 2026-04-28 06:50:14 -07:00
test_bedrock_1m_context.py test: remove 50 stale/broken tests to unblock CI (#22098) 2026-05-08 14:55:40 -07:00
test_bedrock_adapter.py fix(bedrock): preserve reasoningContent across converse normalization 2026-05-07 05:17:16 -07:00
test_bedrock_integration.py fix(agent): handle aws_sdk auth type in resolve_provider_client 2026-04-24 07:26:07 -07:00
test_codex_cloudflare_headers.py fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765) 2026-04-29 23:23:50 -07:00
test_compress_focus.py fix: resolve CI test failures — add missing functions, fix stale tests (#9483) 2026-04-14 01:43:45 -07:00
test_compressor_image_tokens.py feat(image-input): native multimodal routing based on model vision capability (#16506) 2026-04-27 06:27:59 -07:00
test_context_compressor.py fix(context_compressor): treat streaming premature-close as transient error 2026-05-09 17:52:51 -07:00
test_context_compressor_summary_continuity.py fix(compression): preserve iterative summary continuity 2026-05-05 04:42:44 -07:00
test_context_engine.py feat: wire context engine plugin slot into agent and plugin system 2026-04-10 19:15:50 -07:00
test_context_references.py fix(agent): fall back when rg is blocked for @folder references 2026-04-20 01:56:41 -07:00
test_copilot_acp_client.py fix(ci): recover 38 failing tests on main (#17642) 2026-04-29 20:05:32 -07:00
test_credential_pool.py fix(auth): shorten credential 401 cooldown 2026-05-07 06:15:33 -07:00
test_credential_pool_routing.py refactor: remove smart_model_routing feature (#12732) 2026-04-19 18:12:55 -07:00
test_crossloop_client_cache.py refactor(tests): re-architect tests + fix CI failures (#5946) 2026-04-07 17:19:07 -07:00
test_curator.py fix(skills): keep manual skills out of curator 2026-05-04 02:19:28 -07:00
test_curator_activity.py fix: use skill activity in curator status 2026-04-30 10:31:47 -07:00
test_curator_backup.py fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731) 2026-05-02 01:29:57 -07:00
test_curator_classification.py feat(curator): hint at hermes curator pin in the rename block (#23212) 2026-05-10 06:44:53 -07:00
test_curator_reports.py fix(curator): rewrite cron job skill refs after consolidation (#18253) 2026-04-30 23:04:50 -07:00
test_deepseek_anthropic_thinking.py test(anthropic): regression guard for DeepSeek /anthropic thinking replay 2026-04-29 08:10:29 -07:00
test_direct_provider_url_detection.py fix: restrict provider URL detection to exact hostname matches 2026-04-20 22:14:29 -07:00
test_display.py fix(cli): honor positive tool preview length 2026-05-07 05:26:28 -07:00
test_display_emoji.py feat(tools): centralize tool emoji metadata in registry + skin integration 2026-03-15 20:21:21 -07:00
test_error_classifier.py fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664) 2026-05-09 17:54:07 -07:00
test_external_skills.py feat(skills): support external skill directories via config (#3678) 2026-03-29 00:33:30 -07:00
test_external_skills_dirs_cache.py perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138) 2026-05-08 16:39:32 -07:00
test_gemini_cloudcode.py fix(gemini): assign unique stream indices to parallel tool calls 2026-04-20 02:10:53 -07:00
test_gemini_fast_fallback.py Prefer fallback for Gemini CloudCode rate limits 2026-05-05 10:14:48 -07:00
test_gemini_free_tier_gate.py feat(gemini): block free-tier keys at setup + surface guidance on 429 (#15100) 2026-04-24 04:46:17 -07:00
test_gemini_native_adapter.py fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133) 2026-04-24 05:35:17 -07:00
test_gemini_schema.py fix(gemini): drop integer/number/boolean enums from tool schemas (#15082) 2026-04-24 03:40:00 -07:00
test_i18n.py feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914) 2026-05-10 07:14:14 -07:00
test_image_gen_registry.py feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) 2026-04-21 21:30:10 -07:00
test_image_routing.py fix(image-routing): sniff magic bytes for image MIME, ignore misleading suffix 2026-05-07 05:58:11 -07:00
test_insights.py test: stop testing mutable data — convert change-detectors to invariants (#13363) 2026-04-20 23:20:33 -07:00
test_kimi_coding_anthropic_thinking.py fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455) 2026-04-29 06:35:42 -07:00
test_local_stream_timeout.py fix(agent): recognize Tailscale CGNAT (100.64.0.0/10) as local for Ollama timeouts 2026-04-22 14:46:10 -07:00
test_markdown_tables.py fix(cli): vertical fallback for markdown tables wider than terminal (#23948) 2026-05-11 16:49:13 -07:00
test_memory_provider.py fix(memory): add write origin metadata 2026-04-24 14:37:55 -07:00
test_memory_session_switch.py feat(hindsight): probe API for update_mode='append' support, dedupe across processes 2026-05-05 15:09:59 -07:00
test_memory_user_id.py feat(hindsight): richer session-scoped retain metadata 2026-04-22 05:27:10 -07:00
test_minimax_auxiliary_url.py fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983) 2026-04-07 22:23:28 -07:00
test_minimax_provider.py feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path 2026-05-05 13:40:01 -07:00
test_model_metadata.py Use nous portal as model metadata authority (#24502) 2026-05-12 11:59:31 -07:00
test_model_metadata_local_ctx.py fix(tui): show correct context length 2026-04-28 12:27:36 -07:00
test_model_metadata_ssl.py fix(auth): honor SSL CA env vars across httpx + requests callsites 2026-04-24 03:00:33 -07:00
test_models_dev.py perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808) 2026-05-09 13:32:38 -07:00
test_moonshot_schema.py fix(moonshot): also strip nullable/enum after anyOf collapse 2026-04-30 23:14:31 -07:00
test_nous_rate_guard.py fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898) 2026-04-26 04:53:42 -07:00
test_onboarding.py docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507) 2026-04-29 08:08:36 -07:00
test_openrouter_response_cache.py fix(openrouter): use canonical X-Title attribution header 2026-05-05 10:13:34 -07:00
test_plugin_llm.py feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194) 2026-05-10 07:09:28 -07:00
test_prompt_builder.py fix(webui): add platform hint for MEDIA rendering 2026-05-09 02:22:40 -07:00
test_prompt_caching.py feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828) 2026-05-11 11:14:56 -07:00
test_prompt_caching_live.py feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828) 2026-05-11 11:14:56 -07:00
test_proxy_and_url_validation.py fix(agent): normalize socks:// env proxies for httpx/anthropic 2026-04-21 05:52:46 -07:00
test_rate_limit_tracker.py feat: capture provider rate limit headers and show in /usage (#6541) 2026-04-09 03:43:14 -07:00
test_redact.py feat: replace kimi-k2.5 with kimi-k2.6 on OpenRouter and Nous Portal (#13148) 2026-04-20 11:49:54 -07:00
test_shell_hooks.py feat: shell hooks — wire shell scripts as Hermes hook callbacks 2026-04-20 20:53:51 -07:00
test_shell_hooks_consent.py fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322) 2026-04-26 20:48:35 -07:00
test_skill_commands.py test(skills): cover additional rescan paths in skill_commands cache (#14536) 2026-05-07 04:59:43 -07:00
test_skill_commands_reload.py refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool 2026-04-29 21:07:47 -07:00
test_skill_utils.py test(skill_utils): add regression tests for non-dict metadata in extract_skill_conditions 2026-04-30 20:37:15 -07:00
test_streaming_context_scrubber.py style: trim verbose comment blocks added by previous commit 2026-04-27 12:37:33 -07:00
test_subagent_progress.py feat(delegate): orchestrator role and configurable spawn depth (default flat) 2026-04-21 14:23:45 -07:00
test_subagent_stop_hook.py feat: shell hooks — wire shell scripts as Hermes hook callbacks 2026-04-20 20:53:51 -07:00
test_subdirectory_hints.py fix(agent): catch PermissionError in subdirectory hint discovery 2026-04-09 03:10:30 -07:00
test_think_scrubber.py fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184) 2026-05-05 04:33:38 -07:00
test_title_generator.py fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
test_tool_guardrails.py fix(agent): make tool loop guardrails warning-first 2026-04-30 20:43:15 -07:00
test_unsupported_parameter_retry.py test: remove 50 stale/broken tests to unblock CI (#22098) 2026-05-08 14:55:40 -07:00
test_unsupported_temperature_retry.py refactor(memory): remove flush_memories entirely (#15696) 2026-04-25 08:21:14 -07:00
test_usage_pricing.py fix(usage): read top-level Anthropic cache fields from OAI-compatible proxies 2026-04-22 17:40:49 -07:00
test_vision_resolved_args.py fix(vision): preserve explicit provider auth with custom base_url 2026-05-04 05:05:43 -07:00