hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

History

Teknium 83b93898c2 feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168 ) * feat(lsp): semantic diagnostics from real language servers in write_file/patch Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server, clangd, bash-language-server, ...) into the post-write lint check used by write_file and patch. The model now sees type errors, undefined names, missing imports, and project-wide semantic issues introduced by its edits, not just syntax errors. LSP is gated on git workspace detection: when the agent's cwd or the file being edited is inside a git worktree, LSP runs against that workspace; otherwise the existing in-process syntax checks are the only tier. This keeps users on user-home cwds (Telegram/Discord gateway chats) from spawning daemons. The post-write check is layered: in-process syntax check first (microseconds), then LSP semantic diagnostics second when syntax is clean. Diagnostics are delta-filtered against a baseline captured at write start, so the agent only sees errors its edit introduced. A flaky/missing language server can never break a write -- every LSP failure path falls back silently to the syntax-only result. New module agent/lsp/ split into: - protocol.py: Content-Length JSON-RPC framer + envelope helpers - client.py: async LSPClient (spawn, initialize, didOpen/didChange, ContentModified retry, push/pull diagnostic stores) - workspace.py: git worktree walk-up + per-server NearestRoot resolver - servers.py: registry of 26 language servers (extension match, root resolver, spawn builder per language) - install.py: auto-install dispatch (npm install --prefix, go install with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/ - manager.py: LSPService (per-(server_id, root) client registry, lazy spawn, broken-set, in-flight dedupe, sync facade for tools layer) - reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file) - cli.py: hermes lsp {status,list,install,install-all,restart,which} Wired into tools/file_operations.py: - write_file/patch_replace now call _snapshot_lsp_baseline before write - _check_lint_delta gains a third tier: LSP semantic diagnostics when syntax is clean - All LSP code paths swallow exceptions; write_file's contract unchanged Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true), wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server overrides (disabled, command, env, initialization_options). Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and read_message round-trip, EOF/truncation/missing Content-Length), workspace gate (git walk-up, exclude markers, fallback to file location), reporter (severity filter, max-per-file cap, truncation), service-level delta filter, and an in-process mock LSP server that exercises the full client lifecycle including didChange version bumps, dedup, crash recovery, and idempotent teardown. Live E2E verified end-to-end through ShellFileOperations: pyright auto-installed via npm into HERMES_HOME, baseline captured, type error introduced, single delta diagnostic surfaced with correct line/column/code/ source, then patch fix removes the diagnostic from the output. Docs: new website/docs/user-guide/features/lsp.md page covering supported languages, configuration knobs, performance characteristics, and troubleshooting; cli-commands.md updated with the 'hermes lsp' reference; sidebar updated. * feat(lsp): structured logging, backend gate, defensive walk caps Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR. agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``. agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write. tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open. agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``. A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead. agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. Defensive against pathological inputs (symlink loops, encoding errors, permission failures mid-walk) — the lint hook is hot-path code and must never raise. Tests: - tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract. - tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment. - tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap. Validation: - 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap) - 198/198 pass when run alongside existing file_operations tests - Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call. * feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field Two improvements salvaged from #24414's plugin-form alternative, keeping our core-integrated design: 1. atexit cleanup of spawned language servers ---------------------------------------------------------------- ``agent/lsp/__init__.get_service`` now registers an ``atexit`` handler on first creation that tears down the LSPService on Python exit. Without this, every ``hermes chat`` exit was leaking pyright/gopls/etc. processes for a few seconds while their stdout buffers drained -- they got reaped by the kernel eventually but a watchful ``ps aux`` would catch them. The handler runs once per process (gated by ``_atexit_registered``); idempotent ``shutdown_service`` ensures double-fire is a no-op. Errors during shutdown are swallowed at debug level since by the time atexit fires the user has already seen the agent's final response. 2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult ---------------------------------------------------------------- Previously the LSP layer folded its diagnostic block into the ``lint.output`` string, conflating the syntax-check tier with the semantic tier. The agent (and any downstream parsers) now read syntax errors and semantic errors as independent signals: { "bytes_written": 42, "lint": {"status": "ok", "output": ""}, "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..." } ``_check_lint_delta`` returns to its original two-tier shape (syntax check + delta filter); ``write_file`` and ``patch_replace`` independently fetch LSP diagnostics via ``_maybe_lsp_diagnostics`` and pass them into the new field. ``patch_replace`` propagates the inner write_file's ``lsp_diagnostics`` so the outer PatchResult carries the patch's delta correctly. Tests: 19 new - tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration fires once and only once across N get_service calls; the registered callable is our internal shutdown wrapper; shutdown_service is idempotent and safe when never started; exceptions during shutdown are swallowed; inactive service is cached so we don't rebuild on every check. - tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult / PatchResult dataclass shape, to_dict include/omit semantics, channel separation (lint and lsp_diagnostics carry independent signals), write_file populates the field via _maybe_lsp_diagnostics only when the syntax tier is clean, patch_replace propagates the field forward from its internal write_file. Validation: - 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field) - 217/217 pass with file_operations + LSP combined - Live E2E reverified: clean writes -> both fields empty/none; type error introduced -> lint clean (parses), lsp_diagnostics carries the pyright reportReturnType block; patch fix -> both fields clean again. * fix(lsp): broken-set short-circuit so a wedged server isn't paid every write Discovered while auditing failure paths: a language server binary that hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY subsequent write to re-pay the 8s snapshot_baseline timeout. Five writes = ~64s of dead time. The bug: ``_get_or_spawn`` adds the (server_id, root) pair to ``_broken`` inside its inner exception handler, but when the OUTER ``_loop.run`` timeout fires, it cancels the inner task before that handler runs. The pair never makes it to broken-set, so the next write re-enters the spawn path and re-pays the timeout. Fix: - New ``_mark_broken_for_file`` helper at the service layer marks the (server_id, workspace_root) pair broken from the OUTSIDE when the outer timeout fires. Called from the except branches in ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError + generic Exception). Also kills any orphan client process that survived the cancelled future, fire-and-forget with a 1s ceiling. - ``enabled_for`` now consults the broken-set BEFORE returning True. Files in already-broken (server_id, root) pairs short-circuit to False, so the file_operations layer skips the LSP path entirely with no spawn cost. Until the service is restarted (``hermes lsp restart``) or the process exits. - A single eventlog WARNING is emitted on first mark-broken so the user knows which server gave up. Subsequent edits in the same project stay silent. Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the key shape (server_id, per_server_root), enabled_for short-circuit, sibling-file skip in same project, project isolation (broken in A doesn't affect B), graceful no-op for missing-server / no-workspace, and an end-to-end test that snapshots after a failure and verifies the next ``enabled_for`` returns False. Validation: - Live retest of the wedged-binary scenario: 5 sequential writes, first 8.88s (the one snapshot timeout), subsequent four ~0.84s (no LSP cost). Down from 5x12.85s = 64s before this fix. - 99/99 LSP tests pass (92 prior + 7 broken-set) - 224/224 pass with file_operations + LSP combined - Happy path E2E reverified — clean write, type error introduced, patch fix all behave correctly with the new broken-set logic. Note: the FIRST write to a wedged binary still pays 8s (the snapshot_baseline timeout). We could shorten that, but pyright/ tsserver normally take 2-3s and slow CI rust-analyzer can need 5+ seconds, so 8s is the conservative ceiling. Subsequent writes are instant.		2026-05-12 16:31:54 -07:00
..
lsp	feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168 )	2026-05-12 16:31:54 -07:00
transports	fix(xai): omit reasoning.effort for grok models that reject it (#23435 )	2026-05-10 15:21:30 -07:00
__init__.py	test: add unit tests for 8 modules (batch 2)	2026-02-26 13:54:20 +03:00
test_anthropic_adapter.py	fix: avoid unsupported anthropic context beta by default	2026-05-07 05:43:20 -07:00
test_anthropic_keychain.py	fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain	2026-04-24 07:14:00 -07:00
test_arcee_trinity_overrides.py	test(arcee): cover Trinity Large Thinking temperature + compression overrides	2026-05-05 17:23:45 -07:00
test_auxiliary_client.py	fix(dashboard): UI polish — modals, layout, consistency, test fixes	2026-05-12 13:59:22 -04:00
test_auxiliary_client_anthropic_custom.py	fix(anthropic): complete third-party Anthropic-compatible provider support (#12846 )	2026-04-19 22:43:09 -07:00
test_auxiliary_config_bridge.py	fix(tests): pin UTF-8 encoding when reading source files on Windows	2026-05-09 02:47:28 -07:00
test_auxiliary_main_first.py	fix(copilot): send vision header for Copilot vision requests	2026-04-27 08:35:50 -07:00
test_auxiliary_named_custom_providers.py	fix(fallback): let custom_providers shadow built-in aliases	2026-04-30 20:18:44 -07:00
test_auxiliary_transport_autodetect.py	fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027 )	2026-04-28 06:50:14 -07:00
test_bedrock_1m_context.py	test: remove 50 stale/broken tests to unblock CI (#22098 )	2026-05-08 14:55:40 -07:00
test_bedrock_adapter.py	fix(bedrock): preserve reasoningContent across converse normalization	2026-05-07 05:17:16 -07:00
test_bedrock_integration.py	fix(agent): handle aws_sdk auth type in resolve_provider_client	2026-04-24 07:26:07 -07:00
test_codex_cloudflare_headers.py	fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765 )	2026-04-29 23:23:50 -07:00
test_compress_focus.py	fix: resolve CI test failures — add missing functions, fix stale tests (#9483 )	2026-04-14 01:43:45 -07:00
test_compressor_image_tokens.py	feat(image-input): native multimodal routing based on model vision capability (#16506 )	2026-04-27 06:27:59 -07:00
test_context_compressor.py	fix(context_compressor): treat streaming premature-close as transient error	2026-05-09 17:52:51 -07:00
test_context_compressor_summary_continuity.py	fix(compression): preserve iterative summary continuity	2026-05-05 04:42:44 -07:00
test_context_engine.py	feat: wire context engine plugin slot into agent and plugin system	2026-04-10 19:15:50 -07:00
test_context_references.py	fix(agent): fall back when rg is blocked for @folder references	2026-04-20 01:56:41 -07:00
test_copilot_acp_client.py	fix(ci): recover 38 failing tests on main (#17642 )	2026-04-29 20:05:32 -07:00
test_credential_pool.py	fix(auth): shorten credential 401 cooldown	2026-05-07 06:15:33 -07:00
test_credential_pool_routing.py	refactor: remove smart_model_routing feature (#12732 )	2026-04-19 18:12:55 -07:00
test_crossloop_client_cache.py	refactor(tests): re-architect tests + fix CI failures (#5946 )	2026-04-07 17:19:07 -07:00
test_curator.py	fix(skills): keep manual skills out of curator	2026-05-04 02:19:28 -07:00
test_curator_activity.py	fix: use skill activity in curator status	2026-04-30 10:31:47 -07:00
test_curator_backup.py	fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671 ) (#18731 )	2026-05-02 01:29:57 -07:00
test_curator_classification.py	feat(curator): hint at `hermes curator pin` in the rename block (#23212 )	2026-05-10 06:44:53 -07:00
test_curator_reports.py	fix(curator): rewrite cron job skill refs after consolidation (#18253 )	2026-04-30 23:04:50 -07:00
test_deepseek_anthropic_thinking.py	test(anthropic): regression guard for DeepSeek /anthropic thinking replay	2026-04-29 08:10:29 -07:00
test_direct_provider_url_detection.py	fix: restrict provider URL detection to exact hostname matches	2026-04-20 22:14:29 -07:00
test_display.py	fix(cli): honor positive tool preview length	2026-05-07 05:26:28 -07:00
test_display_emoji.py	feat(tools): centralize tool emoji metadata in registry + skin integration	2026-03-15 20:21:21 -07:00
test_error_classifier.py	fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664 )	2026-05-09 17:54:07 -07:00
test_external_skills.py	feat(skills): support external skill directories via config (#3678 )	2026-03-29 00:33:30 -07:00
test_external_skills_dirs_cache.py	perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138 )	2026-05-08 16:39:32 -07:00
test_gemini_cloudcode.py	fix(gemini): assign unique stream indices to parallel tool calls	2026-04-20 02:10:53 -07:00
test_gemini_fast_fallback.py	Prefer fallback for Gemini CloudCode rate limits	2026-05-05 10:14:48 -07:00
test_gemini_free_tier_gate.py	feat(gemini): block free-tier keys at setup + surface guidance on 429 (#15100 )	2026-04-24 04:46:17 -07:00
test_gemini_native_adapter.py	fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133 )	2026-04-24 05:35:17 -07:00
test_gemini_schema.py	fix(gemini): drop integer/number/boolean enums from tool schemas (#15082 )	2026-04-24 03:40:00 -07:00
test_i18n.py	feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914 )	2026-05-10 07:14:14 -07:00
test_image_gen_registry.py	feat(plugins): pluggable image_gen backends + OpenAI provider (#13799 )	2026-04-21 21:30:10 -07:00
test_image_routing.py	fix(image-routing): sniff magic bytes for image MIME, ignore misleading suffix	2026-05-07 05:58:11 -07:00
test_insights.py	test: stop testing mutable data — convert change-detectors to invariants (#13363 )	2026-04-20 23:20:33 -07:00
test_kimi_coding_anthropic_thinking.py	fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455 )	2026-04-29 06:35:42 -07:00
test_local_stream_timeout.py	fix(agent): recognize Tailscale CGNAT (100.64.0.0/10) as local for Ollama timeouts	2026-04-22 14:46:10 -07:00
test_markdown_tables.py	fix(cli): vertical fallback for markdown tables wider than terminal (#23948 )	2026-05-11 16:49:13 -07:00
test_memory_provider.py	fix(memory): add write origin metadata	2026-04-24 14:37:55 -07:00
test_memory_session_switch.py	feat(hindsight): probe API for update_mode='append' support, dedupe across processes	2026-05-05 15:09:59 -07:00
test_memory_user_id.py	feat(hindsight): richer session-scoped retain metadata	2026-04-22 05:27:10 -07:00
test_minimax_auxiliary_url.py	fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983 )	2026-04-07 22:23:28 -07:00
test_minimax_provider.py	feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path	2026-05-05 13:40:01 -07:00
test_model_metadata.py	Use nous portal as model metadata authority (#24502 )	2026-05-12 11:59:31 -07:00
test_model_metadata_local_ctx.py	fix(tui): show correct context length	2026-04-28 12:27:36 -07:00
test_model_metadata_ssl.py	fix(auth): honor SSL CA env vars across httpx + requests callsites	2026-04-24 03:00:33 -07:00
test_models_dev.py	perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808 )	2026-05-09 13:32:38 -07:00
test_moonshot_schema.py	fix(moonshot): also strip nullable/enum after anyOf collapse	2026-04-30 23:14:31 -07:00
test_nous_rate_guard.py	fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898 )	2026-04-26 04:53:42 -07:00
test_onboarding.py	docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507 )	2026-04-29 08:08:36 -07:00
test_openrouter_response_cache.py	fix(openrouter): use canonical X-Title attribution header	2026-05-05 10:13:34 -07:00
test_plugin_llm.py	feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194 )	2026-05-10 07:09:28 -07:00
test_prompt_builder.py	fix(webui): add platform hint for MEDIA rendering	2026-05-09 02:22:40 -07:00
test_prompt_caching.py	feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828 )	2026-05-11 11:14:56 -07:00
test_prompt_caching_live.py	feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828 )	2026-05-11 11:14:56 -07:00
test_proxy_and_url_validation.py	fix(agent): normalize socks:// env proxies for httpx/anthropic	2026-04-21 05:52:46 -07:00
test_rate_limit_tracker.py	feat: capture provider rate limit headers and show in /usage (#6541 )	2026-04-09 03:43:14 -07:00
test_redact.py	feat: replace kimi-k2.5 with kimi-k2.6 on OpenRouter and Nous Portal (#13148 )	2026-04-20 11:49:54 -07:00
test_shell_hooks.py	feat: shell hooks — wire shell scripts as Hermes hook callbacks	2026-04-20 20:53:51 -07:00
test_shell_hooks_consent.py	fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322 )	2026-04-26 20:48:35 -07:00
test_skill_commands.py	test(skills): cover additional rescan paths in skill_commands cache (#14536 )	2026-05-07 04:59:43 -07:00
test_skill_commands_reload.py	refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool	2026-04-29 21:07:47 -07:00
test_skill_utils.py	test(skill_utils): add regression tests for non-dict metadata in extract_skill_conditions	2026-04-30 20:37:15 -07:00
test_streaming_context_scrubber.py	style: trim verbose comment blocks added by previous commit	2026-04-27 12:37:33 -07:00
test_subagent_progress.py	feat(delegate): orchestrator role and configurable spawn depth (default flat)	2026-04-21 14:23:45 -07:00
test_subagent_stop_hook.py	feat: shell hooks — wire shell scripts as Hermes hook callbacks	2026-04-20 20:53:51 -07:00
test_subdirectory_hints.py	fix(agent): catch PermissionError in subdirectory hint discovery	2026-04-09 03:10:30 -07:00
test_think_scrubber.py	fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924 ) (#20184 )	2026-05-05 04:33:38 -07:00
test_title_generator.py	fix: improve telegram topic mode setup	2026-05-04 12:07:17 -07:00
test_tool_guardrails.py	fix(agent): make tool loop guardrails warning-first	2026-04-30 20:43:15 -07:00
test_unsupported_parameter_retry.py	test: remove 50 stale/broken tests to unblock CI (#22098 )	2026-05-08 14:55:40 -07:00
test_unsupported_temperature_retry.py	refactor(memory): remove flush_memories entirely (#15696 )	2026-04-25 08:21:14 -07:00
test_usage_pricing.py	fix(usage): read top-level Anthropic cache fields from OAI-compatible proxies	2026-04-22 17:40:49 -07:00
test_vision_resolved_args.py	fix(vision): preserve explicit provider auth with custom base_url	2026-05-04 05:05:43 -07:00