hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-07 13:02:07 +00:00

Author	SHA1	Message	Date
EloquentBrush	f9559c39c4	fix(gateway): consult lock record argv when cmdline unreadable in scoped-lock stale check PR #24500 introduced stale-lock detection that calls `_looks_like_gateway_process` to confirm a running PID is not an unrelated process that reused the slot. On Windows neither `/proc` nor `ps` is available, so `_read_process_cmdline` always returns `None` and `_looks_like_gateway_process` always returns `False` — causing every valid Windows gateway lock to be marked stale and immediately evicted. Fix: after `_looks_like_gateway_process` returns `False`, call `_read_process_cmdline` directly. If the result is non-`None` the live cmdline was readable and confirms the PID is foreign → stale. If it is `None` (cmdline unreadable, e.g. Windows without ps), fall back to `_record_looks_like_gateway` which validates the stored `argv` the gateway wrote into the lock file at startup. Both oracles must say "not a gateway" before the lock is evicted — the same two-oracle pattern already used in `get_running_pid` (line 941). Adds a regression test that simulates a Windows host where `_looks_like_gateway_process` returns `False` for every PID and `_read_process_cmdline` returns `None`, confirming the lock is kept when the record's argv identifies it as a gateway process.	2026-05-12 16:33:09 -07:00
Teknium	24e2151cd6	chore(release): add AUTHOR_MAP entries for zccyman and Osraka	2026-05-12 16:32:57 -07:00
zccyman	88ede807c4	fix(pricing): add deepseek-v4-pro to official docs pricing table deepseek-v4-pro has been routable since v0.12 but was missing from the _OFFICIAL_DOCS_PRICING table. Sessions using this model showed as "unknown cost" in hermes insights instead of a dollar estimate. Add pricing entry using published list prices: - input: \$1.74/M tokens - output: \$3.48/M tokens - cache_read: \$0.0145/M tokens Uses standard list rates (not the 75% promo) so estimates remain accurate after promo expires 2026-05-31. Closes #24218	2026-05-12 16:32:57 -07:00
Teknium	83b93898c2	feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168 ) * feat(lsp): semantic diagnostics from real language servers in write_file/patch Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server, clangd, bash-language-server, ...) into the post-write lint check used by write_file and patch. The model now sees type errors, undefined names, missing imports, and project-wide semantic issues introduced by its edits, not just syntax errors. LSP is gated on git workspace detection: when the agent's cwd or the file being edited is inside a git worktree, LSP runs against that workspace; otherwise the existing in-process syntax checks are the only tier. This keeps users on user-home cwds (Telegram/Discord gateway chats) from spawning daemons. The post-write check is layered: in-process syntax check first (microseconds), then LSP semantic diagnostics second when syntax is clean. Diagnostics are delta-filtered against a baseline captured at write start, so the agent only sees errors its edit introduced. A flaky/missing language server can never break a write -- every LSP failure path falls back silently to the syntax-only result. New module agent/lsp/ split into: - protocol.py: Content-Length JSON-RPC framer + envelope helpers - client.py: async LSPClient (spawn, initialize, didOpen/didChange, ContentModified retry, push/pull diagnostic stores) - workspace.py: git worktree walk-up + per-server NearestRoot resolver - servers.py: registry of 26 language servers (extension match, root resolver, spawn builder per language) - install.py: auto-install dispatch (npm install --prefix, go install with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/ - manager.py: LSPService (per-(server_id, root) client registry, lazy spawn, broken-set, in-flight dedupe, sync facade for tools layer) - reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file) - cli.py: hermes lsp {status,list,install,install-all,restart,which} Wired into tools/file_operations.py: - write_file/patch_replace now call _snapshot_lsp_baseline before write - _check_lint_delta gains a third tier: LSP semantic diagnostics when syntax is clean - All LSP code paths swallow exceptions; write_file's contract unchanged Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true), wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server overrides (disabled, command, env, initialization_options). Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and read_message round-trip, EOF/truncation/missing Content-Length), workspace gate (git walk-up, exclude markers, fallback to file location), reporter (severity filter, max-per-file cap, truncation), service-level delta filter, and an in-process mock LSP server that exercises the full client lifecycle including didChange version bumps, dedup, crash recovery, and idempotent teardown. Live E2E verified end-to-end through ShellFileOperations: pyright auto-installed via npm into HERMES_HOME, baseline captured, type error introduced, single delta diagnostic surfaced with correct line/column/code/ source, then patch fix removes the diagnostic from the output. Docs: new website/docs/user-guide/features/lsp.md page covering supported languages, configuration knobs, performance characteristics, and troubleshooting; cli-commands.md updated with the 'hermes lsp' reference; sidebar updated. * feat(lsp): structured logging, backend gate, defensive walk caps Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR. agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``. agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write. tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open. agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``. A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead. agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. Defensive against pathological inputs (symlink loops, encoding errors, permission failures mid-walk) — the lint hook is hot-path code and must never raise. Tests: - tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract. - tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment. - tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap. Validation: - 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap) - 198/198 pass when run alongside existing file_operations tests - Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call. * feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field Two improvements salvaged from #24414's plugin-form alternative, keeping our core-integrated design: 1. atexit cleanup of spawned language servers ---------------------------------------------------------------- ``agent/lsp/__init__.get_service`` now registers an ``atexit`` handler on first creation that tears down the LSPService on Python exit. Without this, every ``hermes chat`` exit was leaking pyright/gopls/etc. processes for a few seconds while their stdout buffers drained -- they got reaped by the kernel eventually but a watchful ``ps aux`` would catch them. The handler runs once per process (gated by ``_atexit_registered``); idempotent ``shutdown_service`` ensures double-fire is a no-op. Errors during shutdown are swallowed at debug level since by the time atexit fires the user has already seen the agent's final response. 2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult ---------------------------------------------------------------- Previously the LSP layer folded its diagnostic block into the ``lint.output`` string, conflating the syntax-check tier with the semantic tier. The agent (and any downstream parsers) now read syntax errors and semantic errors as independent signals: { "bytes_written": 42, "lint": {"status": "ok", "output": ""}, "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..." } ``_check_lint_delta`` returns to its original two-tier shape (syntax check + delta filter); ``write_file`` and ``patch_replace`` independently fetch LSP diagnostics via ``_maybe_lsp_diagnostics`` and pass them into the new field. ``patch_replace`` propagates the inner write_file's ``lsp_diagnostics`` so the outer PatchResult carries the patch's delta correctly. Tests: 19 new - tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration fires once and only once across N get_service calls; the registered callable is our internal shutdown wrapper; shutdown_service is idempotent and safe when never started; exceptions during shutdown are swallowed; inactive service is cached so we don't rebuild on every check. - tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult / PatchResult dataclass shape, to_dict include/omit semantics, channel separation (lint and lsp_diagnostics carry independent signals), write_file populates the field via _maybe_lsp_diagnostics only when the syntax tier is clean, patch_replace propagates the field forward from its internal write_file. Validation: - 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field) - 217/217 pass with file_operations + LSP combined - Live E2E reverified: clean writes -> both fields empty/none; type error introduced -> lint clean (parses), lsp_diagnostics carries the pyright reportReturnType block; patch fix -> both fields clean again. * fix(lsp): broken-set short-circuit so a wedged server isn't paid every write Discovered while auditing failure paths: a language server binary that hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY subsequent write to re-pay the 8s snapshot_baseline timeout. Five writes = ~64s of dead time. The bug: ``_get_or_spawn`` adds the (server_id, root) pair to ``_broken`` inside its inner exception handler, but when the OUTER ``_loop.run`` timeout fires, it cancels the inner task before that handler runs. The pair never makes it to broken-set, so the next write re-enters the spawn path and re-pays the timeout. Fix: - New ``_mark_broken_for_file`` helper at the service layer marks the (server_id, workspace_root) pair broken from the OUTSIDE when the outer timeout fires. Called from the except branches in ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError + generic Exception). Also kills any orphan client process that survived the cancelled future, fire-and-forget with a 1s ceiling. - ``enabled_for`` now consults the broken-set BEFORE returning True. Files in already-broken (server_id, root) pairs short-circuit to False, so the file_operations layer skips the LSP path entirely with no spawn cost. Until the service is restarted (``hermes lsp restart``) or the process exits. - A single eventlog WARNING is emitted on first mark-broken so the user knows which server gave up. Subsequent edits in the same project stay silent. Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the key shape (server_id, per_server_root), enabled_for short-circuit, sibling-file skip in same project, project isolation (broken in A doesn't affect B), graceful no-op for missing-server / no-workspace, and an end-to-end test that snapshots after a failure and verifies the next ``enabled_for`` returns False. Validation: - Live retest of the wedged-binary scenario: 5 sequential writes, first 8.88s (the one snapshot timeout), subsequent four ~0.84s (no LSP cost). Down from 5x12.85s = 64s before this fix. - 99/99 LSP tests pass (92 prior + 7 broken-set) - 224/224 pass with file_operations + LSP combined - Happy path E2E reverified — clean write, type error introduced, patch fix all behave correctly with the new broken-set logic. Note: the FIRST write to a wedged binary still pays 8s (the snapshot_baseline timeout). We could shorten that, but pyright/ tsserver normally take 2-3s and slow CI rust-analyzer can need 5+ seconds, so 8s is the conservative ceiling. Subsequent writes are instant.	2026-05-12 16:31:54 -07:00
Teknium	d89553c2d6	fix(daytona): migrate legacy-sandbox lookup to cursor-based list() (#24587 ) Daytona ships breaking SDK changes on June 10, 2026 — `list()` returns an iterator and the `page=` offset parameter is removed. We pin daytona==0.155.0 so we're past the May 24 hard-cutoff, but the legacy-sandbox resume path in DaytonaEnvironment still passes `page=1` and reads `.items` off the result. Switch to `next(iter(results), None)` against a single-result `list(labels=..., limit=1)` call. Update tests to use `iter([...])` and drop the `page=1` kwarg from list() assertions.	2026-05-12 16:31:46 -07:00
Teknium	38441a7d77	docs(camofox): expand externally-managed sessions section (#24584 ) Adds behavior detail to the existing 'Externally managed Camofox sessions' subsection in features/browser.md: - Three-row settings table (config key + env var + effect). - 'What changes when user_id is set' — soft-cleanup behavior, why DELETE /sessions/<user_id> is skipped. - 'How tab adoption works' — 4-step lookup against GET /tabs, listItemId matching, fallback to new-tab creation, no mid-run re-polling. - Picking session_key: how to attach to a specific existing tab vs share-profile-only behavior with the default per-task session_key. - Concurrency note that Camofox does not arbitrate per-tab focus.	2026-05-12 15:20:42 -07:00
Teknium	f63d520496	chore(camofox): document new env vars + AUTHOR_MAP entry Follow-up to externally managed Camofox session support: - .env.example: document CAMOFOX_URL plus the new CAMOFOX_USER_ID, CAMOFOX_SESSION_KEY, CAMOFOX_ADOPT_EXISTING_TAB env vars. - scripts/release.py: AUTHOR_MAP entry for db@project-aeon.com -> db-aeon.	2026-05-12 15:14:49 -07:00
Dan Benyamin	62fd905340	feat(browser): support externally managed Camofox sessions Allow integrations to share a visible Camofox identity with Hermes and recover existing tabs without carrying local patches. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 15:14:49 -07:00
Teknium	3955aefced	fix(install): use `--extra all` not `--all-extras`; drop lazy-covered extras from [all] (#24515 ) * fix(install): use `--extra all` not `--all-extras`; drop lazy-covered extras from [all] Two coupled fixes for the Windows install hang where uv sync built python-olm from sdist and failed on missing make. # Root cause: --all-extras vs --extra all (credit: ethernet) `uv sync --all-extras` installs every key in [project.optional- dependencies], bypassing the curated [all] extra entirely. So even when [all] excluded [matrix], [rl], [yc-bench], etc., the installer pulled them anyway because they were still defined as extras. On Windows that meant python-olm (no wheel, needs make to build from sdist) and the install died there. The right flag is `--extra all` — install just the [all] extra's contents, respecting curation. Empirically verified via dry-run: --all-extras: pulls python-olm, mautrix, ctranslate2, onnxruntime, atroposlib, tinker, wandb, modal, daytona, vercel, python-telegram-bot, discord.py, slack-bolt, dingtalk-stream, lark-oapi, anthropic, boto3, edge-tts, elevenlabs, exa-py, fal-client, faster- whisper, firecrawl-py, honcho-ai, parallel-web --extra all: pulls none of those — just [all]'s curated set Dockerfile already uses `--extra all` (with comment explaining the gotcha) — knowledge existed; the gap was install.sh / install.ps1 / setup-hermes.sh. Sites fixed: scripts/install.sh L1118, scripts/install.ps1 L809, setup-hermes.sh L245. # Companion fix: drop lazy-covered extras from [all] `tools/lazy_deps.py` already covers anthropic, bedrock, exa, firecrawl, parallel-web, fal, edge-tts, elevenlabs, modal, daytona, vercel, all messaging platforms (telegram/discord/slack/matrix/ dingtalk/feishu), honcho, and faster-whisper. They were ALSO in [all], which defeats the whole point of lazy-install — fresh installs eager-pulled them and inherited whatever was broken upstream (the matrix → python-olm → no Windows wheel chain being the proximate symptom). [all] now contains only what genuinely can't be lazy-installed: cron, cli, dev, pty, mcp, homeassistant, sms, acp, google, web, youtube. Same trim applied to [termux-all]. New regression test asserts the contract: every extra in LAZY_DEPS must NOT also appear in [all]. # Companion fix: surface uv progress + errors setup-hermes.sh's hash-verified path swallowed uv's stderr to a tempfile, identical to the install.sh bug fixed in PR #24504. Same fix applied: stream stderr through directly so users see live progress instead of staring at a frozen prompt. # Files - pyproject.toml: trim [all] and [termux-all] to non-lazy extras only. - scripts/install.sh: --all-extras → --extra all; trim _ALL_EXTRAS / _PYPI_EXTRAS to match. - scripts/install.ps1: --all-extras → --extra all; trim $allExtras / $pypiExtras to match. - setup-hermes.sh: --all-extras → --extra all; stream stderr. - tests/test_project_metadata.py: invert matrix-in-[all] assertion; add lazy-coverage contract test. - uv.lock: regenerated. # Validation 5/5 metadata tests pass. 37/37 in update_autostash + tool_token_ estimation. `uv lock --check` passes. Empirical dry-run confirms `--extra all` excludes python-olm + RL chain on the new lockfile. * fix(install): parse [all] from pyproject.toml instead of mirroring it ethernet's review point: the previous patch left two hand-mirrored copies of [all]'s contents (in install.sh's $_ALL_EXTRAS and install.ps1's $allExtras). That guarantees future drift the next time pyproject.toml's [all] changes. Now both scripts parse pyproject.toml at install time using stdlib tomllib (Python 3.11+, which the bootstrap step already requires). Single source of truth. The only purpose of the parsed list is to build the 'Tier 2: [all] minus broken extras' fallback spec — so we parse, filter against $brokenExtras, and rebuild the .[a,b,c] spec. Also: removed redundant fallback tiers. Before: Tier 1 [all] Tier 2 [all] minus broken Tier 3 PyPI-only extras (no git deps) Tier 4 [web,mcp,cron,cli,messaging,dev] Tier 5 . After: Tier 1 [all] Tier 2 [all] minus broken Tier 3 . Tier 3 (PyPI-only) and Tier 4 (dashboard+core) used to dodge the [rl] git+sdist deps and the [matrix] python-olm build. Both are no longer in [all] post-2026-05-12 lazy-install migration, so the carve-out tiers had no remaining content. Tier 4 also referenced [messaging], which is now lazy-installed — the hardcoded fallback was actually inconsistent with the new policy. Defensive fallback: if tomllib parse fails (corrupted pyproject, unexpected schema), Tier 2 collapses to '.[all]' (same as Tier 1) so the broken-extras path becomes a no-op rather than crashing. * fix(gateway): hide Matrix from setup picker on Windows Matrix is the one messaging platform that has no working install path on Windows: [matrix] -> mautrix[encryption] -> python-olm, which has Linux-only wheels and needs make + libolm to build from sdist. The [all] cleanup in this PR keeps mautrix out of fresh installs, but a user who picked Matrix in 'hermes setup gateway' would still walk into the same sdist build failure when the wizard tried to install the extra. Hide the option at the picker so users never get the chance to try. The gate lives in _all_platforms() — single source of truth for the setup wizard, the curses gateway-config menu, and any future picker. Adapter loading at runtime is intentionally NOT gated: users who already have MATRIX_* env vars set (e.g. config copied from a Linux install) keep working if they somehow have python-olm available. This is the lowest-friction fix — picker visibility only. Tests cover linux/darwin/win32 and verify other platforms aren't collateral damage.	2026-05-12 15:06:25 -07:00
Ahmet Oşrak	4bb0a82a2b	fix(gateway): enqueue SSE EOS sentinel on task completion	2026-05-12 15:04:54 -07:00
Teknium	4fa5f7b765	chore(release): add AUTHOR_MAP entry for luarss	2026-05-12 15:03:33 -07:00
luarss	1189ed7855	fix(docs): correct broken internal links to webhooks and mlops skill pages - cron-script-only: webhook subscription links pointed to /docs/user-guide/features/webhooks; the page lives under messaging/ - mlops-hermes-atropos-environments: axolotl and TRL related-skill links pointed to skills/bundled/mlops/; both files live under skills/optional/mlops/ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:03:33 -07:00
墨綠BG	71198b9e19	📝 docs(kanban): clarify dependent task gating	2026-05-12 15:01:55 -07:00
Teknium	954e854ccc	chore(release): map kyanam.preetham@gmail.com → pkyanam	2026-05-12 15:00:29 -07:00
Teknium	629c33c633	test(gateway): patch _pid_exists instead of os.kill for scoped-lock tests Post-#21561 the liveness probe in acquire_scoped_lock() routes through gateway.status._pid_exists (psutil-first, safe on Windows), not os.kill(pid, 0). The two new macOS regression tests were patching status.os.kill, which had no effect — the unmocked psutil call returned False for PID 99999, marking the lock stale before the new code branch ran. The 'replaces' test passed only because acquired=True was already the expected outcome; the 'keeps' test failed in CI. Switch both tests to monkeypatch status._pid_exists directly, matching the existing test_acquire_scoped_lock_rejects_live_other_process pattern, so they actually exercise the new start_time=None + cmdline-based staleness branch.	2026-05-12 15:00:29 -07:00
Preetham Kyanam	653d304290	fix(gateway): detect stale scoped locks via cmdline when start_time is absent on macOS On macOS (and Windows), /proc is unavailable so _get_process_start_time() always returns None. When a gateway creates a scoped lock record with start_time=None and then exits, macOS can reuse that PID for an unrelated process. On restart, acquire_scoped_lock() sees: 1. os.kill(pid, 0) succeeds (PID is alive — but it's bluetoothuserd, not the gateway) 2. existing.start_time is None and current_start is None, so the start_time comparison is inconclusive 3. The lock is treated as active, blocking gateway startup with: "Telegram bot token already in use (PID 873). Stop the other gateway first." Root cause: _read_process_cmdline() only reads /proc/<pid>/cmdline, which doesn't exist on macOS. It always returns None, making _looks_like_gateway_process() always return False, so the cmdline fallback path in acquire_scoped_lock() was unreachable on macOS. Fix (two parts): 1. _read_process_cmdline(): Add a ps(1) fallback for platforms without /proc. When /proc/<pid>/cmdline doesn't exist, we now run "ps -p <pid> -o command=" to retrieve the process command line. The /proc path is tried first (preserving Linux performance); ps is only invoked as a fallback. 2. acquire_scoped_lock(): When both the lock record's start_time and the live process's start_time are None (the macOS case), fall back to checking whether the live PID still looks like a Hermes gateway process via _looks_like_gateway_process(). If it doesn't, the lock is stale. Closes #16376	2026-05-12 15:00:29 -07:00
Austin Pickett	642768c5c7	Merge pull request #24161 from NousResearch/austin/fix/dashboard fix(dashboard): UI polish — modals, layout, consistency	2026-05-12 17:57:31 -04:00
helix4u	a34998ee2f	fix(cli): parse positional insights days	2026-05-12 14:56:47 -07:00
rob-maron	c23a87bc16	union paid recs from nous portal with static list (#24509 )	2026-05-12 12:16:17 -07:00
Teknium	d186186e1a	fix(install): surface uv install + uv.lock sync errors instead of silently hanging (#24504 ) The `c1eb2dcda` tiered installer made two install paths look frozen on slow networks or broken environments because both swallowed the underlying tool's stderr. scripts/install.sh, setup-hermes.sh: curl -LsSf https://astral.sh/uv/install.sh \| sh 2>/dev/null printed only '✗ Failed to install uv' on failure with no diagnostic. Common real causes (glibc mismatch on old distros, corp proxy / TLS interception, missing curl, ~/.local/bin not writable, disk full) were invisible. Also: piping curl into sh masks curl failures under set -e (no pipefail) — sh exits 0 on empty stdin, so a network error succeeded silently. Fix: download installer to a tempfile first, then run it. Capture curl + installer output to a log; on failure, indent and print it. scripts/install.sh hash-verified tier: uv sync --all-extras --locked 2>"$(mktemp)" silenced uv's progress output, making a fresh-venv install (~50 transitives including torch-class deps) look hung for 1-5 minutes — users see 'Trying tier: hash-verified (uv.lock) ...' and assume it's frozen. The mktemp substitution also wasn't saved to a variable, so the uv error on failure was unreachable. Fix: stream uv's stderr directly so users see live 'Resolved N / Prepared / Installed' progress. Print an upfront note that the first run takes 1-5 minutes.	2026-05-12 12:11:16 -07:00
rob-maron	2863e9484a	Use nous portal as model metadata authority (#24502 ) * nous portal metadata resolver * minor fixes	2026-05-12 11:59:31 -07:00
Teknium	c594a23047	feat(agent): per-turn file-mutation verifier footer (#24498 ) Detect when write_file / patch calls fail during a turn and are never superseded by a successful write to the same path. When the final text response is delivered, append an advisory footer listing the files that did NOT change — so models that over-claim 'patched 5 files' after 4 silent failures can't hide the lie. Catches the failure mode reported in Ben Eng's llm-wiki session: grok-4.1-fast issued batches of parallel patches, half failed with 'Could not find old_string', and the agent summarised the turn claiming every file was edited. The user had to manually run 'git status' each turn to catch it. The verifier is a pure post-hoc check on tool results — no new LLM calls, no synthetic messages injected into history (prompt cache preserved), no changes to tool argument dispatch. Per-turn state is keyed by path; a later successful write to the same path clears the failure entry so single-file retry recovery is not flagged. Wired into both _execute_tool_calls_concurrent and _execute_tool_calls_sequential, so batched parallel patches and one-at- a-time edits are both covered. Footer emission happens after the agent loop exits, before transform_llm_output / post_llm_call plugin hooks run, so plugins still see (and can modify) the augmented text. Config: display.file_mutation_verifier (bool, default true) + HERMES_FILE_MUTATION_VERIFIER env override. 31 unit tests in tests/run_agent/test_file_mutation_verifier.py cover target extraction (write_file, patch-replace, patch-v4a single and multi-file), error-preview extraction (JSON .error field and plain string), per-turn state transitions (first-error-wins on repeated failure, success supersedes failure), footer rendering (truncation at 10 entries, user-actionable hint), and env/config precedence. Companion docs updated: user-guide/configuration.md + reference/environment-variables.md.	2026-05-12 11:54:13 -07:00
Austin Pickett	fc3fd6bb6b	fix(dashboard): UI polish — modals, layout, consistency, test fixes Dashboard UX polish pass — consolidates create forms into modals triggered from the page header, fixes layout inconsistencies, adds scroll-to navigation for the Keys page, and aligns the TokenBar with the design system. Changes: - App.tsx: add padding to sidebar header - resolve-page-title.ts: add missing routes, better fallback title - en.ts: fix nav labels (Profiles was 'profiles : multi agents') - ModelsPage: two-col layout, auxiliary tasks modal, TokenBar redesign - ProfilesPage: create button in header, form in modal, Checkbox component - CronPage: create button in header, form in modal - EnvPage: scroll-to sub-nav in header, fix text overflow Modal and dialog standardization: - Replace all native confirm()/window.confirm() with ConfirmDialog (OAuthProvidersCard, PluginsPage, ModelsPage, ConfigPage) - Add useModalBehavior hook (Escape-to-close, scroll lock, focus restore) - Apply hook to ProfilesPage, CronPage, AuxiliaryTasksModal Component fixes (from PR review): - Checkbox: fix controlled/uncontrolled mismatch, add focus-visible ring - TokenBar: add rounded-full to legend dots, remove dead code CI/test fixes: - Fix TS unused imports (noUnusedLocals), type-narrow PickerTarget union - Add windows-footgun suppression on platform-guarded os.killpg - Fix 19 stale unit tests + 9 e2e tests broken by recent main changes - Restore minimal example-dashboard plugin for plugin auth test	2026-05-12 13:59:22 -04:00
Teknium	dd0923bb89	docs: remove public advisory page (handle community comms separately) (#24253 )	2026-05-12 01:09:58 -07:00
Teknium	c1eb2dcda7	feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback (#24220 ) * feat(security): supply-chain advisory checker + lazy-install framework + tiered install fallback Three coordinated mitigations for the Mini Shai-Hulud worm hitting mistralai 2.4.6 on PyPI (2026-05-12) and for the next single-package compromise that follows. # What this PR makes true 1. Users with the poisoned mistralai 2.4.6 in their venv get a loud detection banner with copy-pasteable remediation steps the moment they run hermes (and on every gateway startup). 2. One quarantined / yanked PyPI package can no longer silently demote a fresh install to 'core only' — the installer keeps every other extra and tells the user which tier landed. 3. Future opt-in backends (Mistral, ElevenLabs, Honcho, etc.) can lazy-install on first use under a strict allowlist, instead of eagerly pulling everything at install time. # Detection: hermes_cli/security_advisories.py - ADVISORIES catalog (one entry currently: shai-hulud-2026-05 for mistralai==2.4.6). Adding the next one is a single dataclass. - detect_compromised() uses importlib.metadata.version() — no pip dependency, works in uv venvs that lack pip. - Banner cache (~/.hermes/cache/advisory_banner_seen) rate-limits the startup banner to once per 24h per advisory. - Acks persisted to security.acked_advisories in config.yaml; never re-banner after ack. - Wired into: * hermes doctor — runs first, prints full remediation block * hermes doctor --ack <id> — dismisses an advisory * cli.py interactive run() and single-query branches — short stderr banner pointing at hermes doctor * gateway/run.py startup — operator-visible warning in gateway.log # Lazy-install framework: tools/lazy_deps.py - LAZY_DEPS allowlist maps namespaced feature keys (tts.elevenlabs, memory.honcho, provider.bedrock, etc.) to pip specs. - ensure(feature) installs missing deps in the active venv via the uv → pip → ensurepip ladder (matches tools_config._pip_install). - Strict spec safety regex rejects URLs, file paths, shell metas, pip flag injection, control chars — only PyPI-by-name accepted. - Gated on security.allow_lazy_installs (default true) plus the HERMES_DISABLE_LAZY_INSTALLS env var for restricted/audited envs. - Migrated three backends as proof of pattern: * tools/tts_tool.py — _import_elevenlabs() calls ensure first * plugins/memory/honcho/client.py — get_honcho_client lazy-installs * tts.mistral / stt.mistral entries pre-registered for when PyPI restores mistralai # Installer fallback tiers scripts/install.sh, scripts/install.ps1, setup-hermes.sh: - Centralised _BROKEN_EXTRAS list (currently: mistral). Edit one array when a transitive breaks; users keep every other extra. - New 'all minus known-broken' tier between [all] and the existing PyPI-only-extras tier. Only kicks in when [all] fails resolve. - All three tiers explicit: every fallback announces which tier landed and prints a re-run hint when not on Tier 1. - install.ps1 and install.sh both regenerate their tier specs from the same _BROKEN_EXTRAS array so updates stay in sync. Side effect: install.ps1 Tier 2 spec previously hardcoded 'mistral' in its extra list — bug fixed by the refactor (mistral is filtered out). # Config hermes_cli/config.py — DEFAULT_CONFIG.security gains: - acked_advisories: [] (advisory IDs the user has dismissed) - allow_lazy_installs: True (security gate for ensure()) No config version bump needed — both keys nest under existing security: block, and load_config's deep-merge picks up DEFAULT_CONFIG defaults for users with older configs. # Tests tests/hermes_cli/test_security_advisories.py — 23 tests covering: - detect_compromised matches/non-matches, wildcard frozenset - ack persistence, idempotence, blank rejection, config-failure path - banner cache rate limiting + 24h re-banner + ack-stops-banner - short_banner_lines / full_remediation_text / render_doctor_section / gateway_log_message - shipped catalog well-formedness invariant tests/tools/test_lazy_deps.py — 40 tests covering: - spec safety: 11 safe parametrized + 18 unsafe parametrized - allowlist: unknown-feature rejection, namespace.name shape, every shipped spec passes the safety regex - security gating: config flag, env var, default, fail-open - ensure() happy/sad paths: already-satisfied, install success, pip stderr surfaced on failure, install-succeeds-but-still-missing - is_available, feature_install_command Combined: 63 new tests, all passing under scripts/run_tests.sh. # Validation - scripts/run_tests.sh tests/hermes_cli/test_security_advisories.py tests/tools/test_lazy_deps.py → 63/63 passing - scripts/run_tests.sh tests/hermes_cli/test_doctor.py tests/hermes_cli/test_doctor_command_install.py tests/tools/test_tts_mistral.py tests/tools/test_transcription_tools.py tests/tools/test_transcription_dotenv_fallback.py → 165/165 passing - scripts/run_tests.sh tests/hermes_cli/ tests/tools/ → 9191 passed, 8 pre-existing failures (verified on origin/main before this change) - bash -n on install.sh and setup-hermes.sh → OK - py_compile on all modified .py files → OK - End-to-end smoke test of detect_compromised + render_doctor_section + gateway_log_message with mocked installed version → produces copy-pasteable remediation output # Community Full advisory + remediation steps: website/docs/community/security-advisories/shai-hulud-mistralai-2026-05.md Short-form post drafts (Discord, GitHub pinned issue, README banner): scripts/community-announcement-shai-hulud.md Refs: PR #24205 (mistral disabled), Socket Security advisory <https://socket.dev/blog/mini-shai-hulud-worm-pypi> * build(deps): pin every direct dep to ==X.Y.Z (no ranges) Companion to the supply-chain advisory work: replace every >=/</~= range in pyproject.toml's [project.dependencies] and [project.optional-dependencies] with an exact ==X.Y.Z pin sourced from uv.lock. Why: ranges allow PyPI to ship a fresh version of any direct dep at any time without a code review on our side. With ranges, the malicious mistralai 2.4.6 release would have been pulled by every fresh 'pip install -e .[all]' for the hours between upload and PyPI's quarantine — exactly the install window we got hit on. Exact pins close that window: the only way a new package version reaches a user is via an intentional update on our end. What the user-facing change is: nothing, behavior-wise. Every package resolves to the same version it was already resolving to via uv.lock — the pins just remove the resolver's freedom to pick a different one. Cost: any user installing Hermes alongside another package that requires a newer pin gets a resolver conflict. Acceptable for our isolated-venv install path; documented in the new comment block. Build-system requires line (setuptools>=61.0) is intentionally left as a range — pinning the build backend would block fresh pip from bootstrapping the build on architectures where that exact wheel isn't available. mistral extra (mistralai==2.3.0) is pinned but stays out of [all] (per PR #24205). 'uv lock' regeneration will fail until PyPI restores mistralai; lockfile regeneration is gated behind that, NOT on every PR. LAZY_DEPS in tools/lazy_deps.py also moved to exact pins so the lazy- install pathway can never resolve a different version than the one declared in pyproject.toml. Validation: - Cross-checked all 77 pinned direct deps in pyproject.toml against uv.lock — every pin matches the resolved version exactly. - Cross-checked all LAZY_DEPS specs against uv.lock — same. - 'uv pip install -e .[all] --dry-run' resolves 205 packages cleanly. - tests/tools/test_lazy_deps.py + tests/hermes_cli/test_security_advisories.py → 63/63 passing (every shipped spec passes the safety regex). - Doctor + TTS + transcription targeted suite → 146/146 passing. * build(deps): hash-verify transitives via uv.lock; remove unresolvable [mistral] extra You asked: 'what about the dependencies the dependencies rely on?' — correctly noting that exact-pinning direct deps in pyproject.toml does NOT cover the transitive graph. `pip install` and `uv pip install` both re-resolve transitives fresh from PyPI at install time, so a compromised transitive (e.g. `httpcore` if it got worm-poisoned tomorrow) would still hit our users even with every direct dep exact-pinned. # What this commit fixes 1. Both real installer scripts now prefer `uv sync --locked` as Tier 0. uv.lock records SHA256 hashes for every transitive — a compromised package with a different hash gets REJECTED. Falls through to the existing `uv pip install` cascade if the lockfile is missing or stale, with a loud warning that the fallback path does NOT hash-verify transitives. Previously only `setup-hermes.sh` (the dev path) used the lockfile; `scripts/install.sh` and `scripts/install.ps1` (the paths fresh users actually run) skipped it. 2. Removed the `[mistral]` extra entirely. The `mistralai` PyPI project is fully quarantined right now — every version returns 404, so any pin we wrote was unresolvable, which broke `uv lock --check` in CI. Restoration is documented in pyproject.toml as a 5-step checklist (verify, re-add extra, re-enable in 4 modules, regenerate lock, optionally re-add to [all]). 3. Regenerated uv.lock. 262 packages, mistralai/eval-type-backport/ jsonpath-python pruned. `uv lock --check` now passes. # Defense-in-depth view \| Layer \| Where \| Protects against \| \|----------------------------\|-------------------\|-------------------------------------------\| \| Exact pins in pyproject \| direct deps \| new mistralai 2.4.6-style direct compromise \| \| uv.lock + `--locked` install \| transitive graph \| transitive worm injection \| \| Tier-0 hash-verified path \| install.sh / .ps1 \| actually USE the lockfile in fresh installs \| \| `uv lock --check` CI gate \| every PR \| drift between pyproject and lockfile \| \| `hermes_cli/security_advisories.py` \| runtime \| cleanup for users who already got hit \| The exact pinning + hash verification together close the supply-chain gap. Without the lockfile path, exact pins alone are theater. # Validation - `uv lock --check` → passes (262 packages resolved, no drift). - `bash -n` on install.sh + setup-hermes.sh → OK. - 209/209 tests passing across new + adjacent test files (test_lazy_deps.py, test_security_advisories.py, test_doctor.py, test_tts_mistral.py, test_transcription_tools.py). - TOML parse OK. * chore: remove community announcement drafts (PR body covers it) * build(deps): lazy-install every opt-in backend (anthropic, search, terminal, platforms, dashboard) Extends the lazy-install framework to cover everything that's not used by every hermes session. Base install drops from ~60 packages to 45. Moved out of core dependencies = []: - anthropic (only when provider=anthropic native, not via aggregators) - exa-py, firecrawl-py, parallel-web (search backends; only when picked) - fal-client (image gen; only when picked) - edge-tts (default TTS but still optional) New extras in pyproject.toml: [anthropic] [exa] [firecrawl] [parallel-web] [fal] [edge-tts]. All added to [all]. New LAZY_DEPS entries: provider.anthropic, search.{exa,firecrawl,parallel}, tts.edge, image.fal, memory.hindsight, platform.{telegram,discord,matrix}, terminal.{modal,daytona,vercel}, tool.dashboard. Each import site now calls ensure() before importing the SDK. Where the module had a top-level try/except (telegram, discord, fastapi), the graceful-fallback pattern was extended to lazy-install on first check_*_requirements() call and re-bind module globals. Updated test_windows_native_support.py tzdata check from snapshot (>=2023.3 literal) to invariant (any version + win32 marker). Validation: - Base install: 45 packages (was ~60); 6 newly-extracted packages absent - uv lock --check: passes (262 packages, no drift) - 209/209 lazy_deps + advisory + doctor + tts/transcription tests passing - py_compile clean on all 12 modified modules	2026-05-12 01:02:25 -07:00
Teknium	99ad2d1372	fix(deps): unbreak [all] install — drop mistralai while PyPI quarantined (#24205 ) The `mistralai` PyPI package was quarantined on 2026-05-12 after a malicious 2.4.6 release. Every fresh resolve (AUR makepkg, Docker build, CI run, install.sh first-run) currently fails on `mistralai>=2.3.0,<3` because PyPI returns zero candidates. Existing users running `hermes update` mostly didn't notice — `hermes update` falls back from `.[all]` to per-extra retries and silently skips mistral with a warning that scrolls past. But fresh installs hard-fail or lose every other extra. Changes: - pyproject.toml: drop `hermes-agent[mistral]` from `[all]` and `[termux-all]`. The `mistral` extra itself is preserved so users can opt back in once PyPI un-quarantines. - hermes_cli/tools_config.py: hide Mistral Voxtral TTS from the `hermes tools` provider picker until restored. - hermes_cli/web_server.py: drop "mistral" from dashboard STT options. - tools/transcription_tools.py: explicit `provider: mistral` returns "none" with a clear status message; auto-detect skips mistral. - tools/tts_tool.py: dispatcher returns a clear "temporarily disabled" error before any SDK import attempt (avoids cached-stale-package surprises). - tests/tools/: update three test files to assert the new disabled behavior. Each test docstring records why and points at the rollback trigger (PyPI un-quarantines mistralai). Restore plan: revert this commit once the package is available on PyPI again. The behavior change is intentional and documented in code comments + test docstrings to make the rollback trivial. Validation: - scripts/run_tests.sh tests/tools/ -k 'mistral or stt or tts' → 425/425 passing. Refs: https://pypi.org/simple/mistralai/ (currently "pypi:project-status: quarantined").	2026-05-11 23:02:15 -07:00
nightcityblade	407683b72d	fix(docs): repair Voice & TTS provider table Fixes NousResearch/hermes-agent#24101	2026-05-11 22:42:00 -07:00
Robin Fernandes	94d9db72ba	add client marker tag on aux inference requests	2026-05-11 22:30:42 -07:00
Austin Pickett	58e2109f10	fix(minimax): harden OAuth dashboard and runtime Handle MiniMax OAuth expiry values consistently across CLI and dashboard flows, fix CLI status/add behavior, and force pooled OAuth runtime requests through Anthropic Messages. - web_server._minimax_poller: parse expired_in via the shared resolver so unix-ms absolute timestamps stop landing as TTL seconds and crashing with 'year 583911 is out of range' when a user connects MiniMax OAuth from the dashboard. - auth._minimax_oauth_login / _refresh_minimax_oauth_state: same fix on the CLI login + refresh paths. - auth.get_auth_status: dispatch minimax-oauth to its dedicated status function instead of falling through. - auth_commands.auth_add_command: 'hermes auth add minimax-oauth' now starts the device-code login flow and persists a pool entry with the access + refresh tokens, instead of requiring credentials to already exist. - runtime_provider._resolve_runtime_from_pool_entry: pin pooled minimax-oauth credentials to anthropic_messages so a stale model.api_mode: chat_completions can't send requests to /anthropic/chat/completions and trigger MiniMax nginx 404s. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 22:15:16 -07:00
rob-maron	32abe742fa	fix comment	2026-05-11 21:30:29 -07:00
rob-maron	f0c2964f0b	remove comments	2026-05-11 21:30:29 -07:00
rob-maron	057fc7b073	fix guard	2026-05-11 21:30:29 -07:00
rob-maron	528bba6734	fix kimi	2026-05-11 21:30:29 -07:00
Teknium	7993e03c06	fix(cache): route Nous Portal Qwen through Portal-Claude cache pathway (#24151 ) Qwen models on Nous Portal (e.g. qwen3.6-plus) now get the same envelope-layout cache_control markers and long-lived (1h cross-session) cache treatment as Portal Claude. Portal proxies to OpenRouter with identical wire-format and cache_control semantics, but the prior policy left Portal Qwen falling through to the alibaba-family branch (which only matches provider=opencode/alibaba), serving 0% cache hits and re-billing the full prompt every turn. Scope is narrow: Portal Claude OR Portal Qwen. Other models on Portal keep their existing behavior. - _anthropic_prompt_cache_policy: add (is_nous_portal and qwen) -> (True, False) - _supports_long_lived_anthropic_cache: drop Claude-only gate for Portal so Qwen also gets the validated 1h cross-session layout - tests cover both functions, both bare and vendored qwen slug forms, and the rejection of non-Claude non-Qwen Portal traffic	2026-05-11 21:04:55 -07:00
Ben Barclay	3c23b15f81	fix(tui-clipboard): skip native safety net on OSC52-capable terminals (#20954 ) Some checks failed Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Docker Build and Publish / move-latest (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Waiting to run Details Tests / test (push) Waiting to run Details Tests / e2e (push) Waiting to run Details uv.lock check / uv lock --check (push) Waiting to run Details Nix Lockfile Fix / auto-fix-main (push) Has been cancelled Details Nix Lockfile Fix / fix (push) Has been cancelled Details Build Skills Index / build-index (push) Has been cancelled Details Build Skills Index / deploy-with-index (push) Has been cancelled Details * fix(tui-clipboard): skip native safety net on OSC52-capable terminals On terminals with first-class OSC 52 support (Ghostty, kitty, WezTerm, Windows Terminal, VS Code), setClipboard() currently fires both OSC 52 AND a parallel native-tool write (wl-copy / xclip / pbcopy). On Wayland + wl-copy this corrupts the clipboard: probeLinuxCopy() runs wl-copy with empty stdin as an existence check (destructive — wipes clipboard to empty string), and the subsequent real wl-copy invocation races OSC 52 plus its own daemon's previous SIGTERM. Symptom: user on Arch + Ghostty + wl-copy (Wayland, no tmux, no SSH) had to press Ctrl+Shift+C three times before a selection landed. env -u WAYLAND_DISPLAY -u DISPLAY HERMES_TUI_FORCE_OSC52=1 (which short-circuits copyNative via the DISPLAY-absent early-return) made every copy work instantly — proving OSC 52 alone is sufficient on Ghostty and that copyNative() is actively destructive there. Add OSC52_CAPABLE_TERMINALS allowlist to terminal.ts (same pattern as the existing EXTENDED_KEYS_TERMINALS), and gate copyNative() on the terminal NOT being on it. The native safety net continues to fire on unrecognised terminals (xterm, GNOME Terminal, Konsole, Terminal.app, etc.) where OSC 52 is less reliable. * fix(tui-clipboard): address Copilot review feedback - Move OSC52_CAPABLE_TERMINALS + supportsOsc52Clipboard() from ink/terminal.ts to utils/env.ts. ink/terminal.ts already imports link from ink/termio/osc.ts; importing back into termio/osc.ts introduced a circular dependency. utils/env.ts has no deps on either file and already owns terminal detection (detectTerminal()), so the helper sits naturally next to it. - Replace the inline gating (!SSH_CONNECTION && !supportsOsc52Clipboard()) with a pure shouldUseNativeClipboard(env, terminal) helper. The old expression skipped native on allowlisted terminals even when setClipboard() wouldn't actually emit OSC 52 (e.g. inside TMUX/STY where we use tmux load-buffer instead, or when the user has set HERMES_TUI_FORCE_OSC52=0). That made the clipboard write a no-op in those configurations. The new helper: 1. SSH_CONNECTION set -> false (existing behaviour) 2. TMUX or STY set -> true (we go through load-buffer, no race) 3. shouldEmitClipboardSequence() false -> true (native is the only path left when OSC 52 is suppressed) 4. Otherwise: skip native iff terminal is allowlisted. - Add 11 tests for shouldUseNativeClipboard covering the SSH guard, TMUX/STY tmux-inside-Ghostty case, HERMES_TUI_FORCE_OSC52=0 override, allowlisted vs non-allowlisted terminals, precedence, and default-args smoke. Tests follow the package's existing parameterised-helper style (no vi.mock; helpers accept env and terminal as arguments). - Update test imports to the new utils/env.js path. * fix(tui-clipboard): address Copilot round 2 feedback * fix(tui-clipboard): address Copilot round 3 feedback * fix(tui-clipboard): address Copilot round 4 feedback	2026-05-11 19:40:07 -07:00
Teknium	e85592591e	fix(nous): surface Portal-flagged free models in picker even when curated list is stale (#24082 ) Free-tier users were seeing 'No free models currently available.' in the `hermes model` and post-login pickers even though qwen/qwen3.6-plus is free on the Portal right now. Three independent breakages compounded: 1. The docs-hosted catalog manifest at website/static/api/model-catalog.json was not regenerated when _PROVIDER_MODELS['nous'] was updated, so users fetching the manifest got a list that didn't include qwen/qwen3.6-plus. 2. _resolve_nous_pricing_credentials() returned ('', '') on any auth blip, collapsing get_pricing_for_provider('nous') to {} and making every curated model fall through the free-tier filter as 'paid'. 3. Even with healthy pricing, the picker only ever showed models from the in-repo curated list intersected with live pricing — a Portal-flagged free model not yet in the curated list could never appear. Changes: - hermes_cli/models.py: new union_with_portal_free_recommendations() that augments the curated list with Portal freeRecommendedModels entries (with synthetic free pricing so partition keeps them). The Portal's /api/nous/recommended-models endpoint is now the source of truth for free-tier surfacing — old Hermes builds will see new free models without a CLI release. - hermes_cli/models.py: _resolve_nous_pricing_credentials() falls back to the public inference base URL when runtime cred resolution fails. The /v1/models endpoint exposes pricing without auth, so silently returning {} just because a refresh token expired was wrong. - hermes_cli/auth.py + hermes_cli/main.py: both free-tier picker call sites call union_with_portal_free_recommendations() before partition. - tests/hermes_cli/test_models.py: 7 tests covering union behaviour (prepend, dedup, end-to-end with stale pricing, empty/missing/error payloads, invalid entries). - tests/hermes_cli/test_model_catalog.py: drift guard TestManifestMatchesInRepoLists fails CI when _PROVIDER_MODELS['nous'] or OPENROUTER_MODELS is edited without re-running scripts/build_model_catalog.py. Verified empirically that removing a manifest entry triggers an assertion with an actionable error message. Validation: - 133/133 targeted tests pass (test_models, test_model_catalog, test_auth_nous_provider). - Live E2E against the real Portal: - Stale curated list ['claude-opus','claude-sonnet','gpt-5.4'] (no qwen) → after union: ['qwen/qwen3.6-plus', ...] → partition(free_tier=True): selectable=['qwen/qwen3.6-plus']. - Simulated expired refresh token → anon fetch returns 403 pricing entries including qwen/qwen3.6-plus -> {prompt:0, completion:0}. - ruff: clean.	2026-05-11 18:08:16 -07:00
Teknium	ced1990c1c	feat(computer-use): refresh cua-driver on `hermes update` + add `install --upgrade` (#24063 ) cua-driver was only installed once on toolset enable: `_run_post_setup` early-returns when the binary is already on PATH, so upstream fixes (e.g. v0.1.6 Safari window-focus fix) never reached existing users without manual reinstall. Two refresh points now: - `hermes update` re-runs the upstream installer at the end of the update if cua-driver is on PATH (macOS-only, no-op otherwise). Ties driver freshness to the user-controlled update cadence — no startup latency, no per-launch GitHub API call. - `hermes computer-use install --upgrade` for manual force-refresh. The upstream `install.sh` always pulls the latest release, so re-running is the canonical upgrade path. No version-comparison logic needed. `hermes computer-use status` now shows the installed version, and points at `--upgrade` for refreshing.	2026-05-11 17:10:58 -07:00
Teknium1	97a0e69df0	chore(release): add AUTHOR_MAP entry for ahmedbadr3	2026-05-11 16:51:09 -07:00
Ahmed Badr	05bad7b1e7	fix(dashboard): MiniMax 'Login' button launched Claude OAuth (#22832 ) Fixes #22832. ## Root cause `hermes_cli/web_server.py:start_oauth_login` dispatched OAuth flows by the catalog's `flow` field rather than provider id: if catalog_entry["flow"] == "pkce": return _start_anthropic_pkce() The catalog had two `flow: "pkce"` entries — `anthropic` and `minimax-oauth` — so clicking "Login" on MiniMax in the dashboard's Keys tab unconditionally launched the Anthropic/Claude PKCE flow. ## Fix Three changes in `hermes_cli/web_server.py`: 1. Catalog entry for `minimax-oauth` changed from `flow: "pkce"` to `flow: "device_code"`. From a UX perspective MiniMax is a verification-URI + user-code flow (open URL, enter code, backend polls) — same shape as Nous's device-code flow. The PKCE bit (verifier + challenge from `_minimax_pkce_pair`) is a security extension that doesn't change the operator experience; the existing dashboard modal already renders `device_code` correctly for this UX. 2. New MiniMax branch in `_start_device_code_flow`, mirroring the existing Nous branch but calling MiniMax-specific helpers (`_minimax_request_user_code`, `_minimax_pkce_pair`). Stashes verifier + state in the session for the poller to consume. Handles the overloaded `expired_in` field (could be unix-ms timestamp OR seconds-from-now duration) the same way `_minimax_poll_token` does. 3. New `_minimax_poller` background thread mirroring `_nous_poller`. Calls `_minimax_poll_token` → on success builds the same `auth_state` dict the CLI flow (`_minimax_oauth_login`) builds, and persists via `_minimax_save_auth_state` so the dashboard path leaves the system in the same state as `hermes auth add minimax-oauth`. Plus a dispatcher tightening to prevent regression: the `pkce` branch now requires `provider_id == "anthropic"`, so any future PKCE provider added without a proper start function gets a clean `400 Unsupported flow` rather than silently launching Anthropic OAuth. ## Test New `tests/hermes_cli/test_web_oauth_dispatch.py`: - Regression test asserting MiniMax start does NOT return claude.ai - Sanity test that Anthropic PKCE still works after the dispatcher tightening - Forward-looking test: a hypothetical pkce-flagged provider without an explicit branch is rejected cleanly rather than misrouted ## Limitations - The dashboard MiniMax path defaults to `region="global"`. CN-region operators can still use the CLI flow which supports `--region cn`. Adding a region toggle to the dashboard UI is a follow-up.	2026-05-11 16:51:09 -07:00
Teknium	ea1d0462cf	fix(cli): vertical fallback for markdown tables wider than terminal (#23948 ) Follow-up to #23863 (CJK table alignment). The realigner was correctly padding pipes to identical column offsets, but when a table's natural width exceeds terminal cells it produced lines that the terminal soft-wrapped mid-cell, destroying column alignment visually even though the bytes were perfectly padded. Reported as 'columns are not aligned' on tables containing one long row alongside several short rows. Approach mirrors Claude Code's MarkdownTable.tsx narrow-terminal fallback: when realign_markdown_tables is given an available_width budget and the rebuilt horizontal table exceeds it, render each body row as 'Header: value' lines separated by a thin ─ rule. Word-wraps oversize values at the budget with a 2-space continuation indent. - agent/markdown_tables.py: realign_markdown_tables(text, available_width=None); threshold check at the top of _render_block flips into a new _render_vertical fallback. Includes _wrap_to_width with hard-break for tokens longer than the budget. - cli.py: helper _terminal_width_for_streaming() returns shutil.get_terminal_size().columns minus _STREAM_PAD and a 2-cell safety margin; passed to all three realign call sites (_render_final_assistant_content for strip+render Panel paths, and the streaming flushers in _emit_stream_text / _flush_stream). - tests/agent/test_markdown_tables.py: 4 new tests covering the overflow-vertical fallback for ASCII + CJK content, the 'fits → keep horizontal' case, and the long-cell wrap with indent. Live-verified: with COLUMNS=100, the user's reported 'long row in ASCII table' case now renders as vertical key-value rows that all fit the panel; the 6-column CJK comparison table still renders as an aligned horizontal table because it fits inside 100 cols.	2026-05-11 16:49:13 -07:00
ethernet	825bd50e6b	Merge pull request #18036 from NousResearch/fix/bundle-size ui-tui: bundle with esbuild, drop runtime node_modules	2026-05-11 17:46:19 -04:00
brooklyn!	75b428c852	feat(ui-tui): resolve markdown links to readable page titles (#24013 ) * feat(ui-tui): resolve links to readable page titles Mirror desktop pretty-link behavior in the TUI by resolving HTTP links to page titles with shared caching and safe fetch filters, plus slug-based fallbacks so chat links stay readable even when title fetch fails. * refactor(ui-tui): tighten link-title fallback handling Clean up the link-title resolver by hardening in-flight cleanup and clarifying title length limits, while adding focused coverage for HTML entity decoding and markdown-label fallback behavior. * fix(ui-tui): block private-network targets in title fetches Prevent automatic link-title resolution from requesting local or private hosts by rejecting RFC1918, link-local, ULA, and intranet-style hostnames before fetch, and add regression coverage for blocked host patterns.	2026-05-11 14:16:31 -07:00
ethernet	c6ca11618a	refactor(tui): simplify TUI build logic, remove stale staleness checks The old mtime-tracking staleness machinery (_tui_build_needed, _hermes_ink_bundle_stale, _find_bundled_tui) tried to avoid rebuilding by comparing source timestamps to dist/entry.js. This was fragile and added ~100 lines of code. Replace with three clear paths: 1. HERMES_TUI_DIR set (prebuilt/nix): just node dist/entry.js, no build 2. --dev mode: tsx src/entry.tsx, no build, hot reload 3. Normal: always npm run build (esbuild is ~1s, correctness > caching) Also error when HERMES_TUI_DIR is set with --dev (footgun: prebuilt bundle has no source code to hot-reload).	2026-05-11 17:04:34 -04:00
kshitijk4poor	9a63b5f16c	chore: add nicoechaniz to AUTHOR_MAP	2026-05-11 13:16:07 -07:00
nicoechaniz	e2b713cced	fix(model-metadata): skip OpenRouter for known providers, add kimi/moonshot to PROVIDER_TO_MODELS_DEV Based on PR #23950 by @nicoechaniz. - Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV → kimi-for-coding - Gate OpenRouter metadata step behind "if not effective_provider": known providers should not be overridden by community-maintained OR data - Keep the targeted Kimi-family 32k guard as a secondary safety net inside the OR gate (for unknown providers with Kimi models) Co-authored-by: nicoechaniz <nicoechaniz@altermundi.net>	2026-05-11 13:16:07 -07:00
kshitijk4poor	91eef6255e	fix: correct context-length resolution for kimi-k2.6 on Ollama Cloud and Kimi Coding Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K, tripping the 64K minimum-context guard and preventing use of the model on Ollama Cloud and Kimi Coding / Moonshot providers. Three fixes in the context-length resolution chain: 1. Ollama Cloud native /api/show query: new _query_ollama_api_show() queries the Ollama native API for authoritative GGUF model_info context_length. For hosted Ollama, prefers model_info over num_ctx since users can't set their own num_ctx on Cloud. Added at step 5e in get_model_context_length(), before the models.dev fallback. 2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context() now also tries appending :cloud and -cloud suffixes when the bare model name doesn't match. models.dev stores 'kimi-k2.6:cloud' but users and the live API use bare 'kimi-k2.6'. 3. Kimi-family 32K guard: after the OpenRouter metadata step, reject exactly 32768 for Kimi-named models (kimi-, moonshot) and fall through to hardcoded defaults ('kimi': 262144). OpenRouter reports 32768 for moonshotai/kimi-k2.6 but the model actually supports 262K. Narrow filter — only 32768, only Kimi-family — becomes dead code when OpenRouter updates its metadata. ---	2026-05-11 13:16:07 -07:00
ethernet	3197b4de6d	Merge remote-tracking branch 'origin/main' into fix/bundle-size	2026-05-11 16:01:04 -04:00
Siddharth Balyan	271883447e	feat: expose HERMES_SESSION_ID to agent tools via ContextVar + env (#23847 ) Set HERMES_SESSION_ID using the existing session_context.py ContextVar system for concurrency safety (multiple gateway sessions in one process won't cross-talk). Also writes os.environ as fallback for CLI mode. Touchpoints: - gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry - run_agent.py: Set both ContextVar and os.environ at init and on context-compression rotation - tools/environments/local.py: Bridge ContextVars into subprocess env in _make_run_env() (ContextVars don't propagate to child processes) - tests/run_agent/test_session_id_env.py: 3 tests covering env, provided ID, and ContextVar paths execute_code subprocess already passes HERMES_* prefixed vars through _scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_'). Primary use case: webhook-triggered agents that need to include a `--resume <session_id>` takeover command in their output.	2026-05-12 00:16:45 +05:30
kshitij	ce0f529cde	chore: ruff auto-fix C401, C416, C408, PLR1722 (#23940 ) C401: set(x for x in y) -> {x for x in y} (set comprehension) C416: [(k,v) for k,v in d] -> list(d.items()) (unnecessary listcomp) C408: tuple()/dict() -> ()/{} (unnecessary collection call) PLR1722: exit() -> sys.exit() (adds import sys where needed) 21 instances fixed, 0 remaining. 19 files, +40/-36.	2026-05-11 11:20:58 -07:00
Teknium	7b76366552	feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828 ) Cuts input cost for first-turn Claude requests by ~85-90% on subsequent sessions within an hour. Tools array (~13k tokens for default toolset) + stable system prefix (~5-8k tokens) get a 1h cache_control marker; the volatile suffix (memory, USER profile, timestamp, session id) sits in a separate non-cached block at the end so it doesn't poison the cross-session prefix when it changes. Provider gate: Claude on native Anthropic (incl. OAuth subscription), OpenRouter, and Nous Portal (which proxies to OpenRouter). All other providers keep today's system_and_3 layout unchanged. Layout (4 cache_control breakpoints, Anthropic max): 1. tools[-1] -> 1h (cross-session) 2. system content[0] -> 1h (cross-session, stable prefix) 3. messages[-2] -> 5m (within-session rolling) 4. messages[-1] -> 5m (within-session rolling) Within-session rolling shrinks from 3 messages to 2 to free the breakpoint budget. On Claude with realistic tool loadouts the long-lived tier carries the bulk of cross-session value anyway. System prompt is now always assembled cache-friendly: stable identity / guidance / skills / platform hints first, then session-stable context files (AGENTS.md, .cursorrules), then per-call volatile content. Old single-string callers see the same logical content (same join order), just reordered so volatile lives at the end. Config knobs (defaults shown): prompt_caching: cache_ttl: "5m" # rolling-window TTL (unchanged) long_lived_prefix: true # opt-out switch long_lived_ttl: "1h" # cross-session prefix TTL Live E2E (tests/agent/test_prompt_caching_live.py, gated on OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset: Call 1 (cold): cache_write=13,415 cache_read=0 Call 2 (NEW agent + msg): cache_write=391 cache_read=13,025 Cross-session reuse: 97.09% Implementation: * agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived() + mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control() preserved verbatim for the fallback path. * agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards cache_control onto each Anthropic-format tool dict. * run_agent.py: _build_system_prompt_parts() returns the 3-tier dict; _build_system_prompt() joins them (backward compatible). _supports_long_lived_anthropic_cache() policy added next to the existing _anthropic_prompt_cache_policy() (which now also recognises Nous Portal Claude — pre-existing gap fixed in passing). _build_api_kwargs() resolves tools_for_api once and propagates the marker through all four build paths (anthropic_messages, bedrock, codex_responses, profile/legacy chat completions). Long-lived flag plumbed into the runtime snapshot/restore + model-switch + fallback-promotion paths. Tests: * tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache, TestApplyAnthropicCacheControlLongLived). * tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests (TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes + a fallback-target case). * tests/agent/test_prompt_caching_live.py: new live E2E (skipif when OPENROUTER_API_KEY is unset; runs outside the hermetic suite). * Targeted suites: 327/327 pass (caching/adapter/policy/builder). * tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry, verified failing on pristine origin/main).	2026-05-11 11:14:56 -07:00

1 2 3 4 5 ...

8137 commits