hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
teknium1	47bc8e080d	chore(release): AUTHOR_MAP noreply entry for Slimydog21	2026-05-18 10:37:35 -07:00
Slimydog21	aae1615977	fix(xai-responses): strip enum values containing '/' from tool schemas xAI's /v1/responses and /v1/chat/completions endpoints reject tool schemas whose enum values contain a forward slash with a generic HTTP 400 'Invalid arguments passed to the model.' before any token is emitted — the schema compiler trips on the '/' character regardless of where it appears. Most commonly hit by MCP-derived tools whose enum lists HuggingFace model IDs ('Qwen/Qwen3.5-0.8B', 'openai/gpt-oss-20b') or owner/name environment identifiers. Mirrors the existing strip_pattern_and_format sanitizer (PR for #27197). The new strip_slash_enum walks tool parameters and drops the entire enum keyword when any value contains '/' — keeping it partial would still 400 since xAI's failure is all-or-nothing on the enum. The field description still reaches the model so the prompting hint is preserved. Wired in at both code paths for parity: - agent/chat_completion_helpers.py (main agent xAI Responses path) - agent/auxiliary_client.py (aux client xAI Responses path, matching the same parity guarantee `2fae8fba9` established for pattern/format) Salvaged from #28021 by @Slimydog21 — contributor's branch was severely stale (would have reverted ~5000 LOC across azure/kanban/i18n); fix re-applied surgically on current main with their sanitizer + 9 tests preserved verbatim. Author noreply email used (original was a Mac hostname leak).	2026-05-18 10:37:35 -07:00
EloquentBrush0x	d9331eecee	fix(minimax-oauth): quarantine dead tokens on terminal refresh failure resolve_minimax_oauth_runtime_credentials called _refresh_minimax_oauth_state without a try/except, so a terminal failure (invalid_grant, refresh_token_reused, invalid_refresh_token) raised AuthError but left the dead refresh_token in auth.json. Every subsequent API call retried the same token via a network round-trip, failing identically each time. Fix: wrap the refresh call and, when exc.relogin_required is True and a refresh_token is present, clear the dead OAuth fields (access_token, refresh_token, expires_*) and write a last_auth_error quarantine marker to auth.json before re-raising. The next call sees no access_token and fails fast with 'not_logged_in' — no network retry — and the user is prompted to re-authenticate. Mirrors the existing quarantine pattern for Nous (_quarantine_nous_oauth_state), xAI-OAuth (#28116), and Codex-OAuth (#28118). Persist failure is best-effort (logged at DEBUG, error still re-raised). Salvaged from #28003 by @EloquentBrush0x — contributor's branch was severely stale (would have reverted ~5000 LOC across azure/kanban/i18n subsystems); fix re-applied surgically with their pattern preserved and added two regression tests (terminal-quarantines + transient-does-not-quarantine).	2026-05-18 10:34:03 -07:00
EloquentBrush0x	b570e0fdd0	fix(codex-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions When a Codex OAuth refresh token is permanently invalidated (HTTP 400/401/403, token revoked or reused), _mark_exhausted was called but auth.json was left with the dead credentials. On the next session, _seed_from_singletons re-read auth.json and re-seeded the pool with the same revoked token, triggering the same terminal failure in a loop. Add _is_terminal_codex_oauth_refresh_error to auth.py and a matching quarantine block in _refresh_entry: when a terminal error is detected and auth.json holds no newer tokens, clear access_token/refresh_token from auth.json and remove all device_code-sourced pool entries from memory. Mirrors the Nous quarantine added in `c90556262` and the xAI quarantine in #28116. Also add a pre-refresh sync from auth.json before calling refresh_codex_oauth_pure, matching the xAI and Nous patterns, to avoid refresh_token_reused races when multiple Hermes processes share the same auth.json singleton. Salvaged from #27911 by @EloquentBrush0x — contributor's branch was severely stale (would have reverted ~5000 LOC across azure/kanban/i18n subsystems); fix re-applied surgically on current main with their predicate and tests preserved.	2026-05-18 10:31:40 -07:00
Teknium	9aae59feab	fix(compress): make abort-on-summary-failure opt-in via config flag (#28117 ) PR #28102 made the summary-failure abort path the unconditional default, changing established behavior. Gate it behind config.yaml flag `compression.abort_on_summary_failure` (default False = historical fallback-placeholder behavior). - hermes_cli/config.py: new `compression.abort_on_summary_failure` key, default False, documented inline. - agent/agent_init.py: read the flag from compression config and pass to ContextCompressor. - agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure` (default False). `compress()` failure branch gates the abort on the flag; when False, falls through to the restored legacy fallback path (static "summary unavailable" placeholder + drop middle window). - tests: restore original fallback expectations as default; add new TestAbortOnSummaryFailure class for the opt-in mode. Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort detection, locale `gateway.compress.aborted` key) from PR #28102 stays intact — those paths only fire when `_last_compress_aborted` is True, which now only happens when the flag is enabled.	2026-05-18 10:28:20 -07:00
EloquentBrush0x	5e40f83cb7	fix(xai-oauth): quarantine terminal refresh errors so dead tokens are not replayed across sessions When refresh_xai_oauth_pure raises a terminal error (HTTP 400/401/403, i.e. revoked or reused refresh token), _refresh_entry's existing race- recovery path re-syncs from auth.json and returns if another process has already rotated the tokens. If auth.json still holds the same stale token pair, the function fell through to _mark_exhausted — leaving the dead credentials in auth.json. On the next Hermes startup _seed_from_singletons re-seeded the pool from those stale tokens, causing the same failure loop on every session. Fix: after the auth.json re-sync check in the xAI-oauth error handler, detect terminal errors with the new _is_terminal_xai_oauth_refresh_error helper and apply a quarantine: - Clear access_token and refresh_token from providers["xai-oauth"]["tokens"] in auth.json so they are not re-seeded. - Write a last_auth_error entry for hermes doctor / auth status diagnostics. - Remove all loopback_pkce entries from the in-memory pool so the current session stops retrying with the dead credentials. Mirrors the identical quarantine already in place for Nous OAuth (`c90556262`). Closes the parity gap introduced when `c90556262` added Nous-only terminal error handling without a corresponding xAI-oauth path.	2026-05-18 10:28:09 -07:00
konsisumer	226680500d	fix(auth): improve xAI OAuth SSH hint with visual header and auto-detected host	2026-05-18 10:26:55 -07:00
briandevans	bf6eeb3f93	fix(xai-oauth): show "not received" page when loopback callback has no code When xAI's auth backend fails to redirect (e.g. the German "We couldn't reach your app" fallback shown in #27385), users sometimes navigate manually to the bare loopback callback URL — `http://127.0.0.1:<port>/callback` with no query string. The handler used to return 200 "xAI authorization received" for any GET that hit the expected path, because `parse_qs("")` yields no `code` and no `error`, leaving `result` untouched while the success page was still served. The CLI's wait loop, of course, still saw no code and timed out with `AuthError: xAI authorization timed out waiting for the local callback.` The user is left looking at a browser tab that claims success and a terminal that says failure — exactly the contradiction in #27385. This change makes the empty-callback case return 400 with an explicit "not received" page and a hint to retry `hermes auth add xai-oauth`. The wait-loop semantics are unchanged: `result["code"]` and `result["error"]` both stay None, so the CLI still raises a real timeout rather than treating the bare hit as a successful callback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 10:26:00 -07:00
EloquentBrush0x	1fabd6e100	fix(error_classifier): classify xAI Grok entitlement SSE errors as auth When xAI returns a subscription/entitlement error through an SSE ``type=error`` frame, ``_StreamErrorEvent`` is raised with ``status_code=None``. This caused ``_classify_by_status`` (step 2 of ``classify_api_error``) to be skipped entirely, and the Grok-specific phrases ("do not have an active Grok subscription", "out of available resources") appeared in none of the message-pattern lists. The error fell through to ``FailoverReason.unknown (retryable=True)``, burning ``max_retries`` on every affected X Premium+ / SuperGrok user before the agent stopped — and ``_is_entitlement_failure`` was never called because it only fires under ``FailoverReason.auth``. The HTTP 403 path already handled this correctly (``_classify_by_status`` returns ``auth/non-retryable`` for 403). Add an explicit pattern block at step 1 (highest priority, before the ``status_code`` guard) so both code paths route to ``FailoverReason.auth, retryable=False, should_fallback=True`` — matching the 403 path exactly. Add three regression tests in ``Fix D`` section of ``test_codex_xai_oauth_recovery.py``: - primary "do not have an active Grok subscription" phrase - "out of available resources" + "grok" variant - unrelated ``_StreamErrorEvent`` must not be reclassified	2026-05-18 10:24:13 -07:00
teknium1	bc77f79798	chore(release): AUTHOR_MAP entries for Fewmanism + Slimydog21	2026-05-18 10:23:13 -07:00
Fewmanism	0d63661702	fix: latch xAI OAuth callback result	2026-05-18 10:23:13 -07:00
Fewmanism	eac198b6d5	fix: make xAI OAuth callback server threaded	2026-05-18 10:23:13 -07:00
flamiinngo	5613dfea93	fix(security): redact xAI (Grok) API keys in logs xAI is a first-class provider in hermes-agent with its own credential pool entry (XAI_API_KEY / xai-oauth). API keys follow the format xai-<60+ alphanumeric chars> and were absent from _PREFIX_PATTERNS in agent/redact.py. When a key appears raw in log output, tool results, or error messages, it passed through completely unmasked. The ENV-assignment and Bearer header patterns catch the most common cases, but a raw token in a stack trace or debug print had no protection. Verified before fix: redact_sensitive_text("using key xai-ABCD...rstu to call xAI", force=True) # "using key xai-ABCD...rstu to call xAI" <- exposed After fix: # "using key xai-AB...rstu to call xAI" <- masked Five unit tests added to TestXaiToken covering bare token masking, env assignment, short-prefix false positive, company name false positive, and visible prefix in masked output.	2026-05-18 10:21:22 -07:00
Wesley Simplicio	fae0fa4325	fix(tirith): suppress .app lookalike_tld false positives in warn verdicts Tirith flags .app domains with a lookalike_tld finding because the TLD "can be confused with file extensions". This is a false positive for legitimate production APIs (e.g. api.example.app, lark.app). Add _is_app_tld_finding() and a post-parse suppression block in check_command_security(): if the only finding(s) on a warn verdict are lookalike_tld entries for .app, downgrade the action to allow. Mixed findings (e.g. .app + shortened_url) and block verdicts are unaffected. Non-.app lookalike_tld findings (.zip, .exe, etc.) are preserved. Add 15 regression tests covering: .app-only suppression, mixed-finding preservation, non-.app TLD preservation, block-verdict invariance, and the helper's field-name and case-insensitivity behaviour. Closes #24461	2026-05-18 10:20:07 -07:00
Teknium	1634397ddb	fix(compress): abort instead of dropping messages when summary LLM fails (#28102 ) When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path.	2026-05-18 10:19:40 -07:00
teknium	65e0c49b77	chore(release): add AUTHOR_MAP entry for glennc	2026-05-18 10:14:38 -07:00
glennc	9df9816dab	feat(azure-foundry): add Microsoft Entra ID auth Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-18 10:14:38 -07:00
Teknium	457fa913b8	chore(deps): regen uv.lock to match pinned versions in pyproject (#28094 ) uv.lock drifted from pyproject.toml after the CVE bumps (#26830) and the 0.14.0 release. The installer's hash-verified tier was failing `uv pip sync --locked` and falling back to unlocked PyPI resolve, producing two warnings on every fresh install. Regen aligns the lockfile: - aiohttp 3.13.4 -> 3.13.3 (matches messaging/slack/homeassistant/sms pin) - anthropic 0.87.0 -> 0.86.0 (matches anthropic extra pin) - hermes-agent 0.13.0 -> 0.14.0 (matches project version) No behavioral changes. `uv lock --check` now passes.	2026-05-18 09:49:15 -07:00
teknium1	9cae9c0166	fix(aux): log sanitizer failures instead of silently swallowing them Match the warning behavior of the parent main-agent path in chat_completion_helpers.py — sanitizer failures should be visible in logs, not silent.	2026-05-18 09:43:44 -07:00
EloquentBrush0x	2fae8fba9c	fix(aux): strip pattern/format keywords from tool schemas on xAI Responses path xAI's /responses endpoint rejects tool schemas that contain pattern or format JSON Schema keywords with HTTP 400. chat_completion_helpers.py already strips these for the main-agent xAI/xai-oauth path (lines 294-302), but _CodexCompletionsAdapter.create() — used for every xAI OAuth auxiliary call (kanban decomposer, profile describer, etc.) — passed raw tool schemas without sanitization. MCP tools that carry pattern/format keywords (common for string fields) silently caused every auxiliary call over xAI OAuth to fail with an HTTP 400, while the main agent worked fine. Parity fix: call strip_pattern_and_format() on the tool list before converting to Responses API format, matching the main-agent guarantee.	2026-05-18 09:43:44 -07:00
EloquentBrush0x	502d03d5a3	fix(kanban): detect cycles in decompose_triage_task sibling-link pre-validation decompose_triage_task inlines SQL INSERTs for atomicity and intentionally bypasses link_tasks() — which calls _would_cycle() per edge. If the LLM emits a cyclic parent graph (e.g. A.parents=[1], B.parents=[0]) the DB write succeeds but every involved child deadlocks in 'todo' forever: recompute_ready() requires all parents to be done, which is impossible when A waits for B and B waits for A. Add a Kahn topological sort over the sibling parent indices in the pre-validation block, before any DB writes. Mirrors the cycle-safety guarantee that link_tasks() provides for manually linked tasks.	2026-05-18 09:40:44 -07:00
Teknium	a86d2ad557	fix(kanban-dashboard): wire onValueChange on OrchestrationPanel Selects (#27893 ) The dashboard SDK's <Select> is a shadcn-style popup that fires onValueChange(value), not native onChange({target:{value}}). The file even has a selectChangeHandler() helper at L213 documenting this: "Older plugin code calls onChange({target:{value}}) which silently never fires." #24547 already fixed the bulk-reassign, workspace-kind, and new-task parent selects. This patch covers the two OrchestrationPanel selects introduced later in #27572 that regressed onto the same broken pattern: - OrchestrationPanel orchestrator_profile picker - OrchestrationPanel default_assignee picker Users opened the popup, picked an option, and the popup closed without firing a PUT to /orchestration — so the orchestrator profile and default assignee dropdowns appeared totally inert. Uses the same selectChangeHandler helper as the other working Selects in the file for consistency. Reported by Exaario.	2026-05-18 09:31:08 -07:00
teknium	f0c6d59148	fix(anthropic): scope MiniMax beta-strip to MiniMax only Cherry-pick of @sharziki's #27022 routed Azure Foundry through _requires_bearer_auth, which also triggered the MiniMax-specific beta-strip in _common_betas_for_base_url — dropping the 1M-context beta from Azure even though Azure needs it for 1M context. Split the strip predicate: introduce _is_minimax_anthropic_endpoint so the fine-grained-tool-streaming and context-1m strips only fire for MiniMax hosts, leaving Azure's bearer-auth header swap intact without losing 1M context. Also add a regression test that asserts Azure gets Bearer auth, the api-version query param, and the context-1m-2025-08-07 beta.	2026-05-18 09:27:18 -07:00
sharziki	73407b1e30	fix(auth): send Bearer auth for Azure Foundry anthropic_messages endpoints Azure AI Foundry's Anthropic-style endpoint requires `Authorization: Bearer` instead of `x-api-key`. Add `azure.com` to `_requires_bearer_auth()` so the existing Bearer path at line 586 fires before the generic third-party branch sets `api_key` (x-api-key). Fixes #26970	2026-05-18 09:27:18 -07:00
Wesley Simplicio	16abb74eab	fix(kanban): use selectChangeHandler for workspace, parent, and bulk-reassign selects (#24547 ) SDK Select fires onValueChange(value) not onChange({target:{value}}), so all three bare onChange handlers silently received undefined from e.target. Replace raw onChange with selectChangeHandler() — the existing helper that wires both onValueChange and a guarded onChange — so selections register regardless of which event the SDK Select dispatches. Closes #24520 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-18 10:48:21 -04:00
LeonSGP	4414a99d8c	fix(kanban): stop forcing dashboard text to all caps (#26413 )	2026-05-18 10:35:18 -04:00
Brian D. Evans	6a20ad6c0a	fix(dashboard): constrain theme picker dropdown height so themes are scrollable (#25213 ) (#25220 ) The header theme picker (`ThemeSwitcher`) renders a `role="listbox"` popup with no `max-height` or overflow. With 20+ community themes installed under `~/.hermes/dashboard-themes/`, the list extends past the viewport and themes at the top or bottom are unreachable — the user reports only 15 of 26 themes visible, with no scrollbar to access the rest. Sibling switchers (`LanguageSwitcher`, `SlashPopover`) already cap their listboxes (`max-h-80 overflow-y-auto` / `max-h-64 overflow-y-auto`); this just brings the theme picker into line. Scoped to the component instead of a global `div[role="listbox"]` CSS rule so other dropdowns aren't affected. `70dvh` matches the user's tested workaround and the `dvh` unit handles mobile browser UI chrome correctly (unlike `vh`). Fixes #25213. Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 10:23:03 -04:00
duyua9	ac1536b19f	fix(web): render object config values structurally (#10949 )	2026-05-18 10:03:25 -04:00
Austin Pickett	609c485fc6	Merge pull request #27971 from NousResearch/austin/fix/goal-statusbar fix(tui): keep /goal verdict out of compact status row	2026-05-18 08:42:33 -04:00
Siddharth Balyan	d9b6f75c0b	refactor(bootstrap): consolidate ACP browser bootstrap into install.{sh,ps1} (#27851 ) * refactor(bootstrap): consolidate ACP browser bootstrap into install.{sh,ps1} Delete 687 lines of duplicated browser bootstrap code from acp_adapter/bootstrap/. All browser installation now routes through dep_ensure -> install.{sh,ps1} --ensure, using agent-browser install for Chromium. install.sh gains ensure_browser() with macOS app-bundle detection and per-distro guidance. Tracking: #27826 * fix(install.sh): add --ignore-scripts to npm install for camofox @askjo/camofox-browser has a dependency (impit) whose postinstall script runs `npx only-allow pnpm`, which fails under npm. Adding --ignore-scripts avoids the spurious failure without affecting functionality. Tracking: #27826 * fix: add explicit return in ensure_browser, narrow exception in entry.py ensure_browser() now returns 0 explicitly on all success paths. _run_setup_browser() catches OSError instead of broad Exception, letting ImportError propagate as a real packaging bug.	2026-05-18 16:36:26 +05:30
Siddharth Balyan	e3a254d65b	feat(dep_ensure): complete Windows bootstrap — dep_ensure + install.ps1 + detection (#27845 ) * feat(dep_ensure): complete Windows bootstrap — dep_ensure + install.ps1 + detection dep_ensure.py gains Windows awareness: PowerShell invocation, platform- specific browser detection, (path, shell) tuple returns. install.ps1 gains -Ensure/-PostInstall modes using npm -g --prefix (aligned with install.sh) and agent-browser install for Chromium. browser_tool.py gains node/ in candidate dirs for Windows .cmd shims. Both install scripts bundled in pip wheel. Tracking: #27826 * fix(install.ps1): add --ignore-scripts to npm install for camofox @askjo/camofox-browser has a dependency (impit) whose postinstall script runs `npx only-allow pnpm`, which fails under npm. Adding --ignore-scripts avoids the spurious failure without affecting functionality. Tracking: #27826 * fix: remove duplicate install scripts from git CI already copies scripts/install.{sh,ps1} into hermes_cli/scripts/ during wheel build. No need to commit copies — .gitignore keeps them out, _find_install_script() falls back to scripts/ for git-clone users. Tracking: #27826 * fix: address review — remove env_extra, fix ps1 error handling - Remove unused env_extra parameter from ensure_dependency() - Invoke-EnsureMode node case now uses Test-Node consistently - Install-AgentBrowser uses throw instead of exit 1	2026-05-18 16:34:24 +05:30
Siddharth Balyan	6f5ec929a1	feat(config): add install-method stamping + Docker detection (#27843 ) * feat(config): add install-method stamping + Docker detection Dockerfile stamps "docker", install.sh stamps "git", and cmd_postinstall stamps "pip" into ~/.hermes/.install_method. detect_install_method() reads the stamp first, then falls back to managed-system / container / .git heuristics. Adds Docker upgrade guidance. Tracking: #27826 * fix(stamp): move Docker stamp to entrypoint, install.sh stamp after print_success The Dockerfile stamp was overwritten by the VOLUME overlay at container start. Moving it to entrypoint.sh ensures it persists. The install.sh stamp now writes after print_success so it only lands on full success.	2026-05-18 16:34:10 +05:30
Teknium	f2fdb9a178	feat(gateway): deliverable mode — ship artifacts as native uploads from any agent surface (#27813 ) The agent can now produce a chart, PDF, spreadsheet, or any other supported file type and have it land in Slack / Discord / Telegram / WhatsApp / etc. as a native attachment, just by mentioning the absolute path in its response. Same primitive works for kanban-worker completions: workers attach artifacts via kanban_complete(artifacts=[...]) and the gateway notifier uploads them alongside the completion message. Changes: - gateway/platforms/base.py: extract_local_files now covers PDFs, docx, spreadsheets (xlsx/csv/json/yaml), presentations (pptx), archives (zip/tar/gz), audio (mp3/wav/...), and html — not just images and video. Image/video extensions still embed inline; everything else routes to send_document via the existing dispatch partition in gateway/run.py. - tools/kanban_tools.py + hermes_cli/kanban_db.py: kanban_complete gains an explicit ``artifacts`` parameter. The handler stashes it in metadata.artifacts (for downstream workers) and the kernel promotes it onto the completed-event payload so the notifier can find it without a second SQL round-trip. - gateway/run.py: _kanban_notifier_watcher now calls a new helper _deliver_kanban_artifacts after sending the completion text. The helper reads payload.artifacts (preferred), falls back to scanning the payload summary and task.result with extract_local_files, then partitions images / videos / documents and uploads each via send_multiple_images / send_video / send_document. - website/docs/user-guide/features/deliverable-mode.md + sidebars.ts: user-facing docs page covering the extension list, the kanban artifacts pattern, and the MCP-for-connector-breadth recommendation. Tests: - tests/gateway/test_extract_local_files.py: 7 new test cases (documents, spreadsheets, presentations, audio, archives, html, chart-pdf canonical case). 44 passing, 0 regressions. - tests/tools/test_kanban_tools.py: 4 new cases covering the artifacts arg shape (list / string / merge with existing metadata / type rejection). 17 passing. - tests/hermes_cli/test_kanban_notify.py: 2 new cases covering full notifier → artifact-upload path and missing-file silent-skip. 12 passing. - E2E (real files, real kanban kernel, real BasePlatformAdapter): worker calls kanban_complete(artifacts=[png,pdf,csv]) → metadata + event payload land → notifier helper partitions correctly → send_multiple_images called once with the PNG, send_document called twice with PDF + CSV. What's NOT in this PR (deferred to follow-ups): - Ad-hoc "research this for two hours, ping the thread when done" slash command — covered today by kanban subscriptions; a dedicated slash command can ride a follow-up PR if needed. - Setup-wizard prompt for recommended MCP servers (Notion, GitHub, Linear, etc.) — docs page lists them; UI is a separate change. Plan and rationale captured in ~/.hermes/docs/perplexity-computer-parity.pdf (local doc, not shipped).	2026-05-18 02:14:43 -07:00
Teknium	dadc8aa255	fix(kanban): surface unusable triage auxiliary model (auto-decompose aware) (#27871 ) Adds a 'triage_aux_unavailable' diagnostic for tasks stuck in triage when neither the active aux helper slot nor the main-model auto fallback is usable. Auto-decompose aware: - kanban.auto_decompose=True (default): primary is auxiliary.kanban_decomposer, triage_specifier is the fanout=false fallback. - kanban.auto_decompose=False: primary is auxiliary.triage_specifier (manual 'hermes kanban specify' path). Default aux slots use 'provider: auto' which falls back to the main model, so this rule only fires when both the explicit slot config AND the main-model auto fallback are absent. Quiet by default; informative when there is a real config gap. Also adds kd.config_from_runtime_config() that carries kanban + auxiliary + model keys through to diagnostics, and updates CLI/dashboard call sites to use it. config_from_kanban_config() is preserved for back-compat. Reworks the original PR #25640 idea (@qWaitCrypto) to align with the new auto-decompose dispatcher path landed in #27572. The original PR pointed only at auxiliary.triage_specifier, which is now the fallback rather than the primary helper. Co-authored-by: qWaitCrypto <axmaiqiu@gmail.com>	2026-05-18 01:27:06 -07:00
qWaitCrypto	d9fef0c8ab	fix(kanban): align failure diagnostics with retry limit	2026-05-18 01:22:16 -07:00
qWaitCrypto	6e60a8a092	feat(kanban): make worker log retention configurable	2026-05-18 01:21:41 -07:00
qWaitCrypto	8831eb5c70	fix(kanban): align worker terminal timeout with task runtime	2026-05-18 01:20:52 -07:00
HenkDz	0292398604	fix(acp): use modes for edit auto-approval	2026-05-18 01:19:55 -07:00
HenkDz	f70e0b85dd	feat(acp): add session-scoped edit auto-approval	2026-05-18 01:19:55 -07:00
HenkDz	49b28d1646	fix(acp): avoid duplicate edit approval diffs	2026-05-18 01:19:55 -07:00
HenkDz	9592e595a2	feat(acp): require approval for editor file edits	2026-05-18 01:19:55 -07:00
HenkDz	060ec02858	docs: add ACP Zed edit approval diffs plan	2026-05-18 01:19:55 -07:00
teknium1	0fa46c613b	fix(yuanbao): persist message_id on @bot user transcript writes Yuanbao's QuoteContextMiddleware has a transcript-lookup fallback for when quote.desc is empty: it scans the session transcript for the quoted message_id and pulls ybres anchors out of its content. That fallback works for observed (silent) group messages because the platform writer attaches message_id (yuanbao.py:2091). It silently fails for @bot agent-processed messages because gateway/run.py wrote them as {role:user, content, timestamp} with no message_id, so quoting an earlier @bot turn that contained an image/file couldn't be resolved. Fix: attach event.message_id to the user transcript entry at all three write sites in gateway/run.py — the agent_failed_early branch, the no-new-messages edge case, and the normal agent path (first user-role entry in new_messages). Surfaces gap reported in #27425 (loongfay) using the existing fallback already on main; no new caches needed. Co-authored-by: loongfay <loongfay@users.noreply.github.com>	2026-05-18 01:19:41 -07:00
kshitij	41f1eddee3	refactor(doctor): extract section banner + fail-and-issue helpers (#27830 ) `hermes_cli/doctor.py` had two recurring patterns: 1. 15 section headers of the form `print() ; print(color("◆ Name", Colors.CYAN, Colors.BOLD))` bracketed by 3-line `# =====` / `# Check: X` / `# =====` comment banners. 2. Paired `check_fail(...) ; issues.append(...)` for every diagnostic that emits both a user-visible failure and an auto-fix instruction. Add two helpers and collapse the patterns: def _section(title): print() print(color(f"◆ {title}", Colors.CYAN, Colors.BOLD)) def _fail_and_issue(text, detail, fix, issues): check_fail(text, detail) issues.append(fix) Replacements: - 15 `# =====/# X/# =====` banner triples + section header pairs compressed to `_section(...)` - All 18 `check_fail + issues.append` pairs collapsed to `_fail_and_issue(...)` (single-line where the call fits under 120 chars, multi-line where it doesn't) - Net -5 LOC (`+128 / -133`) The LOC delta is modest after wrapping long calls onto multi-line form for readability — the real win is uniform call shape and removal of two parallel-pattern footguns. There is now exactly one way to emit a diagnostic that pairs a user-visible failure with a fix instruction. Behavior is byte-identical. `_section` produces the same blank line + bold-cyan output the inline two prints did, and `_fail_and_issue` does the same `check_fail + issues.append` sequence in the same order. Verified empirically by diffing live `run_doctor()` stdout from this branch against `origin/main` — `diff -q` reports zero differences. Test plan: - All 69 tests across test_doctor.py, test_doctor_command_install.py, and test_doctor_dedicated_provider_skip.py pass - `ruff check hermes_cli/doctor.py` clean - Live `run_doctor()` output byte-identical to origin/main Refs #23972 (Phase 2 tracker — dedup-only refactor in line with the "net-LOC-negative" discipline).	2026-05-18 00:45:25 -07:00
Teknium	94c523f0c5	docs(session_search): update all docs for the single-shape rewrite (#27840 ) Companion PR to #27590. Sweeps remaining stale references to the LLM-summary path that landed in main with #27590 but weren't fully caught in the followup cleanup commit. Real rewrites: - user-guide/sessions.md: 'Session Search Tool' section rewritten to describe the three calling shapes (discovery / scroll / browse) with worked examples. Adds the 'Optional parameters' subsection covering sort and role_filter. - user-guide/features/memory.md: 'Session Search' overview rewritten, comparison table updated (speed: ms instead of LLM summarization, added explicit free-cost row, link to sessions.md for details). Stale-claim sweeps: - user-guide/configuring-models.md: drop the 'Session Search' row from the aux-model override table (no aux model anymore), drop session search from the auxiliary-models list. - user-guide/features/codex-app-server-runtime.md: drop session_search from the ChatGPT-subscription cost note, drop the session_search block from the per-task override config example. - developer-guide/provider-runtime.md: drop 'session search summarization' from the auxiliary tasks list. - developer-guide/agent-loop.md: drop session search from the auxiliary fallback chain list. - user-guide/skills/.../autonomous-ai-agents-hermes-agent.md: drop session_search from the 'auxiliary models not working' debug step. Untouched (still accurate as tool-name mentions, not behavioral claims): - features/tools.md, features/honcho.md, features/acp.md - cli.md, sessions.md (other sections) - developer-guide/tools-runtime.md, agent-loop.md (line 157) - acp-internals.md, adding-tools.md, prompt-assembly.md - reference/toolsets-reference.md, reference/tools-reference.md	2026-05-18 00:36:17 -07:00
wysie	ff078738ea	fix(skills): load symlinked skill slash commands	2026-05-18 00:34:29 -07:00
Teknium	abf1af5401	feat(session_search): single-shape tool with discovery, scroll, browse — no LLM (#27590 ) * feat(session_search): single-shape tool with discovery, scroll, browse — no LLM Replaces the LLM-summarized session_search with a single-shape tool that returns actual messages from the DB. Three calling shapes inferred from args (no mode parameter): 1. Discovery — pass query. FTS5 + anchored ±5 window + bookends per hit, all in one call. ~20ms on a real DB instead of ~90s for the previous three aux-LLM calls. 2. Scroll — pass session_id + around_message_id. Returns a window centered on the anchor. To paginate, re-anchor on the first/last id of the returned window. Boundary message appears in both windows as the orientation marker. ~1ms per scroll call. 3. Browse — no args. Recent sessions chronologically. Bookend_start (first 3 user+assistant msgs) and bookend_end (last 3) give the agent goal + resolution on every discovery hit, so a single tool call reconstructs a long session's arc without loading the whole transcript. The aux-LLM summary path is gone: it cost ~$0.30/call, took ~30s, and laundered FTS5 hits through a model that could confabulate when the right session wasn't in the hit list. The merged shape returns byte-for-byte content from SQLite. History: - PR #20238 (JabberELF) seeded the fast/summary dual-mode split. - PR #26419 (yoniebans) expanded to fast/guided/summary with bookends, multi-anchor drill-down, default-mode config, and a teaching skill. This PR collapses that toolkit into one shape with explicit scroll support, drops the summary path, drops the mode parameter, drops the config knob, drops the skill. JabberELF's seed work is acknowledged via the AUTHOR_MAP entry. Validation: - 38/38 tool tests pass (tests/tools/test_session_search.py) - 12/12 get_messages_around tests pass (tests/hermes_state/) - 11/11 get_anchored_view tests pass (tests/hermes_state/) - Full tests/tools/ run: 5168 passing, 2 failures pre-exist on main (test ordering in test_delegate.py, unrelated) - E2E against live state DB: discovery 20ms, scroll 1ms, browse 280ms; pagination forward+backward works with boundary-message orientation; error paths return clean tool_error responses Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com> * chore(session_search): prune dead LLM-summary config and docs Companion to the single-shape rewrite. The auxiliary.session_search config block, max_concurrency / extra_body tunables, and matching docs sections all referenced the removed LLM summarization path. Removing them so users don't try to tune knobs that nothing reads. - hermes_cli/config.py: drop dead auxiliary.session_search block from DEFAULT_CONFIG. Leftover keys in user config.yaml are harmless and ignored. - hermes_cli/tips.py: drop two tips referencing the removed max_concurrency / extra_body knobs. - website/docs/user-guide/configuration.md: drop 'Session Search Tuning' section and the auxiliary.session_search block from the example. - website/docs/user-guide/features/fallback-providers.md: drop session_search rows from the auxiliary-tasks tables and the dedicated tuning subsection. - website/docs/reference/tools-reference.md: rewrite the session_search entry to describe the new three-shape behaviour. - CONTRIBUTING.md: update the file-tree description. - tests/tools/test_llm_content_none_guard.py: remove TestSessionSearchContentNone class and test_session_search_tool_guarded — both guard against an unguarded .content.strip() call site in _summarize_session() that no longer exists. Validation: 97/97 targeted tests still pass (hermes_state + session_search + llm_content_none_guard). Config tests 55/55. --------- Co-authored-by: JabberELF <abcdjmm970703@gmail.com> Co-authored-by: yoniebans <jonny@nousresearch.com>	2026-05-17 23:28:45 -07:00
teknium1	4a3f13b47b	perf(prompt-cache): date-only timestamp + loud gateway-DB roundtrip logging The system prompt's 'Conversation started:' line carried minute precision (%I:%M %p), making it byte-unstable across every rebuild path. Within a CLI session the in-memory cache held, but on the gateway path (fresh AIAgent per turn → restore from session DB), any silent failure in the read or write path dropped the cache stem and forced a full re-prefill on every subsequent turn. Local prefix-caching backends (llama.cpp / vLLM) saw this as KV-cache invalidation; remote prefix-caching providers saw it as an Anthropic-style cache miss. Three changes: 1. Date-only timestamp ('Sunday, May 17, 2026' instead of '... 03:42 PM'). System prompt now byte-stable for the full day. The model can still query exact time via tools when it actually needs it. Credit: @iamfoz (PR #20451). 2. Loud logging on session DB write failures. The update_system_prompt call used to log at DEBUG, hiding disk-full / locked-database / schema drift behind a silent fall-through that forced fresh rebuilds on every subsequent turn. Now WARN with the session id and exception so persistent issues show up in agent.log without verbose mode. 3. Three-way stored-state distinction on read. The previous 'session_row.get("system_prompt") or None' collapsed three states into one (missing row / null column / empty string). Now we tell them apart and WARN when a continuing session lands on null/empty (which means the previous turn's write never persisted — every subsequent turn rebuilds and the prefix cache misses every time). The restore block is extracted into _restore_or_build_system_prompt() so the prefix-cache path can be unit-tested in isolation. E2E proof: fresh AIAgent constructed for turn 2 across a minute-boundary sleep restores byte-identical bytes from the session DB. NULL stored prompt fires the new warning. Date-only timestamp survives the rebuild path. All on real SessionDB, no mocks. Tests: - tests/agent/test_system_prompt_restore.py (10 new tests) - tests/run_agent/test_run_agent.py::TestBuildSystemPrompt:: test_datetime_is_date_only_not_minute_precision Closes #20451 (date-only), #18547 (prefix stabilization), #8689 (stabilize timestamp across compression), #15866 (timestamp caching question), #8687 (compression timestamp), #27339 (claim #3: live timestamp in cached system prompt). Co-authored-by: Martyn Forryan <9133432+iamfoz@users.noreply.github.com>	2026-05-17 23:20:37 -07:00
Teknium	9b91377bec	feat(grok): apply OpenAI execution guidance to xAI Grok / xai-oauth models (#27797 ) Grok models hit the same failure modes that OPENAI_MODEL_EXECUTION_GUIDANCE addresses for GPT/Codex: claiming completion without tool calls ('to be honest, I didn't create the file yet'), suggesting workarounds instead of using existing tools (proposing a folder-based memory system when the memory tool exists), replying with plans instead of executing. TOOL_USE_ENFORCEMENT_GUIDANCE was already injected for any model whose name contains 'grok' (TOOL_USE_ENFORCEMENT_MODELS). This extends the follow-on family-specific block — OPENAI_MODEL_EXECUTION_GUIDANCE (tool_persistence / mandatory_tool_use / act_dont_ask / prerequisite_checks / verification / missing_context) — to grok-named models too. The OPENAI_ prefix is retained for backwards compat with imports/tests; docstring + inline comment now note that the body is family-agnostic and the prefix reflects origin, not exclusivity. Tests cover the OpenRouter slug (x-ai/grok-4.3) and the xai-oauth bare name (grok-4.3), plus a negative control on claude. E2E verified against a real AIAgent build of the system prompt for both xai-oauth and openrouter grok models.	2026-05-17 23:00:37 -07:00
teknium1	43e566f77e	docs(fallback): document layered auxiliary fallback ladder Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Docker Build and Publish / move-main (push) Blocked by required conditions Details Docker Build and Publish / move-latest (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details Tests / test (push) Waiting to run Details Tests / e2e (push) Waiting to run Details Adds a new 'Auxiliary Capacity-Error Fallback' section to website/docs/user-guide/features/fallback-providers.md covering: - The 4-step ladder (primary → fallback_chain → main agent → warn) - Which errors trigger fallback (402, 429 quota, connection) vs which respect explicit provider choice (transient 429 rate limits) - Optional fallback_chain config schema with vision + compression examples - Recognized quota-error phrases (Bedrock, Vertex AI, generic) Updates the bottom summary table — every auxiliary task now shows 'Layered (see above)' instead of 'Auto-detection chain' since explicit-provider users also get the main-agent safety net.	2026-05-17 17:15:31 -07:00

1 2 3 4 5 ...

8746 commits