hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-19 15:18:03 +00:00

Author	SHA1	Message	Date
Austin Pickett	6bc5d72271	Merge pull request #16419 from vincez-hms-coder/feat/dashboard-profiles-hms-coder feat(dashboard): add profiles management page	2026-04-30 12:09:23 -07:00
ethernet	b737af8226	Merge pull request #18047 from stephenschoettler/fix/acp-persist-user-message-test-mocks test(acp): accept prompt persistence kwargs in MCP E2E mocks	2026-04-30 14:43:26 -04:00
Teknium	76edc40ab0	fix(agent): extend thinking-mode reasoning_content pad to Kimi/Moonshot Builds on #16855 (@lsdsjy) which fixed DeepSeek v4 reasoning_content replay via model_extra fallback + capturing tool_calls at method entry. Kimi / Moonshot thinking mode enforces the same echo-back contract and hits the same 400 when a tool-call turn is persisted without reasoning_content. - _build_assistant_message: pad branch now uses _needs_thinking_reasoning_pad() (DeepSeek OR Kimi) instead of _needs_deepseek_tool_reasoning() alone. - Extract _needs_thinking_reasoning_pad() and reuse it in _copy_reasoning_content_for_api so both sites share one predicate. - tests/run_agent/test_deepseek_reasoning_content_echo.py: add TestBuildAssistantMessagePadsStrictProviders parametrized over DeepSeek (attr=None, attr-absent), Kimi (attr=None), Moonshot (via base_url), and an OpenRouter negative control that must NOT pad. Proven to fail 2/5 cases on Kimi/Moonshot without this change. - scripts/release.py: add AUTHOR_MAP entries for lsdsjy and season179. Refs #17400. Co-authored-by: season179 <season.saw@gmail.com>	2026-04-30 11:18:39 -07:00
lsdsjy	b9b9ee3e6c	fix(deepseek): preserve v4 reasoning_content on replay	2026-04-30 11:18:39 -07:00
Stephen Schoettler	699a9c11a9	test(acp): accept prompt persistence kwargs in mocks	2026-04-30 10:47:23 -07:00
Teknium	d60a9917d3	feat(curator): show most-used and least-used skills in `hermes curator status` (#18033 ) Alongside the existing 'least recently used' section, surface two more rankings so users can see which of their agent-created skills actually get exercised: - 'most used (top 5)' — sorted by use_count descending. Hidden when every skill has use_count=0 (noise suppression on fresh installs). - 'least used (top 5)' — sorted by use_count ascending. Always shown when the catalog is non-empty. use_count started tracking real agent skill activation in PR #17932 (bump_use wired into skill_view tool + slash invocation + --skill preload), so these rankings are now meaningful. Tests: 3 new in tests/hermes_cli/test_curator_status.py — happy path with mixed use_counts, zero-use suppression of the most-used section, and the no-skills clean-empty case.	2026-04-30 10:37:33 -07:00
y0shualee	f4b76fa272	fix: use skill activity in curator status Treat skill views and edits as activity when curator reports and applies lifecycle transitions, so recently loaded or patched skills are not displayed or transitioned as never used.\n\nAdds regression tests for activity derivation, automatic transitions, and CLI status output.	2026-04-30 10:31:47 -07:00
0xDevNinja	564a649e6a	fix(curator): scan nested archive subdirs in restore_skill restore_skill() in tools/skill_usage.py used archive_root.iterdir(), which only walked the top level of .archive/. Skills archived under nested layouts (e.g. .archive/openclaw-imports/<skill>/ from older archive paths or external imports) were invisible to both the exact-match and prefix-match candidate scans, surfacing as a misleading "skill '<name>' not found in archive" error even though the directory existed on disk. Switch both candidate scans to archive_root.rglob('*') so the lookup descends into category subdirectories. Fixes #17942	2026-04-30 10:31:44 -07:00
Teknium	8b290a5908	feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941 ) * fix(curator): split 'archived' into consolidated vs pruned in run reports Users who watched a curator run saw skills like 'anthropic-api' listed under 'Skills archived' and interpreted that as pruning — but the curator had actually absorbed those skills into a new umbrella (e.g. 'llm-providers') during the same run. The directory gets archived for safety (all removals are recoverable), but the content still lives under a different name. Users then 'restored' what they thought were deleted skills and ended up with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella). Classify removed skills using this run's skill_manage tool calls: - consolidated: content absorbed into a surviving/newly-created skill (evidenced by a skill_manage write_file/patch/create/edit whose target is a different skill AND whose file_path/content references the removed skill's name) - pruned: archived without consolidation evidence (truly stale) REPORT.md now shows two distinct sections: - 'Consolidated into umbrella skills' — with `removed → merged into umbrella` - 'Pruned — archived for staleness' — pure staleness archives run.json schema additions (backward compatible): - counts.consolidated_this_run, counts.pruned_this_run - consolidated: [{name, into, evidence}, ...] - pruned: [names] - archived: retained as the union for backward compat Also: relabel the auto-transitions 'archived' counter to 'archived (no LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass archives. Tests: 9 new tests in test_curator_classification.py covering consolidation evidence parsing (write_file/patch/create), hyphen/underscore name variants, self-reference rejection, destination-must-exist, mixed runs, and malformed-JSON fallback safety. Existing test_report_md_is_human_readable updated to cover the new section names. E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified end-to-end. * feat(curator): hybrid model-declared + heuristic classification Extend the consolidated-vs-pruned split with LLM-authored intent: 1. Curator prompt now requires a structured YAML block at the end of the final response (consolidations / prunings with short rationale). 2. _parse_structured_summary() extracts it tolerantly — missing block, malformed YAML, partial lists all fall back to heuristic cleanly. 3. _reconcile_classification() merges model intent with the tool-call heuristic: - Model wins on rationale when its umbrella exists post-run - Model hallucination (umbrella doesn't exist) is downgraded to the heuristic's finding, or pruned if there's no evidence either - Heuristic catches model omission — consolidations the model enumerated tools for but forgot to list get surfaced with a '(detected via tool-call audit)' tag 4. REPORT.md now shows per-row rationale alongside 'removed → umbrella' and flags audit-only rows so the user knows why no reason is shown. Backward compat: run.json's 'archived' field (union) is preserved. 'pruned' is now a list of dicts with {name, source, reason}; 'pruned_names' is the flat-name list for legacy consumers. Tests: 15 new covering YAML parse edge cases (malformed, empty lists, bare-string entries, missing fields), reconciler rules (model wins, hallucination fallback, heuristic catches omission, prune with reason), and an end-to-end report-render test with all four paths exercised.	2026-04-30 10:31:23 -07:00
Henkey	cdf9793d6d	fix(acp): advertise and forward image prompts	2026-04-30 10:31:16 -07:00
Brooklyn Nicholson	b9d9fa7df8	fix(tui): respect max turns config Co-authored-by: YuShu <24110240104@m.fudan.edu.cn>	2026-04-30 12:26:57 -05:00
ethernet	d499d17271	Merge pull request #17969 from stephenschoettler/fix/current-main-test-regressions fix(ci): stabilize current main test regressions	2026-04-30 13:23:38 -04:00
brooklyn!	285e9efb3f	Merge pull request #17701 from NousResearch/bb/mouse-mode-self-heal fix(cli): recover leaked mouse tracking terminal state	2026-04-30 10:09:39 -07:00
Stephen Schoettler	407dfbb021	fix(ci): stabilize current main test regressions	2026-04-30 06:36:50 -07:00
Teknium	4c792865b4	test(gateway): pin cleanup invariants for #17758 in-band drain hand-off Belt-and-suspenders on top of @briandevans' #17758 fix. The in-band drain hand-off (await->create_task + session-guard preservation) changed cleanup semantics in three places that the original PR reasoned about but didn't test directly. Pin each invariant so a future refactor can't silently regress them: 1. Normal single-message path still releases _active_sessions[sk] and _session_tasks[sk] through end-of-finally. The #17758 follow-up moved _release_session_guard under if current_task is self._session_tasks.get(session_key) For the 99%-common case current_task IS the stored task, so the guard must still fire. Test would fail if the conditional were ever tightened in a way that dropped the normal path. 2. Drain-task cancellation releases the session. If the drain task spawned by the in-band hand-off is cancelled mid-handler (e.g. /stop fired while draining a follow-up), its own finally must fire _release_session_guard. Without this a cancel would leave the session permanently pinned busy. 3. Late-arrival drain still spawns when no in-band drain preceded it. Pre-existing path, but the #17758 follow-up added a re-queue branch that only fires when ownership was already handed off. When no handoff happened the else branch must still spawn a fresh drain task — otherwise a message arriving during stop_typing gets silently dropped. All three tests pass against current main. Zero production code changes.	2026-04-30 05:00:25 -07:00
konsisumer	d1d0ef6dbd	fix(gateway): persist user message on transient agent failures (#7100 ) The #1630 fix introduced a blanket ``agent_failed_early`` transcript skip to prevent context-overflow sessions from looping. That guard also triggers for unrelated transient failures (429 rate limits, read timeouts, connection resets, provider 5xx) which have nothing to do with session size — and it silently drops the user's message, so the agent has no memory of the last turn on retry. Split the failure classification in ``GatewayRunner._run_agent``: * Context-overflow (``compression_exhausted`` flag, explicit context-length phrases, or generic 400 with a long history) → keep the existing skip, preserving the #1630/#9893 fix. * Anything else that failed → persist just the user message so the conversation survives a retry. Use specific multi-word phrases (``context length``, ``token limit``, ``prompt is too long``, etc.) to match ``run_agent.py``'s own classifier; bare ``exceed`` false-positively flagged "rate limit exceeded" as context overflow. Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py`` and the existing #1630 suite still passes.	2026-04-30 04:32:33 -07:00
Teknium	87f5e1a25a	test(ssh): update tar pipe assertion for --no-overwrite-dir Existing test_tar_pipe_commands asserted the literal substring 'tar xf - -C /' in ssh_str, which is no longer present after the #17767 fix adds --no-overwrite-dir between 'tar xf -' and '-C /'. Split the one substring check into three independent assertions for the tar stdin mode, the new --no-overwrite-dir flag (regression guard for #17767), and the extract target.	2026-04-30 04:32:28 -07:00
Teknium	b50bc13ef9	fix(config): preserve YAML lists in hermes config set (#17876 ) _set_nested unconditionally replaced any non-dict value with an empty dict when walking the dotted path, which silently destroyed list-typed config nodes the moment someone set a value with a numeric index (e.g. 'hermes config set custom_providers.0.api_key NEW'). Any sibling entries and any fields inside the targeted entry that the user didn't write were lost. Fix: - _set_nested now detects list nodes and navigates by numeric index, and preserves both dicts AND lists at intermediate positions (scalars are still replaced so bare-scalar -> nested overrides keep working). - set_config_value drops its duplicated navigation logic and calls _set_nested instead -- single source of truth for the rules. Regression tests (tests/hermes_cli/test_set_config_value.py): - test_indexed_set_preserves_sibling_list_entries -- exact #17876 repro - test_indexed_set_preserves_non_targeted_fields -- inner-dict fields survive - test_deeper_nesting_through_list -- dict -> list -> dict -> scalar path 35/35 existing + new tests pass. E2E-verified with the issue's repro against a real on-disk config.yaml -- list stays a list, entry 0 updated, entry 1 intact. Closes #17876	2026-04-30 04:32:17 -07:00
Teknium	3fc4c63d38	test(model_switch): update regression to reflect bare-custom guard	2026-04-30 04:32:11 -07:00
Sanjays2402	e0fa2cf972	fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335 ) Long-lived Gateway processes were sending duplicate tool names to providers that enforce uniqueness: - DeepSeek: 'Tool names must be unique.' - Xiaomi MiMo: 'tools contains duplicate names: lcm_expand' - Moonshot/Kimi: 'function name lcm_grep is duplicated' TUI was unaffected because TUI runs with quiet_mode=False and skips the cache entirely. Root cause (two layered bugs) - model_tools.get_tool_definitions(quiet_mode=True) memoizes its result in _tool_defs_cache. The cache-hit path returned list(cached) (safe), but the FIRST uncached call stored and returned the SAME object. run_agent.py mutates self.tools (memory + LCM context-engine schemas) in-place, so the very first agent init in a Gateway process poisoned the cache, and every subsequent init appended LCM schemas again on top of the already-polluted list. - run_agent.py's context-engine injection (lcm_grep / lcm_describe / lcm_expand) had no dedup, unlike the memory-tools injection right above it which already skips already-present names. Fix (defense in depth, per the issue's suggested fix) - model_tools.get_tool_definitions: on the uncached branch, cache the computed list but return list(result) to the caller. Same pattern as the cache-hit path. - run_agent.py: build _existing_tool_names from self.tools and skip schemas whose names are already present, mirroring the memory-tools block. This also defends against plugin paths that may register the same schemas via ctx.register_tool(). Tests (tests/test_get_tool_definitions_cache_isolation.py) - test_first_uncached_call_returns_fresh_list \u2014 pins the fix; without it, first-call alias caused all the symptoms. - test_cache_hit_returns_fresh_list \u2014 pre-existing behavior stays. - test_caller_mutation_does_not_poison_cache \u2014 simulates run_agent appending lcm_grep / lcm_expand to the returned list and asserts the next call doesn't see them. - test_repeated_caller_mutation_does_not_accumulate \u2014 reproduces the long-lived Gateway accumulation pattern across 5 agent inits. - test_non_quiet_mode_does_not_use_cache \u2014 sanity, explains why TUI was fine. 5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.	2026-04-30 04:32:06 -07:00
Rob Moen	0dd373ec43	fix(context): honor model.context_length for Ollama num_ctx and all display paths When a user sets model.context_length in config.yaml, the value was only used for Hermes' internal compression decisions (context_compressor) but NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF metadata (often 256K+) and allocates that much VRAM regardless of the user's config — causing OOM on smaller GPUs like the P100 (16GB). Root cause: two separate context values existed independently: - context_compressor.context_length = config value (e.g. 65536) ✓ - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config Changes: 1. Cap Ollama num_ctx to config context_length (run_agent.py) When model.context_length is explicitly set and no explicit ollama_num_ctx override exists, cap the auto-detected GGUF value to the user's context_length. This is the core fix — it prevents Ollama from allocating more VRAM than the user budgeted. 2. Pass config_context_length through all secondary call sites Several paths called get_model_context_length() without the config override, falling through to the 256K default fallback: - cli.py: @-reference expansion and /model switch display - gateway/run.py: @-reference expansion and /model switch display - tui_gateway/server.py: @-reference expansion - hermes_cli/model_switch.py: resolve_display_context_length() 3. Normalize root-level context_length in config (hermes_cli/config.py) _normalize_root_model_keys() now migrates root-level context_length into the model section, matching existing behavior for provider and base_url. Users who wrote `context_length: 65536` at the YAML root instead of under `model:` had it silently ignored. 4. Fix misleading comments (agent/model_metadata.py) DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K as two comments stated. Tests: 3 new tests for root-level context_length normalization. All existing context_length tests pass (96 tests).	2026-04-30 04:31:23 -07:00
Bartok9	fbb3775770	fix(gateway): enforce auth check in busy-session path to prevent unauthorized injection (#17775 ) The busy-session handler (_handle_active_session_busy_message) bypassed the authorization gate that the cold path enforces via _is_user_authorized(). In shared-thread contexts (Slack threads, Telegram forum topics, Discord threads) where thread_sessions_per_user=False (the default), all participants share one session_key. An unauthorized user posting in the same thread as an authorized user would hit the active-session branch, skip the auth check, and have their text merged into _pending_messages or injected via agent.interrupt(). This commit adds the same _is_user_authorized() check at the top of the busy handler, before any message queuing, steering, or interrupt logic. Unauthorized messages are silently dropped (return True) with a warning log — matching the cold-path behavior. Affected platforms: Slack, Telegram, Discord, any adapter with shared-session thread contexts. Closes #17775	2026-04-30 04:29:15 -07:00
briandevans	cc5b9fb581	fix(transport): omit thinking_config for Gemma on the gemini provider (#17426 ) The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and historically other Google models like PaLM. Those reject `extra_body.thinking_config` with HTTP 400: Unknown name "thinking_config": Cannot find field `_build_gemini_thinking_config()` was unconditionally producing a config dict for any model on the `gemini` / `google-gemini-cli` provider, which `ChatCompletionsTransport.build_kwargs` then dropped into `extra_body["thinking_config"]`. The result: every chat turn for Gemma users on the gemini provider blew up at the API edge. The fix is the same shape Hermes already uses for the Gemini-2.5 vs Gemini-3 family clamping: normalise the model id, strip an `OpenRouter`-style `google/` prefix, and short-circuit early when the result doesn't start with `gemini`. We return `None` rather than `{"includeThoughts": False}`, because the API rejects the field name itself — even the polite "off" form trips the same 400. Three regression tests cover Gemma with reasoning enabled, Gemma with reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing because the Gemini guard fires after the prefix strip. Fixes #17426 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 04:29:04 -07:00
Teknium	3de8e21683	feat(gateway): native send_multiple_images for Telegram, Discord, Slack, Mattermost, Email Ports PR #17888's send_multiple_images ABC to every gateway platform that has a native multi-attachment API, so images arrive as a single bundled message instead of N separate ones. Native overrides: - Telegram: send_media_group (10 photos per album, chunks over); animated GIFs peeled off and routed through send_animation (albums don't support animations) - Discord: channel.send(files=[...]) (10 attachments per message, chunks over); URL images downloaded into BytesIO so they render inline; forum channels use create_thread with files=[...] - Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over); respects thread_ts; records thread participation - Mattermost: single post with file_ids list (5 per post — Mattermost cap, chunks over) - Email: single SMTP message with multiple MIME attachments (no chunk cap, SMTP size governs); remote URLs remain linked in body (parity with existing send_image) All platforms fall back to the base per-image loop on any failure, so a single bad image in a batch never loses the rest. Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu, WeCom, WeChat, DingTalk) continue to use the base default loop — their server APIs only accept one attachment per message anyway. Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted tests covering base default loop, chunking, animation peel-off, fallback paths, and empty-batch no-ops across all five new overrides. Co-authored-by: Maxence Groine <maxence@groine.fr>	2026-04-30 04:28:08 -07:00
Maxence Groine	04ea895ffb	feat(gateway/signal): add support for multiple images sending Adds a new `send_multiple_images` method to the ``BasePlatformAdapter`` that implements the default "One image per message" loop and allows for platform-specific overriding. Implements such an override for the Signal adapter, batching images and trying (best-effort) to work around rate-limits for voluminous batches using a specific scheduler. Also implements batching + rate-limit handling in the `send_message` tool. New tests added for the Signal adapter, its rate-limit scheduler and the `send_message` tool	2026-04-30 04:28:08 -07:00
VinceZ-Hms-Coder	ca7f46beb5	Merge upstream/main and address Copilot review feedback Merge resolved conflicts in web/src/{i18n/{en,zh,types}.ts,lib/api.ts} by keeping both this branch's `profiles` additions and upstream's new `models` page additions. Copilot review feedback: - Implement POST /api/profiles/{name}/open-terminal endpoint (already present); align Windows branch to `cmd.exe /c start "" <cmd>` so it matches the new test and spawns a fresh window instead of /k reusing the parent console. - Move backslash escaping out of the macOS AppleScript f-string expression (Python <3.12 disallows backslashes inside f-string expression parts). - Patch `_get_wrapper_dir` via monkeypatch in test_profiles_create_creates_wrapper_alias_when_safe so the test no longer writes to the real `~/.local/bin`. - Extend test_dashboard_browser_safe_imports to scan `.ts` files in addition to `.tsx`. - Switch upstream's new ModelsPage.tsx away from the `@nous-research/ui` root barrel onto per-component subpaths to satisfy the stricter scan. - Fix NouiTypography `leading-1.4` -> `leading-[1.4]` so Tailwind actually emits the line-height for the `sm` variant. - Guard ProfilesPage.openSoulEditor against out-of-order responses by tracking the latest requested profile via a ref. - Replace ProfilesPage's hand-rolled setup command with a fetch to `/api/profiles/{name}/setup-command` so the copied command always matches what the backend would actually run (handles wrapper-alias collisions and reserved names correctly). - Wire SOUL.md textarea label `htmlFor` -> textarea `id` so screen readers and clicking the label work as expected.	2026-04-30 06:43:22 -04:00
Heltman	19f9be1dff	fix(tools): serialize concurrent hermes_tools RPC calls from execute_code The sandbox-side `_call()` in both the UDS and file-based transports was not thread-safe, so scripts that call tools from multiple threads (e.g. `ThreadPoolExecutor` over `terminal()`) inside a single `execute_code` run could silently receive each other's responses. Root cause: * UDS transport — a single module-level `_sock` was shared across all threads; the newline-framed protocol has no request-id; and the server-side RPC loop handles one connection serially. With concurrent callers, each thread would `sendall()` then race to `recv()` the next newline-terminated response from the shared buffer, so responses got delivered to the wrong caller. * File transport — `_seq += 1` is a non-atomic read-modify-write, so two threads could allocate the same sequence number and clobber each other's request/response files. Fix: guard `_call()` with a `threading.Lock` in the UDS case (covering send+recv), and guard `_seq` allocation with a lock in the file case. No protocol change. Regression tests cover both the generated-source level (lock is present and used) and an end-to-end concurrency test: running a sandboxed ThreadPoolExecutor of 10 `terminal()` calls against a slow mock dispatcher, asserting every caller sees its own tagged response. The test fails without the fix (10/10 mismatched, matching real-world repro) and passes with it.	2026-04-30 03:31:16 -07:00
Rylen Anil	3858f9419e	fix: handle gateway Ctrl+C shutdown cleanly	2026-04-30 03:29:57 -07:00
Sebastian B	362996e269	fix(runtime_provider): _get_named_custom_provider must honour transport field on v12+ providers dict The v11→v12 migrate_config step writes the API mode for every entry under the new transport: field (per the v12+ schema in _normalize_custom_provider_entry). _get_named_custom_provider read the legacy api_mode: spelling only, so for every migrated config the lookup returned None for the api mode. Downstream, _resolve_named_custom_runtime then falls back through custom_provider.get("api_mode") or _detect_api_mode_for_url(base_url) or "chat_completions". For loopback URLs (proxies, local servers) or unknown hostnames, the URL detector returns None and the resolver silently downgrades the configured codex_responses / anthropic_messages transport to chat_completions. Requests get sent to /v1/chat/completions instead of /v1/responses or /v1/messages and the provider 404s — or worse, returns a usable chat_completions response while skipping the model's reasoning / caching surface. Fix: read both field names — entry.get("api_mode") or entry.get("transport") — at the two match-by-key + match-by-name branches in _get_named_custom_provider. The runtime normaliser _normalize_custom_provider_entry already accepts both spellings; this lifts the same compat into the direct-dict reader so v12+ configs work without going through the shim. Adds three regression tests under tests/hermes_cli/test_user_providers_model_switch.py: - transport field is read on the match-by-key branch - legacy api_mode spelling still works for hand-edited configs - transport is read on the match-by-display-name branch	2026-04-30 03:29:48 -07:00
briandevans	f54935738c	fix(cron): surface agent run_conversation failure flags as job failure run_job() ignored the result's `failed=True` / `completed=False` flags that agent.run_conversation populates on API exhaustion, mid-run interrupts, and model aborts. Because final_response on those paths is often a non-empty error string ("API call failed after 3 retries: Request timed out."), the existing empty-response soft-fail in _process_job did not trip either: the error text was delivered as if it were the agent's reply and last_status was set to "ok" with no error notification. Detect those flags right after the dict-shape guard and raise so the existing except handler builds the proper failure tuple, preserving the agent's error message via result["error"]. Adds a parametrized regression covering: API-retry-exhausted with error text in final_response, completed=False with no final_response, completed=False without an explicit failed flag, and the partial-reply plus failed=True case. Plus a guard that a normal completed=True success result is still treated as success. Fixes #17855 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 03:27:37 -07:00
briandevans	f44f1f9615	fix(gateway): preserve session guard across in-band drain handoff When the in-band pending-message drain spawns a fresh task and transfers ownership via _session_tasks[session_key] = drain_task, the original task still unwinds through the finally block. The drain task picks up the same interrupt_event in its own _process_message_background entry, so an unconditional _release_session_guard(session_key, guard=interrupt_event) at the end of the finally matches and deletes _active_sessions[session_key] while the drain task is still pending its first await. A concurrent inbound message arriving in that handoff window passes the Level-1 guard (no entry exists) and spawns a second _process_message_background for the same session — two agents on one session_key, duplicate responses, duplicate tool calls. Fix: only call _release_session_guard when the current task still owns _session_tasks[session_key]. When ownership has been transferred to a drain task, leave _active_sessions populated; the drain task's own lifecycle releases it. This mirrors the late-arrival drain path in the same finally block, which already leaves both entries alone after handing off. Also reorder stdlib imports in the new regression test file to match the gateway test convention (stdlib before third-party). Regression test: capture _active_sessions[sk] identity at every handler entry across a 2-step in-band drain chain and assert the guard Event identity stays the same. Pre-fix, the original task's finally deletes the entry, the drain task falls through to the `or asyncio.Event()` branch, and a fresh Event is installed — identity diverges. Post-fix, the entry is preserved and the drain task reuses the original Event. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 03:27:08 -07:00
briandevans	663ba9a58f	fix(gateway): drain pending messages via fresh task, not recursion (#17758 ) `_process_message_background` finished a turn, found a queued follow-up, and drained it via `await self._process_message_background(pending_event, session_key)`. Each chained follow-up added a frame to the call stack instead of starting fresh. Under sustained pending-queue activity (e.g. a user sending follow-ups faster than the agent finishes turns) the C stack would exhaust at ~2000 nested frames and SIGSEGV the process. Mirror the late-arrival drain pattern that already exists in the same function: spawn a new `asyncio.create_task(...)` for the pending event and return so the current frame can unwind. The new task takes ownership via `_session_tasks[session_key]`. The late-arrival drain in `finally` could now race with the in-band drain across the `await typing_task` / `await stop_typing` window, so add a guard: if `_session_tasks[session_key]` is no longer the current task, an in-band drain already spawned a follow-up task — re-queue the late-arrival event so that task picks it up after its current event, instead of spawning a second concurrent task for the same session_key. Regression test (`test_pending_drain_no_recursion.py`) chains 12 follow-ups and asserts the recorded `_process_message_background` stack depth stays bounded at handler entry. Pre-fix: depths grow linearly `[1,2,3,…,12]`. Post-fix: all depths are `1`. `test_duplicate_reply_suppression::test_stale_response_suppressed_when_interrupted` called `_process_message_background` directly and implicitly relied on the old recursive `await` semantic — updated to wait for the spawned drain task before checking the sent list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 03:27:08 -07:00
Teknium	8d302e37a8	feat(tts): add Piper as a native local TTS provider (closes #8508 ) (#17885 ) Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the Home Assistant project that supports 44 languages with zero API keys. Adds it as a native built-in provider alongside edge/neutts/kittentts, installable via 'hermes tools' with one keystroke. What ships: - New 'piper' built-in provider in tools/tts_tool.py - Lazy import via _import_piper() - Module-level voice cache keyed on (model_path, use_cuda) so switching voices doesn't invalidate older cached voices - _resolve_piper_voice_path() accepts either an absolute .onnx path or a voice name (auto-downloaded on first use via 'python -m piper.download_voices --download-dir <cache>') - Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via get_hermes_dir) - Optional SynthesisConfig knobs: length_scale, noise_scale, noise_w_scale, volume, normalize_audio, use_cuda — passed through only when configured, so older piper-tts versions aren't broken - WAV output then ffmpeg conversion path (same as neutts/kittentts) so Telegram voice bubbles work when ffmpeg is present - Piper added to BUILTIN_TTS_PROVIDERS so a user's tts.providers.piper.command cannot shadow the native provider (regression test included) - 'hermes tools' wizard entry - Piper appears under Voice and TTS as local free, with 'pip install piper-tts' auto-install via post_setup handler - Prints voice-catalog URL and default-voice info after install - config.yaml defaults - tts.piper.voice defaults to en_US-lessac-medium - Commented advanced knobs for discoverability - Docs - New 'Piper (local, 44 languages)' section in features/tts.md explaining install path, voice switching, pre-downloaded voices, and advanced knobs - Piper listed in the ten-provider table and ffmpeg table - Custom-command-providers section updated to drop the Piper example (now native) and add a piper-custom example for users with their own trained .onnx models - overview.md bumps provider count to ten - Tests (tests/tools/test_tts_piper.py, 16 tests) - Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH) - _resolve_piper_voice_path across every branch: direct .onnx path, cached voice name, fresh download with correct CLI args, download failure, successful-exit-but-missing-files, empty voice to default - _generate_piper_tts: loads voice once, reuses cache, voice-name download wiring, advanced knobs flow through SynthesisConfig - text_to_speech_tool end-to-end dispatch and missing-package error - check_tts_requirements: piper availability toggles the return value - Regression guard: piper cannot be shadowed by a command provider with the same name - Pre-existing test_tts_mistral test broadened to mock the new piper/kittentts/command-provider checks (otherwise it false-passes when piper is installed in the test venv) E2E verification (live): Actual pip install piper-tts, config piper + en_US-lessac-low, text_to_speech_tool call, voice auto-downloaded from HuggingFace, WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the cache (~60ms). Cache dir populated with .onnx and .onnx.json. This caught a real bug during development: the first pass used '-d' as the download-dir flag; the actual piper.download_voices CLI wants '--download-dir'. Fixed before PR opened.	2026-04-30 02:53:20 -07:00
Teknium	2662bfb756	fix(tests): make test_update_stale_dashboard immune to hermes_cli.main reload (#17881 ) Six tests in this file failed in CI (-n auto) after #17832 landed because other tests on the same xdist worker reload hermes_cli.main: tests/hermes_cli/test_env_loader.py:85-86 sys.modules.pop('hermes_cli.main', None) importlib.import_module('hermes_cli.main') tests/hermes_cli/test_skills_subparser.py:24-25 del sys.modules['hermes_cli.main'] When either ran first on a worker, our top-of-file 'from hermes_cli.main import _kill_stale_dashboard_processes' captured a stale function object whose __globals__ points at the old module dict. patch('hermes_cli.main._find_stale_dashboard_pids', ...) then patched the new module, but the stale function resolved the dependency via its stale __globals__, so every patch became a no-op: pids=[] → early return → no signals, no output, assertions failed. Fix: add an autouse fixture that rebinds the three module-level names to whatever is currently live in sys.modules['hermes_cli.main'] before each test runs. The pollutants in the other two files are load-bearing for their own tests, so fixing it on the consumer side is correct. Repro: pytest tests/hermes_cli/test_env_loader.py tests/hermes_cli/test_update_stale_dashboard.py	2026-04-30 02:46:56 -07:00
Teknium	0da968e521	fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868 ) Voscko reported curator.auxiliary.provider/model was advertised in the docs but ignored — the review fork read only model.provider/default. The narrow fix would wire the one-off key through, but that leaves curator as a parallel system: not in `hermes model` → auxiliary picker, not in the dashboard Models tab, missing per-task base_url/api_key/timeout/ extra_body. Unify curator with the rest of the aux task system so `hermes model` and the dashboard configure it like every other aux task. Four sources of truth updated: - hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary (timeout=600 since reviews run long), drop the one-off curator.auxiliary block from DEFAULT_CONFIG.curator. - hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass') to _AUX_TASKS so the CLI picker offers it. - hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the dashboard REST endpoint accepts it. - web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard Models tab renders the task. agent/curator.py _resolve_review_model() now reads auxiliary.curator first (canonical), falls back to legacy curator.auxiliary (with an info log asking users to migrate), then falls back to the main chat model. Pre-unification users keep working. Docs updated: docs/user-guide/features/curator.md now points at `hermes model` → auxiliary → Curator and the dashboard Models tab. Tests: 6 unit tests on _resolve_review_model (auto default, canonical slot honored, partial override fallback, legacy fallback with deprecation log assertion, new-wins-over-legacy, empty-config safety) plus a cross-registry test that curator is wired into all four sources of truth. test_aux_tasks_keys_all_exist_in_default_config already covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant. Reported by Voscko on Discord.	2026-04-30 02:46:01 -07:00
teknium1	658947480a	fix(acp): drop dead message_id kwarg from replay chunks UserMessageChunk and AgentMessageChunk do not have a message_id field in the ACP schema. Passing it silently dropped the kwarg (pydantic does not raise on unknown init kwargs here) and the subsequent test assertions on .message_id raised AttributeError. Strip the dead plumbing (uuid import, message_id= kwarg on both chunk types, unused session_id/index parameters) and remove the matching .message_id asserts from the test.	2026-04-30 02:45:54 -07:00
Henkey	d2536a72bf	fix(acp): replay session history on load	2026-04-30 02:45:54 -07:00
teknium1	5d253e65b7	fix(openviking): pre-check fs/stat to route file URIs before hitting directory-only endpoints Adds a deterministic pre-check on top of htsh's exception-based fallback: before calling /content/abstract or /content/overview on a non-pseudo URI, probe /api/v1/fs/stat. If the server says the URI is a file, route straight to /content/read instead of eating a failing 500 round-trip. This is the same idea pty819 and chennest independently landed in PRs #12757 and #12937 — merged here on top of htsh's broader fix so we keep pseudo-URI normalization and v0.3.3 browse-shape handling while avoiding the slow exception path on servers that return a raised 500 every time. The exception fallback from #5886 stays in place for environments where fs/stat is unavailable or returns an unfamiliar shape. Also credits pty819, chennest, and htsh in AUTHOR_MAP so future release notes attribute them correctly.	2026-04-30 02:35:29 -07:00
hitesh	10e43edc09	fix(openviking): fallback summary reads to content/read for file URIs OpenViking returns 500 for /content/abstract and /content/overview when URI points to mem_*.md files. Add resilient fallback to /content/read for non-pseudo summary file URIs while preserving pseudo summary normalization. Also add regression tests for fallback behavior.	2026-04-30 02:35:29 -07:00
hitesh	bff8ab0311	test(openviking): add helper regression coverage	2026-04-30 02:35:29 -07:00
Teknium	0ad4f55aa8	feat(dashboard): add --stop and --status flags (#17840 ) `hermes dashboard` is a long-lived foreground server that users often start and forget about, sometimes in a shell they've since closed. We didn't have a way to stop it — users had to find the PID manually. Adds two lifecycle flags that reuse the same detection + termination path the post-`hermes update` cleanup (PR #17832) uses: hermes dashboard --status List running hermes dashboard processes with PID + cmdline. Exit 0, informational. hermes dashboard --stop Terminate all running dashboards (3s grace then force-kill survivors). Exit 0 if none remain, 1 if any couldn't be stopped. Windows uses `taskkill /F` as before. Both flags short-circuit before any fastapi/uvicorn import so they work even on installations where the dashboard extras aren't installed — useful when you're cleaning up after uninstalling. The kill helper gained an optional `reason=...` param so the output reads "(requested via --stop)" instead of the post-update-specific "running backend no longer matches the updated frontend" wording. E2E: `hermes dashboard --status` with nothing running prints the empty message; with a fake `hermes dashboard ...` cmdline spawned via `exec -a`, `--status` lists it, `--stop` terminates it (exit -15), and a follow-up `--status` returns empty.	2026-04-30 02:30:20 -07:00
Teknium	2facea7f71	feat(tts): add command-type provider registry under tts.providers.<name> (#17843 ) Reshape of PR #17211 (@versun). Lets users wire any local or external TTS CLI into Hermes without adding engine-specific Python code. Users declare any number of named providers in config.yaml and switch between them with tts.provider: <name>, alongside the built-ins (edge, openai, elevenlabs, …). Config shape: tts: provider: piper-en providers: piper-en: type: command command: 'piper -m ~/model.onnx -f {output_path} < {input_path}' output_format: wav Placeholders: {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, {speed}. Use {{ / }} for literal braces. Key behavior: - Built-in provider names always win — a tts.providers.openai entry cannot shadow the native OpenAI provider. - type: command is the default when command: is set. - Placeholder values are shell-quote-aware (bare / single / double context), so paths with spaces and shell metacharacters are safe. - Default delivery is a regular audio attachment. voice_compatible: true opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion. - Command failures (non-zero exit, timeout, empty output) surface to the agent with stderr/stdout included so you can debug from chat. - Process-tree kill on timeout (Unix killpg, Windows taskkill /T). - max_text_length defaults to 5000 for command providers; override under tts.providers.<name>.max_text_length. Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover provider resolution, shell-quote context, placeholder rendering with injection payloads, timeout, non-zero exit, empty output, voice_compatible opt-in, and end-to-end dispatch through text_to_speech_tool. All 88 pre-existing TTS tests still pass. Docs: new "Custom command providers" section in website/docs/user-guide/features/tts.md with three worked examples (Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys, behavior notes, and security caveat. E2E-verified live: isolated HERMES_HOME, command provider declared in config.yaml, text_to_speech_tool dispatches through the registered shell command and the output file is produced as expected. Co-authored-by: Versun <me+github7604@versun.org>	2026-04-30 02:29:08 -07:00
Teknium	5b85a7d351	fix(update): kill stale dashboard processes instead of warning (#17832 ) `hermes update` previously just printed a warning when it detected a running `hermes dashboard` process from the previous version, telling the user to kill and restart it themselves. In practice dashboards get started and forgotten, so the warning was routinely ignored and users ended up with a silent frontend/backend mismatch (new JS bundle served against the old in-memory Python backend, e.g. new auth headers the old code doesn't recognise → every API call 401s). The dashboard has no service manager, no PID file, and we don't record the original launch args (--host, --port, --insecure, --tui, --no-open) so we can't auto-restart it. But we CAN stop it, which is what the user wants — the failure mode when the stale process is left alive is worse than the dashboard just being down. - POSIX: SIGTERM, poll for ~3s, SIGKILL any survivors. - Windows: `taskkill /PID <pid> /F`. - Print each PID's outcome plus a one-line restart hint. - Detection logic is unchanged (same ps / wmic scan, same guards against the `pgrep -f` greedy-match trap from #16872 and the #17049 wmic UnicodeDecodeError fix). Also split the old monolithic `_warn_stale_dashboard_processes` into `_find_stale_dashboard_pids` (scan) + `_kill_stale_dashboard_processes` (kill), keeping the old name as an alias so any external callers still work. E2E verified: spawned a fake `hermes dashboard` cmdline via `exec -a 'hermes dashboard …' sleep 300`, ran `_kill_stale_dashboard_processes()`, confirmed SIGTERM exit (-15) and that a post-scan returns an empty PID list.	2026-04-30 01:34:34 -07:00
Teknium	fd0796947f	fix: stabilize CI — TS widen, sys.modules restore, WS subscriber race (#17836 ) Three narrow fixes targeting the remaining red checks after #17828: 1. ui-tui/src/app/slash/commands/ops.ts (Docker Build): /reload-mcp's local params type annotated session_id: string while ctx.sid is string \| null. Widen to string \| null — matches every other rpc call site and the test harness which passes { session_id: null }. Fixes TS2322 on line 86. The rpc signature itself is Record<string, unknown>, so this is purely a local typing fix, no behavioral change. 2. tests/plugins/test_achievements_plugin.py (13 cascading test failures): _install_fake_session_db did a raw sys.modules['hermes_state'] = fake_module without restoration, leaking the fake across xdist worker boundaries. Downstream tests doing from hermes_state import SessionDB got a module whose SessionDB was lambda: fake_db — 6 test_hermes_state.py tests failed with AttributeError: 'function' object has no attribute '_sanitize_fts5_query' / _contains_cjk, and 7 test_860_dedup.py tests failed with TypeError: got unexpected keyword argument 'db_path' (real code calls SessionDB(db_path=...)). Fix: stash monkeypatch on the plugin_api module object in the fixture, and have the helper do monkeypatch.setitem(sys.modules, 'hermes_state', fake_module) for auto-restoration at test teardown. 3. tests/hermes_cli/test_web_server.py (WS race): TestPtyWebSocket::test_pub_broadcasts_to_events_subscribers hit the 30s test timeout on CI. websocket_connect returns after ws.accept() — but /api/events registers the subscriber in _event_channels on the NEXT await (inside _event_lock). A publish immediately after connect could race ahead of registration and be dropped, and the subsequent receive_text() blocked until SIGALRM killed the test. Fix: poll _event_channels after the subscriber connects, before publishing. Validation: scripts/run_tests.sh tests/plugins/test_achievements_plugin.py tests/run_agent/test_860_dedup.py tests/test_hermes_state.py tests/hermes_cli/test_web_server.py 338 passed cd ui-tui && npm run type-check clean cd ui-tui && npm run build clean Remaining red checks are pure infra (Nix ubuntu hits TwirpErrorResponse ResourceExhausted on the GH Actions cache API; Nix macos bounces between npm build openssl-legacy and cache rate-limits) and cannot be fixed in the codebase.	2026-04-30 01:34:08 -07:00
Teknium	aa7bf329bc	feat(gateway): centralize audio routing + FLAC support + Telegram doc fallback (#17833 ) Extracted from PR #17211 (@versun) so it can land independently of the local_command TTS provider redesign. - Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py; single source of truth for audio routing. - Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set). - Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice. - Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set. - cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway. - Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy. - Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV. Co-authored-by: Versun <me+github7604@versun.org>	2026-04-30 01:32:31 -07:00
Teknium	26787ce638	test(gateway): isolate plugin adapter imports and guard the anti-pattern Fixes the xdist collision that broke CI on PR #17764, and structurally prevents future plugin-adapter tests from reintroducing it. Problem ------- tests/gateway/test_teams.py (new in this PR) and tests/gateway/test_irc_adapter.py (already on main) both followed the same anti-pattern: sys.path.insert(0, str(_REPO_ROOT / 'plugins' / 'platforms' / '<name>')) from adapter import <Adapter> Every platform plugin ships its own adapter.py, so the bare 'from adapter import ...' races for sys.modules['adapter']. Whichever test collected first in a given xdist worker won; the other crashed at collection with ImportError, and the polluted sys.path cascaded into 19 unrelated test failures across tools/, hermes_cli/, and run_agent/ in the same worker. Fix --- 1. tests/gateway/_plugin_adapter_loader.py (new): shared helper load_plugin_adapter('<name>') that imports plugins/platforms/<name>/adapter.py via importlib.util under the unique module name plugin_adapter_<name>. Zero sys.path mutation, no possibility of collision. 2. tests/gateway/test_irc_adapter.py and tests/gateway/test_teams.py: migrated to the helper. All 'from adapter import ...' statements (including the ones inside test methods) are replaced with module-level attribute access on the loaded module. 3. tests/gateway/conftest.py: new pytest_configure guard that AST-scans every test_.py under tests/gateway/ at session start and fails the run with a pointer to the helper if any test uses sys.path.insert into plugins/platforms/ OR a bare 'import adapter' / 'from adapter import'. Runs on the xdist controller only (skipped in workers). The next plugin adapter test that tries to reintroduce this pattern gets rejected at collection time with a clear remediation message. 4. scripts/release.py: add aamirjawaid@microsoft.com -> heyitsaamir to AUTHOR_MAP so the check-attribution workflow passes. Validation ---------- scripts/run_tests.sh tests/gateway/ 4194 passed scripts/run_tests.sh tests/gateway/test_{teams,irc} 72 passed (both orderings) scripts/run_tests.sh <11 prev-failing test files> 398 passed Guard triggers correctly on both Path-operator and string-literal forms of the anti-pattern.	2026-04-30 01:19:34 -07:00
Aamir Jawaid	b3137d758c	feat(teams): add Microsoft Teams platform adapter as a plugin Hello! I am the maintainer of the microsoft-teams-apps Python SDK and I built this Teams adapter to integrate Microsoft Teams into Hermes. Adds a `plugins/platforms/teams` platform plugin using the new PlatformRegistry system from #17751. The adapter self-registers via `register(ctx)` — no hardcoding in run.py, toolsets.py, or any other core file. Key features: - Supports personal DMs, group chats, and channel posts - Adaptive Card approval prompts with in-place button replacement (Allow Once / Allow Session / Always Allow / Deny) - aiohttp webhook server bridged from the Teams SDK to avoid the fastapi/uvicorn dependency - ConversationReference caching for correct proactive sends in non-DM chats - `interactive_setup()` for `hermes gateway setup` integration - `platform_hint` for LLM context (Teams markdown subset) - 34 tests covering adapter init, send, message handling, and plugin registration Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 01:19:34 -07:00
Teknium	21e695fcb6	fix: clean up defensive shims and finish CI stabilization from #17660 (#17801 ) PR #17660 landed a sweep of CI fixes but left three loose ends: 1. tests/cli/test_cli_loading_indicator.py::test_reload_mcp_sets_busy_state_ and_prints_status — /reload-mcp gained a prompt-cache-invalidation confirmation (commit `4d7fc0f37`) that was never wired into this test. The test exercises the loading-indicator path, so pre-approve via config and go straight into _reload_mcp(). 2. tools/mcp_tool.py _make_tool_handler — the added getattr(server, '_rpc_lock', None) + 'skip the lock if missing' branch is inconsistent with four sibling call sites that still direct-access server._rpc_lock. The lock is guaranteed by MCPServerTask.__init__; falling through to an unlocked session.call_tool would silently serialize-strip RPCs if the guard ever triggered. Restore direct access. 3. tui_gateway/server.py _messages_as_conversation — the helper existed only to catch 'TypeError: include_ancestors unexpected' from mocked SessionDBs that don't actually exist. The real SessionDB.get_messages_as_conversation has accepted include_ancestors since introduction, and every test FakeDB in the repo already declares the kwarg. Remove the shim, inline the two call sites.	2026-04-29 23:53:17 -07:00
Teknium	62a5d7207d	feat(plugins): bundle hermes-achievements + scan full session history (#17754 ) * feat(plugins): bundle hermes-achievements, scan full session history Ships @PCinkusz's hermes-achievements dashboard plugin (https://github.com/PCinkusz/hermes-achievements) as a bundled plugin at plugins/hermes-achievements/ and fixes a bug in the scan path that made the plugin only see the first 200 sessions — making lifetime badges (50k tool calls, 75k errors, etc.) unreachable on long-running installs. Changes: - plugins/hermes-achievements/: vendor v0.3.1 verbatim (manifest, dist/, plugin_api.py, tests, docs, README). - plugins/hermes-achievements/dashboard/plugin_api.py: * scan_sessions(): limit=None now scans ALL sessions via SQLite LIMIT -1. Previously capped at 200, so users with 8000+ sessions saw ~2% of their history. * evaluate_all(): first-ever scans run in a background thread so the dashboard request path never blocks. Stale snapshots serve immediately while a background refresh runs. force=True still blocks synchronously for manual /rescan. * _build_pending_snapshot(), _start_background_scan(), _run_scan_and_update_cache(): supporting plumbing + idempotent thread spawn. - tests/plugins/test_achievements_plugin.py: new tests covering the 200-cap regression, the background-scan first-run flow, stale-serve-plus-background-refresh, forced sync rescan, and scan-thread idempotency. - website/docs/user-guide/features/built-in-plugins.md: lists hermes-achievements in the bundled-plugins table and documents API endpoints, state files, and performance characteristics. E2E validated against a real 8564-session ~6.4GB state.db: * Cold scan: 13m 19s (one-time, backgrounded — UI never blocks) * Warm rescan: 1.47s (8563/8564 sessions reused from checkpoint cache) * 57/60 achievements unlocked, 3 discovered — aggregates like total_tool_calls=259958, total_errors=164213, skill_events=368243 correctly surface lifetime badges that the 200-cap made unreachable. Original credit: @PCinkusz (MIT-licensed). Upstream repo remains the staging ground for new badges; this bundle keeps the dashboard feature parity with Hermes core changes. * feat(achievements): publish partial snapshots during cold scan Previously a cold scan on a large session DB (13min on 8564 sessions) showed zero badges for the entire duration, then every badge at once when the scan completed. A dashboard refresh mid-scan was indistinguishable from a fresh install with no history. Now the scanner publishes a partial snapshot to _SNAPSHOT_CACHE every 250 sessions, so each refresh during a cold scan surfaces more badges incrementally. Mechanism: - scan_sessions() takes an optional progress_callback fired every progress_every sessions with (sessions_so_far, scanned, total). - _compute_from_scan() is extracted from compute_all() and gains an is_partial flag that skips writing to state.json — we don't want to record unlocked_at based on a half-complete aggregate that a later session might rebalance. - _run_scan_and_update_cache() installs a publisher callback that builds a partial snapshot, marks it mode='in_progress', and writes it to the cache with age=0 so the UI keeps polling /scan-status and picks up the final snapshot when the scan completes. - Manual /rescan (force=True) disables partial publishing — the caller is blocking on the final result anyway. E2E against real 8564-session state.db (polled cache every 10s): t=10s: cache empty t=20s: 250/8564 scanned, 35 unlocked, 25 discovered t=40s: 500/8564 scanned, 42 unlocked, 18 discovered t=60s: 1000/8564 scanned, 49 unlocked, 11 discovered ... Tests: 9/9 pass (2 new — partial snapshot publication + no-persist-on-partial). Upstream unittest suite: 10/10 pass. * feat(achievements): in-progress scan banner with live % progress Previously the dashboard showed zero badges silently during long cold scans (13min on 8564 sessions). The backend was publishing partial snapshots every 250 sessions, but the bundled UI didn't surface any indicator that a scan was running — it just rendered the main page with whatever counts were currently published and no way for the user to know more progress was coming. UI changes (dist/index.js, dist/style.css): - Added a scan-in-progress banner rendered between the hero and stats when scan_meta.mode is 'pending' or 'in_progress'. Shows: BUILDING ACHIEVEMENT PROFILE… Scanned 1,750 of 8,564 sessions · 20%. Badges unlock as more history streams in. with a pulsing teal indicator and a filling teal/cyan progress bar. Disappears the moment the backend flips to 'full' or 'incremental'. - Added an auto-poller via useEffect — while scanInFlight is true the page re-fetches /achievements every 4s WITHOUT toggling the loading skeleton, so unlock counts tick up visibly without the user refreshing. The effect cleans itself up when the scan finishes. - Added refresh() (re-fetch, no loading flip) alongside the existing load() (full reload, used by the Rescan button). Attribution preserved: - Added a header comment to index.js crediting @PCinkusz (https://github.com/PCinkusz/hermes-achievements, MIT) as the original author, noting the banner is a layered addition on top of the original dist bundle. - Matching header comment in style.css, flagging the new .ha-scan-banner* rules as the local addition. Live-verified end to end: - Spun up `hermes dashboard --port 9229 --no-open` against a fresh HERMES_HOME symlinked to the real 8564-session state.db. - Opened /achievements in a browser, confirmed the banner renders with live progress: 'Scanned 1,000 of 8,564 sessions · 11%' → updates to '1,250 ... · 14%' → '1,750 ... · 20%' without user interaction, matching the backend's partial publications. - Stats row simultaneously climbed from 35 → 49 → 53 unlocked as more history streamed in. - Vision analysis of the rendered page confirms the banner styling matches the rest of the dashboard (dark card bg, teal accent, same small-caps typography, pulsing indicator reusing ha-pulse keyframes).	2026-04-29 23:23:57 -07:00
Teknium	ce0c3ae493	fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765 ) The _CODEX_AUX_MODEL constant had already rotated twice in 6 weeks (gpt-5.3-codex -> gpt-5.2-codex -> now broken again at gpt-5.2-codex) because ChatGPT-account Codex gates which models it accepts via an undocumented, shifting allow-list that OpenAI publishes no changelog for. Any pinned default will keep going stale. Issue #17533 reports the current breakage: every ChatGPT-account auxiliary fallback fails with HTTP 400 "model is not supported" and the 60s pause loop degrades long sessions. Rather than reset the clock with another stale pin (PR #17544 proposes gpt-5.2-codex -> gpt-5.4), remove the hardcoded second-order Codex fallback entirely: - Delete `_CODEX_AUX_MODEL`. - Drop `_try_codex` from `_get_provider_chain()` (the auto chain now ends at api-key providers; 4 rungs instead of 5). - Rename `_try_codex() -> _build_codex_client(model)` and require an explicit model from the caller. No more guessing. - `resolve_provider_client("openai-codex", model=None)` now warns and returns (None, None) instead of silently guessing a stale model ID. - Remove `_try_codex` from the `provider="custom"` fallback ladder (same stale-constant trap). - `_resolve_strict_vision_backend("openai-codex")` routes through `resolve_provider_client` so the caller's explicit model is honored. Codex-main users are unaffected: Step 1 of `_resolve_auto` already uses `main_provider` + `main_model` directly and passes the user's configured Codex model through `resolve_provider_client`, which never touched `_CODEX_AUX_MODEL`. Per-task overrides (`auxiliary.<task>.provider/model`) continue to work and are the supported way to route specific aux tasks through Codex. Users whose main provider fails with a payment/connection error and who have ONLY ChatGPT-account Codex auth will now see the 60s pause without a stale-model-rejection noise line in between -- same outcome, cleaner failure. Closes #17533. Supersedes #17544 (which resets the clock on the same stale-constant problem).	2026-04-29 23:23:50 -07:00

1 2 3 4 5 ...

2941 commits