hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Author	SHA1	Message	Date
Oleksii Lisikh	c4b287ba53	feat(i18n): add Ukrainian locale	2026-05-05 17:21:59 -07:00
Nicolò Boschi	3082fa0829	feat(hindsight): probe API for update_mode='append' support, dedupe across processes Mirrors the pattern already shipping in hindsight-integrations/openclaw: probe `<api_url>/version` once per process, gate on Hindsight ≥ 0.5.0. When supported, retains use a stable session-scoped `document_id` (`session_id`) plus `update_mode='append'` so cross-process retains for the same session merge into one document instead of producing N-different-process-stamped duplicates. When unsupported (or probe fails), fall back to the existing per-process unique `f"{session_id}-{start_ts}"` document_id with no `update_mode` — the resume-overwrite fix (#6654) keeps working unchanged on legacy servers. Closes the dedup half of #20115. The proposed `document_id_strategy` config knob isn't needed: auto-detection via the same /version probe the OpenClaw plugin already uses gives the same outcome with no extra config burden, and the choice is purely a function of what the server can do. Plumbing -------- - Module-level helpers (`_meets_minimum_version`, `_fetch_hindsight_api_version`, `_check_api_supports_update_mode_append`) cache the result per api_url so every provider in the process gets one /version round-trip. - One-time WARN logged when the API is older than 0.5.0, telling the user to upgrade for cross-session deduplication. - New instance helper `_resolve_retain_target(fallback_doc_id)` returns `(document_id, update_mode)` based on cached capability. Wired into `sync_turn` and the `on_session_switch` flush path. - For local_embedded mode, the probe URL is taken from the running client (`client.url`) so we hit the actual daemon port rather than the configured default. - `update_mode` is set on the per-item dict; `aretain_batch` already threads `item['update_mode']` into the API call. Tests ----- - `TestUpdateModeAppendCapability` (5 cases): legacy fallback, modern stable+append, per-url cache, one-time warn, flush-on-switch resolves against the OLD session. - Existing `_make_hindsight_provider` factory in the manager-side test file extended to seed `_mode`/`_api_url`/`_api_key`/`_client` and stub `_resolve_retain_target` so the bypass-init pattern keeps working. E2E verified against installed `~/.hermes/hermes-agent`: - Legacy probe (unreachable host) → `legacy-session-<ts>` doc_id, no `update_mode`. - Modern probe (live local_embedded 0.5.6 daemon) → stable `modern-session` doc_id + `update_mode='append'`. - `test_hermes_embedded_smoke.py` passes (90s).	2026-05-05 15:09:59 -07:00
misery-hl	56b4795115	guard kanban worker lifecycle by run id	2026-05-05 15:09:28 -07:00
0xVox	0b9cbc8b23	test(kanban): cover metadata handoff round-trip	2026-05-05 15:09:28 -07:00
Teknium	1fc8733a69	fix(kanban): unify failure counter across spawn/timeout/crash outcomes (#20410 ) The dispatcher's circuit breaker only protected against spawn-side failures (profile missing, workspace mount error, exec failure). Workers that successfully spawned but then timed out or crashed re-queued to ``ready`` with no counter increment, so the next tick re-spawned them — loops forever until someone noticed. Reported externally on Twitter (Forbidden Seeds) and confirmed by walking the kernel: ``enforce_max_runtime`` flipped the task back to ready, emitted a ``timed_out`` event, and never touched ``spawn_failures``; same for ``detect_crashed_workers``. Fix: unify the counter across all non-success outcomes. Schema ------ * ``tasks.spawn_failures`` → ``tasks.consecutive_failures`` * ``tasks.last_spawn_error`` → ``tasks.last_failure_error`` * Migration renames the columns in-place on existing DBs (``ALTER TABLE RENAME COLUMN`` — SQLite >= 3.25) so historical counter values are preserved. Row mappers fall through to the legacy names if both column renames and a migration somehow got out of sync. Counter lifecycle ----------------- New helper ``_record_task_failure(conn, task_id, error, , outcome, release_claim, end_run, event_payload_extra)`` is the single point every non-success outcome funnels through: ``spawn_failed`` → ``_record_spawn_failure`` (kept as alias) calls it with ``release_claim=True, end_run=True`` — transitions running→ready, clears claim, closes run. * ``timed_out`` → ``enforce_max_runtime`` already does the status transition + run close + event emission, then calls ``_record_task_failure`` with ``release_claim=False, end_run=False`` just to bump the counter (and trip the breaker if needed). * ``crashed`` → ``detect_crashed_workers`` same pattern, but the counter increment runs after the main write_txn closes (SQLite doesn't nest write transactions). If the counter hits the breaker threshold (``DEFAULT_FAILURE_LIMIT=5``, same as before), the task transitions to ``blocked`` with a ``gave_up`` event on top of whatever outcome-specific event was already emitted. Reset semantics changed: the counter now clears only on successful ``complete_task`` (and operator ``reclaim_task`` — an explicit "I've looked at this, try again with a fresh budget"). Previously ``_clear_spawn_failures`` ran on every successful spawn, which would have wiped the counter before a timeout could accumulate past threshold — exactly the loop this fix prevents. Diagnostics ----------- * ``_rule_repeated_spawn_failures`` → ``_rule_repeated_failures``. Now fires regardless of which outcome is at fault. Classifies the most recent failure (spawn_failed / timed_out / crashed) from the run history so the title ("Agent timeout x3", "Agent crash x4", "Agent spawn x5") and suggested action (``doctor`` for spawn, ``log`` for timeout/crash) stay outcome-specific without N duplicate rules. * ``_rule_repeated_crashes`` kept as a narrower early-warning at threshold 2 (vs 3 for the unified rule), but now suppresses itself when the unified rule would also fire — avoids double-flagging. * Diagnostic ``data`` payload now carries ``{consecutive_failures, most_recent_outcome, last_error}`` instead of spawn-specific keys. CLI --- * ``Task.consecutive_failures`` / ``Task.last_failure_error`` are the public fields now. Existing callers that referenced the old names get migrated (tests updated in this commit). * Backward-compat: ``DEFAULT_SPAWN_FAILURE_LIMIT``, ``_clear_spawn_failures``, ``_record_spawn_failure`` stay as aliases. Tests ----- * 6 new kernel tests: timeout increments counter, 3 consecutive timeouts trip the breaker (was the reported gap), crash increments counter, reclaim clears counter, completion clears counter, spawn success does NOT clear counter. * Diagnostic tests: updated ``repeated_spawn_failures`` cases to use the new kind name and add a timeout-loop test. * Dashboard API test: spawn_failures column update → consecutive_failures. 389/389 kanban-suite tests pass. Live verification ----------------- Seeded 4 tasks in an isolated HERMES_HOME: 3 timeouts, 4 crashes, 2-spawn-failed + 2-timed-out, and a task that had prior failures but completed successfully. Board correctly shows "!! 3 tasks need attention" (the successful one has no badge because the counter reset). Drawer for the timeout-loop task renders "Agent timeout x3" with most_recent_outcome=timed_out and the "Check logs" suggested action (not the spawn-flavoured "Verify profile"). The successful task has zero diagnostics. Closes the Forbidden-Seeds-reported gap.	2026-05-05 13:55:37 -07:00
LeonSGP43	80c579a9dd	docs(skills): explain restoring bundled skills	2026-05-05 13:46:20 -07:00
brooklyn!	794f48766c	fix(tui): close slash parity gaps with CLI (#20339 ) * fix(tui): close slash parity gaps with CLI Route unsupported /skills subcommands through slash.exec, support /new <name> titles, and handle /redraw natively so TUI behavior matches classic CLI. Also filter gateway-only commands out of the TUI catalog while keeping /status discoverable. * fix(tui): run remaining CLI parity paths natively Forward chat launch flags into the TUI runtime and handle live-session status and skill reloads in the gateway process so TUI state no longer depends on the slash worker's stale CLI instance. * fix(tui): block stale snapshot restores Prevent snapshot restore from running through the isolated slash worker because it mutates disk state without refreshing the live TUI agent. * chore: uptick * fix(tui): guard async session title updates Handle failures from the fire-and-forget session.title RPC so title-setting errors do not surface as unhandled promise rejections while preserving session-scoped messaging.	2026-05-05 15:42:39 -05:00
Teknium	9022804d78	feat(providers): make all 33 providers pluggable under plugins/model-providers/ Every provider profile is now a self-contained plugin under plugins/model-providers/<name>/, mirroring the plugins/platforms/ pattern established for IRC and Teams. The ProviderProfile ABC stays in providers/; the per-provider profile data moves out. - plugins/model-providers/<name>/__init__.py calls register_provider() - plugins/model-providers/<name>/plugin.yaml declares kind: model-provider - providers/__init__.py._discover_providers() lazily scans bundled plugins then $HERMES_HOME/plugins/model-providers/<name>/ (user override path) - User plugins with the same name override bundled ones (last-writer-wins in register_provider) - Legacy providers/<name>.py layout still supported for back-compat with out-of-tree editable installs - Hermes PluginManager: new kind=model-provider; skipped like memory plugins (providers/ discovery owns them); standalone plugins with register_provider+ProviderProfile in their __init__.py auto-coerce to this kind (same heuristic as memory providers) - skip_names extended to include 'model-providers' so the general PluginManager doesn't double-scan the category - 4 new tests in tests/providers/test_plugin_discovery.py covering bundled discovery, user override, and general-loader isolation - Docs updated: website/docs/developer-guide/adding-providers.md, provider-runtime.md, providers/README.md, plugins/model-providers/README.md No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py / model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py all still consume providers via get_provider_profile() / list_providers() — they just now see plugin-discovered entries instead of pkgutil-iterated ones. Third parties can now drop a single directory into ~/.hermes/plugins/model-providers/<name>/ to add or override an inference provider without touching the repo.	2026-05-05 13:40:01 -07:00
kshitijk4poor	20a4f79ed1	feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path Introduces providers/ package — single source of truth for every inference provider. Adding a simple api-key provider now requires one providers/<name>.py file with zero edits anywhere else. What this PR ships: - providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes) - ProviderProfile declarative fields: name, api_mode, aliases, display_name, env_vars, base_url, models_url, auth_type, fallback_models, hostname, default_headers, fixed_temperature, default_max_tokens, default_aux_model - 4 overridable hooks: prepare_messages, build_extra_body, build_api_kwargs_extras, fetch_models - chat_completions.build_kwargs: profile path via _build_kwargs_from_profile, legacy flag path retained for lmstudio/tencent-tokenhub (which have session-aware reasoning probing that doesn't map cleanly to hooks yet) - run_agent.py: profile path for all registered providers; legacy path variable scoping fixed (all flags defined before branching) - Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS, doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER - GeminiProfile: thinking_config translation (native + openai-compat nested) - New tests/providers/ (79 tests covering profile declarations, transport parity, hook overrides, e2e kwargs assembly) Deltas vs original PR (salvaged onto current main): - Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth (were added to main since original PR) - Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their reasoning_effort probing has no clean hook equivalent yet) - Removed lmstudio alias from custom profile (it's a separate provider now) - Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension (resolve_provider special-cases them; adding breaks runtime resolution) - runtime_provider: profile.api_mode only as fallback when URL detection finds nothing (was breaking minimax /v1 override) - Preserved main's legacy-path improvements: deepseek reasoning_content preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M beta recovery, etc. - Kept agent/copilot_acp_client.py in place (rejected PR's relocation — main has 7 fixes landed since; relocation would revert them) - _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing test imports Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Closes #14418	2026-05-05 13:40:01 -07:00
Teknium	f67063ba81	feat(kanban): generic diagnostics engine for task distress signals (#20332 ) * feat(kanban): generic diagnostics engine for task distress signals Replaces the hallucination-specific ``warnings`` / ``RecoverySection`` surface (shipped in PR #20232) with a reusable diagnostic-rule engine that covers five distress kinds in v1 and can be extended without touching UI code. The "something's wrong with this task" signal is no longer limited to phantom card ids. Closes the follow-up from #20232 discussion. New module ---------- ``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule engine. Each rule is a pure function of ``(task, events, runs, now, config) -> list[Diagnostic]``. Registry is a simple list; adding a new distress kind is one function + one import, no UI or API changes required. v1 rule set ----------- * ``hallucinated_cards`` (error) — folds the existing ``completion_blocked_hallucination`` event into the new surface. * ``prose_phantom_refs`` (warning) — folds ``suspected_hallucinated_references``. * ``repeated_spawn_failures`` (error → critical at 2x threshold) — fires when ``tasks.spawn_failures >= 3``; suggests ``hermes -p <profile> doctor`` / ``auth``. * ``repeated_crashes`` (error → critical) — fires after N consecutive ``crashed`` run outcomes with no successful completion between; suggests ``hermes kanban log <id>``. * ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked`` state with no comments / unblock attempts; suggests commenting. Every diagnostic carries structured ``actions`` (reclaim, reassign, unblock, cli_hint, comment, open_docs) that render consistently in both CLI and dashboard. Suggested actions are highlighted; generic recovery actions (reclaim / reassign) are available on every kind as fallbacks. Diagnostics auto-clear when the underlying failure resolves — a clean ``completed``/``edited`` event drops hallucination diagnostics, a successful run drops crash diagnostics, a comment drops stuck-blocked diagnostics. Audit events persist; the badge goes away. API --- ``plugin_api.py``: * ``/board`` now attaches ``diagnostics`` (full list) and ``warnings`` (compact summary with ``highest_severity``) per task. * ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics section auto-opens on flagged tasks. * NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by severity, sorted critical-first. CLI --- * NEW ``hermes kanban diagnostics [--severity X] [--task id] [--json]`` — fleet view or single-task view, matches dashboard rule output so CLI users see the same picture. * ``hermes kanban show <id>`` now renders a Diagnostics section near the top with severity markers + suggested actions. Dashboard --------- * Card badge is severity-coloured (⚠ amber warning, !! orange error, !!! red critical) using ``warnings.highest_severity``. * Attention strip above the toolbar counts EVERY task with active diagnostics (not just hallucinations), severity-coloured, lists affected tasks with Open buttons when expanded. * Drawer's old ``RecoverySection`` replaced with generic ``DiagnosticsSection`` rendering a card per active diagnostic: title + detail + structured data (task-id chips when payload keys look like id lists) + action buttons. Reassign profile picker is inline per-diagnostic. Clipboard fallback uses ``.catch()`` for environments where writeText rejects. * Three-rung severity palette; amber for warning, orange for error, red for critical. Uses CSS variables so theming is straightforward. Tests ----- * NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests covering each rule's positive/negative/threshold paths, severity sorting, broken-rule isolation, and sqlite3.Row integration. * Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty, populated, severity-filtered), ``/board`` exposes both diagnostic list and compact summary with ``highest_severity``. * Existing hallucination-specific test (``test_board_surfaces_ warnings_field_for_hallucinated_completions``) updated to reflect the new contract: warning summary keys by diagnostic kind (``hallucinated_cards``) not event kind. 379 kanban-suite tests pass (+16 net from this PR). Live verification ----------------- Seeded all 5 diagnostic kinds + one clean + one plain-running task (7 total) into an isolated HERMES_HOME, spun up the dashboard, and verified: * Attention strip: shows ``!! 5 tasks need attention`` in the error-severity orange; Show expands to a list of 5 rows ordered critical > error > warning. * Card badges: error tasks render ``!!`` orange, warning tasks render ``⚠`` amber, clean and plain-running tasks render no badge. * Each of the 5 rules opens a correctly-coloured, correctly-styled diagnostic card in the drawer with its specific suggested action. * Live reassign from a diagnostic card flipped ``broken-ml-worker → alice`` and the drawer refreshed with the new assignee + the same diagnostic still firing (correct: spawn_failures counter hasn't reset yet). * CLI ``hermes kanban diagnostics`` prints all 5 in severity order; ``--severity error`` narrows to 3; ``kanban show <id>`` includes the Diagnostics block at the top with suggested action hint. Migration note -------------- The old ``warnings`` shape (``{count, kinds, latest_at}``) is preserved on the API but ``kinds`` now keys by diagnostic kind (``hallucinated_cards``) instead of event kind (``completion_blocked_hallucination``). ``highest_severity`` is a new required field. The dashboard was the only consumer and has been updated in the same commit; external API consumers of the ``warnings`` field will need to update their kind-match logic. * feat(kanban/diagnostics): lead titles with the actual error text The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn N times' titles buried the actual cause in the data section. Operators had to open logs or expand the diagnostic to see WHY the worker is stuck — rate-limit vs insufficient quota vs bad auth vs context overflow vs network blip all looked identical at a glance. New titles: Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance Agent crashed 3x: provider auth error: 401 Unauthorized Agent spawn failed 4x: insufficient_quota: You exceeded your current Detail keeps the full error snippet (capped at 500 chars + ellipsis for tracebacks). Title takes the first line capped at 160 chars. Fallback title if no error recorded stays honest ('no error recorded'). Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total pass (+4). Live-verified on dashboard with 6 seeded scenarios (rate-limit, billing, auth, context, network, spawn-billing) — each card title leads with the actionable error text.	2026-05-05 13:32:42 -07:00
Teknium	d5357f816d	refactor(telegram): make typing thread-id resolver symmetric with send Mirror _message_thread_id_for_typing() with _message_thread_id_for_send(): both now map the General forum topic (thread id "1") to None upfront. That removes the need for the retry-without-thread fallback in send_typing() entirely — if _message_thread_id_for_typing() returns a non-None value, it's a real user-created topic and falling back to the root chat is never correct. If Telegram rejects the typing action (e.g. topic deleted mid-session), we swallow it at debug level instead of bleeding the indicator into All Messages. Updates the General-topic typing regression test to assert the new single-call contract.	2026-05-05 13:28:08 -07:00
helix4u	41545f7ec5	fix(telegram): keep DM topic typing scoped	2026-05-05 13:28:08 -07:00
Siddharth Balyan	3b750715a3	fix: resolve lazy session creation regressions (#18370 fallout) (#20363 ) Fix three regressions introduced by PR #18370 (lazy session creation): 1. _finalize_session() uses stale session_key after compression (#20001) 2. session_key not synced after auto-compression in run_conversation (#20001) 3. pending_title ValueError leaves title wedged forever (#19029) 4. Gateway silently swallows null responses when agent did work (#18765) 5. One-time cleanup for accumulated ghost compression continuations (#20001) Changes: - tui_gateway/server.py: _finalize_session() now uses agent.session_id (falls back to session_key when agent is None). Refactor _sync_session_key_after_compress() with clear_pending_title and restart_slash_worker policy flags. Call it post-run_conversation() to sync session_key after auto-compression. Add ValueError handler to pending_title flush. - gateway/run.py: Extract _normalize_empty_agent_response() helper that consolidates failed/partial/null response handling. Surfaces user-facing error when agent did work (api_calls > 0) but returned no text. - hermes_state.py: Add finalize_orphaned_compression_sessions() — marks ghost continuation sessions as ended (non-destructive, preserves data). - cli.py: One-time startup migration for orphaned compression sessions. Test changes: - tests/test_tui_gateway_server.py: Update pending_title ValueError test for post-#18370 architecture (title applied post-message, not at create). - tests/test_lazy_session_regressions.py: 14 new regression tests covering all fixed paths.	2026-05-06 01:11:49 +05:30
Traemond Anderson	60235dba5e	feat(cli): add list_picker_providers for credential-filtered picker The Telegram/Discord /model pickers currently call list_authenticated_providers(), which returns every provider whose credentials resolve locally and every model in its curated snapshot. Two failure modes fall out: - OpenRouter rows can include IDs the live catalog no longer carries. - Provider rows can surface with zero callable models (e.g. a slug whose credential pool entry exists but has nothing behind it). list_picker_providers() wraps the base function and post-processes the result so the interactive picker only shows models the user can actually select: - OpenRouter's models come from fetch_openrouter_models() (live-catalog filtered against the curated OPENROUTER_MODELS snapshot). - Rows with an empty models list are dropped, except custom endpoints (is_user_defined=True with an api_url) where the user may enter model ids manually. - All other fields pass through unchanged. The gateway /model handler switches to the new helper for the interactive picker payload only. Typed /model <name> and the text fallback list stay on list_authenticated_providers() so nothing is hidden from power users or platforms without a picker. Covered by nine focused unit tests in tests/hermes_cli/test_list_picker_providers.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 10:18:58 -07:00
Aslaaen	e8e9147377	fix(acp): preserve assistant reasoning metadata in session persistence	2026-05-05 10:18:28 -07:00
Zeejay	f8ba265340	fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client When a provider returns a 429 rate-limit error (not billing-related), the auxiliary client's call_llm/async_call_llm previously did NOT trigger the fallback chain. This caused auxiliary tasks like session_search to exhaust all 3 retries against the same rate-limited endpoint, losing session metadata that depended on the summarization completing. Root cause: `_is_payment_error()` only matched 429s containing billing keywords ("credits", "insufficient funds", etc.). Provider-specific rate-limit messages like Nous's "Hold up for a bit, you've exceeded the rate limit on your API key" didn't match, so `_is_payment_error` returned False, `_is_connection_error` returned False, and `should_fallback` was False — all retries hit the same rate-limited provider. Fix: - New `_is_rate_limit_error()` function that detects 429 + rate-limit keywords, generic 429 without billing keywords, and OpenAI SDK `RateLimitError` class instances (which may omit .status_code). - Updated `should_fallback` in both `call_llm` and `async_call_llm` to include `_is_rate_limit_error`. - Updated the max_tokens retry path to also check for rate-limit errors. - Updated the reason string to include "rate limit". This complements the Nous rate guard (PR #10568) which prevents new calls to Nous when already rate-limited — this fix handles the case where a request is already in flight when the 429 arrives. Related: #8023, #12554, #11034 Co-authored-by: Zeejay <zjtan1@gmail.com>	2026-05-05 10:15:57 -07:00
LeonSGP43	244bacd0dc	fix(skills): support category-qualified local skill names	2026-05-05 10:15:31 -07:00
Es1la	a877c3f6d9	fix(feishu): tolerate malformed dedup timestamps Salvages @Es1la's PR #13632 — a non-numeric timestamp in the persisted feishu dedup state crashed adapter startup with ValueError/TypeError from the unguarded float() call. Wrap the float() conversion in try/except; skip the bad key and keep loading the rest. The original PR also restructured existing TestDedupTTL tests to use tempfile.TemporaryDirectory + HERMES_HOME patching — that was test-hygiene scope creep unrelated to the bug. Kept only the malformed-timestamp fix and added a focused regression test.	2026-05-05 10:15:09 -07:00
Justin Kausel	526742199b	Prefer fallback for Gemini CloudCode rate limits	2026-05-05 10:14:48 -07:00
Wysie	0120d8f31e	fix: merge plugin tools into builtin toolsets	2026-05-05 10:14:17 -07:00
hharry11	247c9d468c	fix(gateway): ensure deterministic thread eviction in helpers	2026-05-05 10:13:55 -07:00
Jonathan Troyer	6430d67569	fix(openrouter): use canonical X-Title attribution header OpenRouter's dashboard attributes usage via the `X-Title` header. Hermes was sending `X-OpenRouter-Title`, which OpenRouter does not recognize, so Hermes usage showed up unlabeled. Rename to `X-Title` to match the canonical header (already used elsewhere in the same file via _AI_GATEWAY_HEADERS). Salvages the core fix from @JTroyerOvermatch's PR #13649. Dropped the PR's `HERMES_OPENROUTER_TITLE` / `HERMES_OPENROUTER_REFERER` env-var override plumbing per the '.env is for secrets only' policy — if per-deployment attribution is needed later it should go under `openrouter.title` / `openrouter.referer` in config.yaml instead.	2026-05-05 10:13:34 -07:00
Teknium	e4e0090b54	test(acp): regression for #13675 — save_session preserves existing messages on encode failure	2026-05-05 10:05:23 -07:00
Teknium	b014a3d315	test(cron): update _isolate_tick_lock fixture for _get_lock_paths After PR #13725 replaced the module-level _LOCK_DIR/_LOCK_FILE constants with a dynamic _get_lock_paths() helper, the xdist-isolation fixture needs to patch the function instead of the removed constants.	2026-05-05 09:57:06 -07:00
Teknium	de9238d37e	feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232 ) Workers completing a kanban task can now claim the ids of cards they created via an optional ``created_cards`` field on ``kanban_complete``. The kernel verifies each id exists and was created by the completing worker's profile; any phantom id blocks the completion with a ``HallucinatedCardsError`` and records a ``completion_blocked_hallucination`` event on the task so the rejected attempt is auditable. Successful completions also get a non-blocking prose-scan pass over their ``summary`` + ``result`` that emits a ``suspected_hallucinated_references`` event for any ``t_<hex>`` reference that doesn't resolve. Closes #20017. Recovery UX (kernel + CLI + dashboard) -------------------------------------- A structural gate alone isn't enough — operators also need to see and act on stuck workers, especially when a profile's model is the root cause. This PR ships the full loop: * ``kanban_db.reclaim_task(task_id)`` — operator-driven reclaim that releases an active worker claim immediately (unlike ``release_stale_claims`` which only acts after claim_expires has passed). Emits a ``reclaimed`` event with ``manual: True`` payload. * ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)`` — switch a task to a different profile, optionally reclaiming a stuck running worker in the same call. * ``hermes kanban reclaim <id> [--reason ...]`` and ``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]`` CLI subcommands wired through to the same helpers. * ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and ``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the dashboard plugin. Dashboard surfacing ------------------- * ⚠ warning badge on cards with active hallucination events. * attention strip at the top of the board listing all flagged tasks; dismissible per session. * events callout in the task drawer — hallucination events render with a red left border, amber icon, and phantom ids as styled chips. * recovery section in the task drawer with three actions: Reclaim, Reassign (with profile picker + reclaim-first checkbox), and a copy-to-clipboard hint for ``hermes -p <profile> model`` since profile config lives on disk and can't be edited from the browser. Auto-opens when the task has warnings, collapsed otherwise. Keyed by task id so state doesn't leak between drawers. Active-vs-stale rule: warnings clear when a clean ``completed`` or ``edited`` event supersedes the hallucination, so recovery is never permanently stigmatising — the audit events persist for debugging but the badge goes away once the worker succeeds. Skill updates ------------- * ``skills/devops/kanban-worker/SKILL.md`` documents the ``created_cards`` contract with good/bad examples. * ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering stuck workers" section with the three actions and when to use each. Tests ----- * Kernel gate: verified-cards manifest, phantom rejection + audit event, cross-worker rejection, prose scan positive + negative. * Recovery helpers: reclaim on running task, reclaim on non-running returns False, reassign refuses running without reclaim_first, reassign with reclaim_first succeeds on running. * API endpoints: warnings field present on /board and /tasks/:id, warnings cleared after clean completion, reclaim 200 + 409 paths, reassign 200 + 409 + reclaim_first paths. * CLI smoke: reclaim + reassign subcommands. Live-verified end-to-end on a dashboard with seeded scenarios: attention strip renders, badges land on the right cards, drawer callout shows phantom chips, Reclaim on a running task flips status to ready + emits manual reclaimed event + refreshes the drawer, Reassign swaps the assignee and triggers board refresh. 359/359 kanban-suite tests pass (test_kanban_{db,cli,boards,core_functionality} + dashboard + tools).	2026-05-05 08:06:55 -07:00
Teknium	7de3c86c5a	feat(i18n): add display.language for static message translation (zh/ja/de/es) (#20231 ) * revert(gateway): remove stale-code self-check and auto-restart Removes the _detect_stale_code / _trigger_stale_code_restart mechanism introduced in #17648 and iterated in #19740. On every incoming message the gateway compared the boot-time git HEAD SHA to the current SHA on disk, and if they differed it would reply with Gateway code was updated in the background -- restarting this gateway so your next message runs on the new code. Please retry in a moment. and then kick off a graceful restart. This is unwanted behaviour: users who run a long-lived gateway and do their own ad-hoc git operations on the checkout end up with their chat interrupted and the current message dropped every time HEAD moves, with no way to opt out. If an operator really needs the old protection against stale sys.modules after "hermes update", the SIGKILL-survivor sweep in hermes update (hermes_cli/main.py, also tagged #17648) already handles the supervisor-respawn case on its own. Removed: gateway/run.py: - _STALE_CODE_SENTINELS, _GIT_SHA_CACHE_TTL_SECS - _read_git_head_sha(), _compute_repo_mtime() module helpers - class-level _boot_wall_time / _boot_repo_mtime / _boot_git_sha / _stale_code_restart_triggered defaults - __init__ boot-snapshot block (_boot_, _cached_current_sha, _repo_root_for_staleness, _stale_code_notified) - _current_git_sha_cached(), _detect_stale_code(), _trigger_stale_code_restart() methods - stale-code check + user-facing restart notice at the top of _handle_message() tests/gateway/test_stale_code_self_check.py (deleted, 412 lines) No new logic added. Zero remaining references to any removed symbol. Gateway test suite passes the same 4589 tests it passed before; the 3 pre-existing unrelated failures (discord free-channel, feishu bot admission, teams typing) are unchanged by this commit. * feat(i18n): add display.language for static message translation (zh/ja/de/es) Adds a thin-slice i18n layer covering the highest-impact static user-facing messages: the CLI dangerous-command approval prompt and a handful of gateway slash-command replies (restart-drain, goal cleared, approval expired, config read/save errors). Out of scope (stays English): agent responses, log lines, tool outputs, slash-command descriptions, error tracebacks. Infrastructure: - agent/i18n.py: catalog loader, t() helper, language resolution (HERMES_LANGUAGE env var > display.language config > en) - locales/{en,zh,ja,de,es}.yaml: ~19 translated strings per language - display.language in DEFAULT_CONFIG (hermes_cli/config.py) Tests: - tests/agent/test_i18n.py: 21 tests covering catalog parity, placeholder parity across locales, fallback behavior, env-var override, alias normalization, missing-key graceful degradation. Docs: - website/docs/user-guide/configuration.md: display.language entry plus a short section explaining scope so users don't expect agent responses to translate via this knob.	2026-05-05 08:03:07 -07:00
MaHaoHao-ch	02147cc850	fix(cli): sanitize bracketed paste markers during setup Strip bracketed-paste control sequences from setup prompt input so pasted API keys work on Linux and WSL terminals, and add regression tests for normal/password prompts. Closes #16491	2026-05-05 06:12:42 -07:00
Teknium	285c208cf7	fix(gateway): also tolerate malformed env vars in custom human-delay mode Widens @Krionex's PR #16933 fix to cover the second bug class at the sibling site. natural mode used to pass env values through int() before the PR caught mis-typed values crashing the gateway; custom mode had the exact same bug one branch away (HERMES_HUMAN_DELAY_MIN_MS=oops in custom mode still crashed). Same try/except/fallback pattern, scoped to the two int() calls that feed random.uniform().	2026-05-05 06:11:38 -07:00
Krionex	3b16c590e0	fix(gateway): ignore malformed custom delay env vars in natural mode	2026-05-05 06:11:38 -07:00
novax635	4e6f51167d	fix(cli): fall back on invalid HERMES_MAX_ITERATIONS	2026-05-05 06:11:03 -07:00
Leon	19eebf6e0d	fix(openrouter): treat xiaomi models as reasoning-capable	2026-05-05 06:07:44 -07:00
JC的AI分身	80b386a472	fix(feishu): refresh bot identity during hydration	2026-05-05 06:04:20 -07:00
Teknium	314361733f	test(api_server): _run_agent result now carries session_id for #16938	2026-05-05 06:01:03 -07:00
briandevans	c1a2710a32	test(aux): cover effort: 0 fallback in Codex reasoning translation Copilot review on PR #17012 noted the docstring/comment lists `0` among the falsy effort values that fall back to `medium`, but the existing regression tests only cover `None` and `""`. Add the third case to lock in the full contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 05:47:50 -07:00
briandevans	9e893d16d1	fix(aux): default Codex reasoning effort to medium when extra_body.reasoning.effort is falsy auxiliary.<task>.extra_body.reasoning, but the new translation path in _CodexCompletionsAdapter.create() reads the effort with ``reasoning_cfg.get("effort", "medium")``. That returns the configured value verbatim when the key is present, so ``effort: null`` / ``effort: ""`` (both common YAML shapes) flow through as ``{"effort": null, "summary": "auto"}`` and Codex rejects the request with "Invalid value for parameter ``reasoning.effort``". agent/transports/codex.py::build_kwargs() — which the new adapter is documented to mirror — uses a truthy check (``elif reasoning_config.get("effort"):``) so the same falsy values keep the "medium" default. Switch the auxiliary adapter to the same ``or "medium"`` truthy form so identical config produces identical requests on both paths. - [x] Two new regression tests cover ``effort: None`` and ``effort: ""`` and assert the request goes out as ``{"effort": "medium", "summary": "auto"}``. - [x] Old behaviour fails the new tests (``{'effort': None} != {'effort': 'medium'}``); fixed behaviour passes all 11 tests in the ``TestCodexAdapterReasoningTranslation`` class. - [x] Adjacent suites green: ``tests/agent/test_auxiliary_client.py`` (108 passed) and ``tests/agent/transports/test_codex_transport.py + test_chat_completions.py`` (73 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-05 05:47:50 -07:00
beardthelion	f15b0fbb4f	fix: add PLATFORM_HINTS entry for api_server platform The API server is a documented, first-class messaging platform with its own gateway adapter, docs pages, and toolset. But it's the only messaging platform missing from PLATFORM_HINTS in agent/prompt_builder.py. Without a platform hint, the agent has no context about the API server's rendering environment and defaults to markdown-heavy document-style outputs (code fences, bold, bullet points) — which break on the plain-text frontends most API server consumers wrap (Open WebUI, custom agents, third-party bridges). Adds a generic api_server entry that describes the medium (unknown rendering, assume plain text) without encoding any specific use case. Individual consumers can layer additional style guidance via ephemeral system prompts. Before (DeepSeek V4 Pro via API server, no hint): Sendblue bridge at /opt/sendblue-bridge - 68MB on disk After (same prompt, with hint): Sendblue bridge at /opt/sendblue-bridge, 68MB on disk No breaking changes — new dict entry only. Existing API server consumers see no behavioral change except for models that previously defaulted to markdown formatting, which now produce cleaner plain-text output.	2026-05-05 05:46:16 -07:00
Teknium	b10e38e392	fix(skills): pin protects against deletion only, not edits (#20220 ) Previously, pinning a skill blocked every skill_manage write action (edit, patch, delete, write_file, remove_file). The 'hard fence' design conflated two concerns: 1. Pin as deletion protection — don't let the curator archive or the agent delete a stable skill. 2. Pin as content freeze — don't let the agent rewrite it mid-conversation. In practice (1) is what users pin for: they want a skill to survive curator passes. (2) created friction — agents finding a new pitfall in a pinned skill had to ask the user to unpin, then the agent patches, then the user re-pins. The dance discouraged skill maintenance and pinned skills went stale. This narrows the _pinned_guard to skill_manage(action='delete') only. Patches, edits, and supporting-file writes go through on pinned skills so the agent can keep improving them. The curator's own pinned-skip behavior (agent/curator.py:271 for auto-archive, line 349 for the LLM review prompt) is unchanged — curator still never touches pinned skills. Changes: - tools/skill_manager_tool.py: remove _pinned_guard calls from _edit_skill, _patch_skill, _write_file, _remove_file; keep on _delete_skill. Updated _pinned_guard docstring and error message. - tools/skill_manager_tool.py: updated skill_manage model-facing tool description to reflect the new semantic. - website/docs/user-guide/features/curator.md: updated pinning section. - tests/tools/test_skill_manager_tool.py: flipped refuses-pinned tests for edit/patch/write_file/remove_file into allowed-when-pinned; kept test_delete_refuses_pinned (strengthened assertion to check the 'cannot be deleted' wording). Closes #18354	2026-05-05 05:43:10 -07:00
Teknium	fe8560fc12	feat(api-server): X-Hermes-Session-Key header for long-term memory scoping (#20199 ) * feat(api-server): X-Hermes-Session-Key header for long-term memory scoping API Server integrations (Open WebUI, custom web UIs) can now pass a stable per-channel identifier via X-Hermes-Session-Key that scopes long-term memory (Honcho, etc.) independently of the transcript-scoped X-Hermes-Session-Id. This matches the native gateway's session_key / session_id split: one stable key per assistant channel, many independent transcripts that rotate on /new. - _create_agent and _run_agent accept gateway_session_key and pass it to AIAgent(gateway_session_key=...), which is already honored by the Honcho memory provider (plugins/memory/honcho/client.py resolve_session_name). - New shared helper _parse_session_key_header applies the same API-key gate, control-character sanitization, and a 256-char length cap as the existing session-id header. - All three agent endpoints honor the header: /v1/chat/completions, /v1/responses, /v1/runs. JSON and SSE responses echo it back. - /v1/capabilities advertises session_key_header so clients can feature-detect. Closes #20060. Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com> * chore: AUTHOR_MAP entry for manateelazycat --------- Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>	2026-05-05 05:34:47 -07:00
Teknium	436672de0e	feat(curator): add archive and prune subcommands (#20200 ) * fix(curator): protect hub skills by frontmatter name * test(skill_usage): add mark_agent_created to regression test The cherry-picked test predates #19618/#19621 which rewrote list_agent_created_skill_names() to require an explicit created_by: 'agent' provenance marker. Without mark_agent_created(), my-skill is excluded from the list and the positive assertion fails. * feat(curator): add archive and prune subcommands Adds 'hermes curator archive <skill>' and 'hermes curator prune [--days N] [--yes] [--dry-run]' alongside the existing status, run, pause, resume, pin, unpin, restore, backup, rollback verbs. These are the two genuinely new user-facing verbs requested in #19384. The other verbs proposed there ('stats' and 'restore') already exist as 'curator status' and 'curator restore', so no duplicate surface is added — all skill lifecycle commands live under the single 'hermes curator' namespace. - archive: manual archive of an agent-created skill. Refuses pinned skills with a hint pointing at 'hermes curator unpin'. - prune: bulk-archive unpinned skills idle for >= N days (default 90). Falls back to created_at when last_activity_at is null so never-used skills can still be pruned. --dry-run previews, --yes skips prompt. Adapted from @elmatadorgh's PR #19454 which placed the same verbs under 'hermes skills' with a separate hermes_cli/skills_config.py handler and rich table for stats. The 'stats' and 'restore' parts of that PR duplicated existing surface, so only archive and prune are kept, rewritten to match hermes_cli/curator.py's existing plain-text handler style. Tests rewritten from scratch against the new handlers. Closes #19384 Co-authored-by: elmatadorgh <coktinbaran5@gmail.com> --------- Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com> Co-authored-by: elmatadorgh <coktinbaran5@gmail.com>	2026-05-05 05:15:54 -07:00
Teknium	9e0ef2a1bc	test: pin per-turn reasoning extraction semantics Covers four scenarios for the reasoning-box extraction loop: - simple turn with reasoning - simple turn with no reasoning - tool-calling turn where reasoning lives on the tool-call step - prior turn had reasoning, current turn does not (the stale-display bug the fix exists for) - tool-calling turn where reasoning lives on BOTH steps (latest wins) - empty-string reasoning treated as missing Also updates the four inline replica loops in tests/cli/test_reasoning_command.py to match the new turn-boundary shape so the test file reflects production semantics.	2026-05-05 05:00:05 -07:00
Asher Morse	6b76ea4707	fix(gateway): load reply_to_mode from config.yaml for Discord and Telegram The YAML-to-env-var bridge in load_gateway_config() mapped every Discord and Telegram config key (require_mention, auto_thread, reactions, etc.) except reply_to_mode. Users setting discord.reply_to_mode or telegram.reply_to_mode in ~/.hermes/config.yaml got no effect — the adapter only read the env var, which nothing populated from YAML. Add the missing bridge for both platforms, following the existing pattern. Top-level <platform>.reply_to_mode preferred, falls back to <platform>.extra.reply_to_mode, env var never overwritten. Handles YAML 1.1 bare `off` → Python False coercion. This is a re-submission of the work from #9837 and #13930, which both implemented the same fix but neither landed (see co-authors below). Co-authored-by: Matteo De Agazio <hypnosis.mda@gmail.com> Co-authored-by: ishardo <239075732+ishardo@users.noreply.github.com>	2026-05-05 04:58:23 -07:00
LeonSGP43	354502ee48	fix(kanban): preserve dashboard completion summaries	2026-05-05 04:57:38 -07:00
Teknium	4d0f59fa5a	test(skill_usage): add mark_agent_created to regression test The cherry-picked test predates #19618/#19621 which rewrote list_agent_created_skill_names() to require an explicit created_by: 'agent' provenance marker. Without mark_agent_created(), my-skill is excluded from the list and the positive assertion fails.	2026-05-05 04:55:22 -07:00
LeonSGP43	68c1a08ad1	fix(curator): protect hub skills by frontmatter name	2026-05-05 04:55:22 -07:00
Teknium	5168226d60	feat(file_tools): post-write delta lint on write_file + patch, add JSON/YAML/TOML/Python in-process linters (#20191 ) Closes the gap where write_file skipped the post-edit syntax check that patch already ran, so silent file corruption (bad quote escaping, truncated writes, etc.) would persist on disk until a later read. ## Changes tools/file_operations.py: - Add in-process linters for .py, .json, .yaml, .toml (LINTERS_INPROC). Python uses ast.parse, JSON/YAML/TOML use stdlib/PyYAML parsers. Zero subprocess overhead; preferred over shell linters when both apply. - _check_lint() now accepts optional content and routes to in-process linter first. Shell linter (py_compile, node --check, tsc, go vet, rustfmt) remains the fallback for languages without an in-process equivalent. - New _check_lint_delta() implements the post-first/pre-lazy pattern borrowed from Cline and OpenCode: lint post-write state first; only if errors are found AND pre-content was captured does it lint the pre-state and diff. If the pre-existing file had the SAME errors the edit didn't introduce anything new, so the file is reported as 'still broken, pre-existing' with success=False but a message explaining the errors were pre-existing. If the edit introduced genuinely new errors, those are surfaced and pre-existing ones are filtered out. - WriteResult gains a lint field. - write_file() captures pre-content for in-process-lintable extensions and calls _check_lint_delta after a successful write. - patch_replace() switches from _check_lint to _check_lint_delta, reusing the pre-edit content it already has in scope. tools/file_tools.py: - Update write_file schema description to mention the post-write lint. tests/tools/test_file_operations_edge_cases.py: - Update existing brace-path tests to use .js (shell linter) now that .py is in-process. - Add TestCheckLintInproc (9 tests) covering Python/JSON/YAML/TOML in-process linters. - Add TestCheckLintDelta (5 tests) covering the post-first/pre-lazy short-circuit, new-file path, and the single-error-parser caveat. ## Performance In-process linters are microseconds per call (ast.parse, json.loads). The hot path (clean write) runs exactly one lint — matches main's cost for patch. Pre-state capture is skipped when the file has no applicable linter. Measured 4.89ms/write average over 100 .py writes including lint. ## Inspiration - Cline's DiffViewProvider.getNewDiagnosticProblems() — filters pre-write diagnostics from post-write diagnostics (src/integrations/editor/DiffViewProvider.ts). - OpenCode's WriteTool — runs lsp.diagnostics() after write and appends errors to tool output (packages/opencode/src/tool/write.ts). - Claude Code's DiagnosticTrackingService — captures baseline via beforeFileEdited() and returns new-diagnostics-only from getNewDiagnostics() (src/services/diagnosticTracking.ts). ## Validation - tests/tools/test_file_operations.py + test_file_operations_edge_cases.py + test_file_tools.py + test_file_tools_live.py + test_file_write_safety.py + test_write_deny.py + test_patch_parser.py + test_file_ops_cwd_tracking.py: 228 passed locally. - Live E2E reproduction of the tips.py corruption incident: broken content written; lint field surfaces 'SyntaxError: invalid syntax. Perhaps you forgot a comma? (line 6, column 5)' — the exact error that would have self-corrected the bug on the next turn.	2026-05-05 04:54:17 -07:00
wmagev	2eef395e1c	fix(compaction): mark end of context summary in role=user fallback When the head ends with assistant/tool and the tail starts with assistant, the summary is inserted as a standalone role="user" message. The body's verbatim "## Active Task" quote then gets read as fresh user input by weak/local models (#11475, #14521). The merge-into-tail path already appends an explicit end-of-summary marker for this reason. Mirror it on the standalone path so both insertion routes give the model the same "summary above, not new input" signal.	2026-05-05 04:51:29 -07:00
LeonSGP43	1a03e3b1c6	fix(kanban): detect darwin zombie workers	2026-05-05 04:43:40 -07:00
0xsir0000	f6b68f0f50	fix(gateway): keep DoH-confirmed Telegram IPs that match system DNS (#14520 ) discover_fallback_ips() filtered out any DoH-resolved IP that also appeared in the system resolver's answer set, on the assumption that the system IP was unreachable. When DoH and system DNS agreed (a common case), the function returned the hardcoded _SEED_FALLBACK_IPS list instead — and on networks where those seed addresses are not routable, the Telegram fallback transport had nothing usable to retry against and polling failed. Drop the system_ips exclusion so DoH-confirmed IPs are preserved regardless of system DNS overlap. The TelegramFallbackTransport already tries the primary path first via system DNS, then falls through to the IP-rewrite path on connect failure; including the same IP in both lanes lets a transient primary failure recover via the explicit IP route instead of escalating to seed addresses. Update the two tests that codified the old exclusion to reflect the new, inclusion-by-default behaviour. Fixes #14520	2026-05-05 04:42:59 -07:00
revaraver	aacf36e943	fix(cli): persist manual compress handoff	2026-05-05 04:42:48 -07:00
revaraver	4a3e3e20e5	fix(compression): preserve iterative summary continuity	2026-05-05 04:42:44 -07:00

... 2 3 4 5 6 ...

3373 commits