Closes#12954
- Add README.zh-CN.md with complete Simplified Chinese translation
- Add language switcher badge in README.md linking to Chinese version
- Add language switcher badge in README.zh-CN.md linking to English version
Step-by-step guide covering Ollama installation, model selection,
Hermes configuration, speed optimization, and optional gateway bot
setup — all running on local hardware with zero API cost.
Includes hardware requirements, model comparison table with tool-call
support status, context window tuning, GPU offloading tips, fallback
provider setup, troubleshooting, and cost comparison.
The dispatcher's circuit breaker only protected against spawn-side
failures (profile missing, workspace mount error, exec failure).
Workers that successfully spawned but then timed out or crashed
re-queued to ``ready`` with no counter increment, so the next tick
re-spawned them — loops forever until someone noticed. Reported
externally on Twitter (Forbidden Seeds) and confirmed by walking the
kernel: ``enforce_max_runtime`` flipped the task back to ready, emitted
a ``timed_out`` event, and never touched ``spawn_failures``; same for
``detect_crashed_workers``.
Fix: unify the counter across all non-success outcomes.
Schema
------
* ``tasks.spawn_failures`` → ``tasks.consecutive_failures``
* ``tasks.last_spawn_error`` → ``tasks.last_failure_error``
* Migration renames the columns in-place on existing DBs (``ALTER
TABLE RENAME COLUMN`` — SQLite >= 3.25) so historical counter
values are preserved. Row mappers fall through to the legacy names
if both column renames and a migration somehow got out of sync.
Counter lifecycle
-----------------
New helper ``_record_task_failure(conn, task_id, error, *, outcome,
release_claim, end_run, event_payload_extra)`` is the single point
every non-success outcome funnels through:
* ``spawn_failed`` → ``_record_spawn_failure`` (kept as alias)
calls it with ``release_claim=True, end_run=True`` — transitions
running→ready, clears claim, closes run.
* ``timed_out`` → ``enforce_max_runtime`` already does the status
transition + run close + event emission, then calls
``_record_task_failure`` with ``release_claim=False, end_run=False``
just to bump the counter (and trip the breaker if needed).
* ``crashed`` → ``detect_crashed_workers`` same pattern, but the
counter increment runs after the main write_txn closes (SQLite
doesn't nest write transactions).
If the counter hits the breaker threshold (``DEFAULT_FAILURE_LIMIT=5``,
same as before), the task transitions to ``blocked`` with a ``gave_up``
event on top of whatever outcome-specific event was already emitted.
Reset semantics changed: the counter now clears only on successful
``complete_task`` (and operator ``reclaim_task`` — an explicit "I've
looked at this, try again with a fresh budget"). Previously
``_clear_spawn_failures`` ran on every successful spawn, which would
have wiped the counter before a timeout could accumulate past threshold
— exactly the loop this fix prevents.
Diagnostics
-----------
* ``_rule_repeated_spawn_failures`` → ``_rule_repeated_failures``. Now
fires regardless of which outcome is at fault. Classifies the most
recent failure (spawn_failed / timed_out / crashed) from the run
history so the title ("Agent timeout x3", "Agent crash x4", "Agent
spawn x5") and suggested action (``doctor`` for spawn, ``log`` for
timeout/crash) stay outcome-specific without N duplicate rules.
* ``_rule_repeated_crashes`` kept as a narrower early-warning at
threshold 2 (vs 3 for the unified rule), but now suppresses itself
when the unified rule would also fire — avoids double-flagging.
* Diagnostic ``data`` payload now carries
``{consecutive_failures, most_recent_outcome, last_error}`` instead
of spawn-specific keys.
CLI
---
* ``Task.consecutive_failures`` / ``Task.last_failure_error`` are the
public fields now. Existing callers that referenced the old names
get migrated (tests updated in this commit).
* Backward-compat: ``DEFAULT_SPAWN_FAILURE_LIMIT``,
``_clear_spawn_failures``, ``_record_spawn_failure`` stay as aliases.
Tests
-----
* 6 new kernel tests: timeout increments counter, 3 consecutive
timeouts trip the breaker (was the reported gap), crash increments
counter, reclaim clears counter, completion clears counter, spawn
success does NOT clear counter.
* Diagnostic tests: updated ``repeated_spawn_failures`` cases to use
the new kind name and add a timeout-loop test.
* Dashboard API test: spawn_failures column update → consecutive_failures.
389/389 kanban-suite tests pass.
Live verification
-----------------
Seeded 4 tasks in an isolated HERMES_HOME: 3 timeouts, 4 crashes,
2-spawn-failed + 2-timed-out, and a task that had prior failures but
completed successfully. Board correctly shows "!! 3 tasks need
attention" (the successful one has no badge because the counter
reset). Drawer for the timeout-loop task renders "Agent timeout x3"
with most_recent_outcome=timed_out and the "Check logs" suggested
action (not the spawn-flavoured "Verify profile"). The successful
task has zero diagnostics.
Closes the Forbidden-Seeds-reported gap.
Salvage of #11350. Kept:
- Code: add an explicit /voice join Choice in the slash UI (runner accepts both 'join' and 'channel' but only 'channel' was in autocomplete).
- Docs: Server Members Intent is conditional (only needed if DISCORD_ALLOWED_USERS contains usernames); SSRC → user_id mapping uses the voice websocket SPEAKING opcode, not the Members intent.
Dropped from the original PR:
- HERMES_DISCORD_VOICE_PACKET_DUMP — this env var doesn't exist on main (it was in a different PR that isn't merged).
- DISCORD_PROXY docs — already documented on current main.
- DISCORD_ALLOW_MENTION_* docs — already on main.
- "barge-in mode" rewrite — current main actually does pause the listener during TTS (VoiceReceiver.pause() at discord.py:192); there is no barge_in_guard/barge_in_rms on main.
Co-authored-by: Michel Belleau <michel.belleau@malaiwah.com>
Salvage of #11758. The PR's original diff was stale (the Docker Compose section on main has been heavily refactored — dashboard is now an embedded side-process, not a separate service), so the useful bit (API server env var requirements) is applied as a note on the basic `docker run` example.
Co-authored-by: xiangyong <xiangyong@zspace.cn>
Adds a comprehensive guide for connecting Dockerized Hermes to local
inference servers like vLLM and Ollama, covering:
- Docker Compose networking (recommended)
- Standalone Docker run with host.docker.internal / --network host
- Connectivity verification steps
- Ollama-specific example
Closes#12308
AGENTS.md is the AI-assistant entry doc, so its counts get used as ground
truth. Several values had drifted, and the same drift had spread to a few
user-facing surfaces. Fixing all of them in one commit so the count claims
agree and clearly distinguish gateway-core from plugin-shipped platforms.
AGENTS.md:
- run_agent.py "~12k LOC" → "~14k LOC as of 2026-05-03" (actual 14,097)
- cli.py "~11k LOC" → "~12k LOC as of 2026-05-03" (actual 12,043)
- tools/environments/ list now lists all 7 user-selectable terminal backends
in canonical order, matching tools/terminal_tool.py:2214-2215
- gateway/platforms/ list adds yuanbao and wecom_callback; the 19 names
match the user-facing list at website/docs/integrations/index.md
- plugins/ tree now mentions plugins/platforms/ (irc, teams)
- tests/ snapshot "~15k tests across ~700 files as of Apr 2026" →
"~19k tests across ~890 files as of 2026-05-03"
User-facing count claims:
- hermes_cli/tips.py:195 — "19 platforms" → "21 messaging platforms" with
IRC and Microsoft Teams added to the named list
- website/docs/index.md:49 — "6 terminal backends" → "7 terminal backends:
..., Vercel Sandbox" (also corrected by PR #19044; same edit content)
- website/docs/index.md:50 — "15+ platforms from one gateway" → "21+ messaging
platforms (19 in the gateway, plus IRC and Microsoft Teams via plugins)"
- website/docs/integrations/index.md:83-85 — "15+ messaging platforms" → "19+",
added yuanbao to the linked list. The surrounding text scopes it to "configured
through the same gateway subsystem", so plugin platforms (IRC, Teams) are
intentionally not in this list
- website/scripts/generate-llms-txt.py:205 — "15+ platforms" → "21+ messaging
platforms — 19 native to the gateway plus IRC and Microsoft Teams via plugins"
LOC and date stamps follow the existing AGENTS.md "as of <date>" convention
(line 56 already used this pattern). Source of truth for the gateway count is
gateway/config.py:130-148 (PlatformID enum); plugin platforms live in
plugins/platforms/.
Out of scope:
- RELEASE_v0.9.0.md historical "16 platforms" claim (immutable history)
- userStories.json verbatim user quotes
- Programmatic count generation from gateway/config.py + plugin manifests
is a worthwhile build-system change but separate from these content fixes
README:24 claimed "Six terminal backends" while tools/environments/ exposes
seven top-level backend choices through TERMINAL_ENV: local, docker, ssh,
singularity, modal, daytona, vercel_sandbox. Modal additionally has direct
and Nous-managed modes selected via terminal.modal_mode (the
ManagedModalEnvironment class is a Modal sub-mode, not a separate top-level
backend).
The same drift appeared in five other doc and code-comment sites with
inconsistent counts (six, seven, or implicit) and varying lists. Updated
all sites to a consistent seven-backend list in canonical order. The
configuration guide also clarifies how Modal's two modes are selected so
operators do not search for a non-existent backend: managed_modal value.
CONTRIBUTING.md:160 lists six backend filenames in a code tree but does
not carry the "Six terminal" prose; left out of scope per cohesion sweep
guidance to bundle only identical wording.
Files updated:
- README.md (line 24, marketing copy)
- website/docs/index.md (line 49, landing page)
- website/docs/user-guide/configuration.md (line 86, config guide)
- tools/environments/__init__.py (lines 3-6, package docstring)
- tools/file_operations.py (line 6, module docstring)
- environments/README.md (line 43, RL training docs — TERMINAL_ENV list)
* fix(tui): close slash parity gaps with CLI
Route unsupported /skills subcommands through slash.exec, support /new <name>
titles, and handle /redraw natively so TUI behavior matches classic CLI. Also
filter gateway-only commands out of the TUI catalog while keeping /status
discoverable.
* fix(tui): run remaining CLI parity paths natively
Forward chat launch flags into the TUI runtime and handle live-session status
and skill reloads in the gateway process so TUI state no longer depends on the
slash worker's stale CLI instance.
* fix(tui): block stale snapshot restores
Prevent snapshot restore from running through the isolated slash worker because
it mutates disk state without refreshing the live TUI agent.
* chore: uptick
* fix(tui): guard async session title updates
Handle failures from the fire-and-forget session.title RPC so title-setting errors do not surface as unhandled promise rejections while preserving session-scoped messaging.
Three worked recipes for OpenAI-compatible cloud providers, plus the
Copilot HTTP 401 auto-recovery info block and the GMI Cloud row in the
compatible providers table. All three additions were on the original
docs/custom-providers-cookbook branch but its merge base predated 1186
main commits, making the rebase impractical (84k+ line conflict).
Replays just the providers.md additions onto current main.
Every provider profile is now a self-contained plugin under
plugins/model-providers/<name>/, mirroring the plugins/platforms/
pattern established for IRC and Teams. The ProviderProfile ABC
stays in providers/; the per-provider profile data moves out.
- plugins/model-providers/<name>/__init__.py calls register_provider()
- plugins/model-providers/<name>/plugin.yaml declares kind: model-provider
- providers/__init__.py._discover_providers() lazily scans bundled plugins
then $HERMES_HOME/plugins/model-providers/<name>/ (user override path)
- User plugins with the same name override bundled ones (last-writer-wins
in register_provider)
- Legacy providers/<name>.py layout still supported for back-compat with
out-of-tree editable installs
- Hermes PluginManager: new kind=model-provider; skipped like memory
plugins (providers/ discovery owns them); standalone plugins with
register_provider+ProviderProfile in their __init__.py auto-coerce to
this kind (same heuristic as memory providers)
- skip_names extended to include 'model-providers' so the general
PluginManager doesn't double-scan the category
- 4 new tests in tests/providers/test_plugin_discovery.py covering
bundled discovery, user override, and general-loader isolation
- Docs updated: website/docs/developer-guide/adding-providers.md,
provider-runtime.md, providers/README.md, plugins/model-providers/README.md
No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py /
model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py
all still consume providers via get_provider_profile() / list_providers() —
they just now see plugin-discovered entries instead of pkgutil-iterated ones.
Third parties can now drop a single directory into
~/.hermes/plugins/model-providers/<name>/ to add or override an inference
provider without touching the repo.
Introduces providers/ package — single source of truth for every
inference provider. Adding a simple api-key provider now requires one
providers/<name>.py file with zero edits anywhere else.
What this PR ships:
- providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes)
- ProviderProfile declarative fields: name, api_mode, aliases, display_name,
env_vars, base_url, models_url, auth_type, fallback_models, hostname,
default_headers, fixed_temperature, default_max_tokens, default_aux_model
- 4 overridable hooks: prepare_messages, build_extra_body,
build_api_kwargs_extras, fetch_models
- chat_completions.build_kwargs: profile path via _build_kwargs_from_profile,
legacy flag path retained for lmstudio/tencent-tokenhub (which have
session-aware reasoning probing that doesn't map cleanly to hooks yet)
- run_agent.py: profile path for all registered providers; legacy path
variable scoping fixed (all flags defined before branching)
- Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS,
doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER
- GeminiProfile: thinking_config translation (native + openai-compat nested)
- New tests/providers/ (79 tests covering profile declarations, transport
parity, hook overrides, e2e kwargs assembly)
Deltas vs original PR (salvaged onto current main):
- Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth
(were added to main since original PR)
- Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their
reasoning_effort probing has no clean hook equivalent yet)
- Removed lmstudio alias from custom profile (it's a separate provider now)
- Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension
(resolve_provider special-cases them; adding breaks runtime resolution)
- runtime_provider: profile.api_mode only as fallback when URL detection
finds nothing (was breaking minimax /v1 override)
- Preserved main's legacy-path improvements: deepseek reasoning_content
preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M
beta recovery, etc.
- Kept agent/copilot_acp_client.py in place (rejected PR's relocation —
main has 7 fixes landed since; relocation would revert them)
- _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing
test imports
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes#14418
The BuiltinMemoryProvider class was removed from the codebase but its
name lingered in the module-level docstrings of memory_manager.py and
memory_provider.py, creating false expectations:
- memory_manager.py docstring showed example code doing
add_provider(BuiltinMemoryProvider(...)) which ImportError at runtime
- memory_provider.py docstring listed BuiltinMemoryProvider as
'always present, not removable' — misleading for new contributors
The regression test (test_memory_user_id.py) already passes without
any reference to BuiltinMemoryProvider; it uses RecordingProvider
instances directly. The stale references were docs-only drift.
Update both docstrings to reflect the actual current architecture:
MemoryManager accepts external plugin providers only (one at a time).
Closes#14402
* feat(kanban): generic diagnostics engine for task distress signals
Replaces the hallucination-specific ``warnings`` / ``RecoverySection``
surface (shipped in PR #20232) with a reusable diagnostic-rule engine
that covers five distress kinds in v1 and can be extended without
touching UI code. The "something's wrong with this task" signal is
no longer limited to phantom card ids.
Closes the follow-up from #20232 discussion.
New module
----------
``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule
engine. Each rule is a pure function of
``(task, events, runs, now, config) -> list[Diagnostic]``. Registry
is a simple list; adding a new distress kind is one function + one
import, no UI or API changes required.
v1 rule set
-----------
* ``hallucinated_cards`` (error) — folds the existing
``completion_blocked_hallucination`` event into the new surface.
* ``prose_phantom_refs`` (warning) — folds
``suspected_hallucinated_references``.
* ``repeated_spawn_failures`` (error → critical at 2x threshold) —
fires when ``tasks.spawn_failures >= 3``; suggests
``hermes -p <profile> doctor`` / ``auth``.
* ``repeated_crashes`` (error → critical) — fires after N consecutive
``crashed`` run outcomes with no successful completion between;
suggests ``hermes kanban log <id>``.
* ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked``
state with no comments / unblock attempts; suggests commenting.
Every diagnostic carries structured ``actions`` (reclaim, reassign,
unblock, cli_hint, comment, open_docs) that render consistently in
both CLI and dashboard. Suggested actions are highlighted; generic
recovery actions (reclaim / reassign) are available on every kind as
fallbacks.
Diagnostics auto-clear when the underlying failure resolves — a
clean ``completed``/``edited`` event drops hallucination diagnostics,
a successful run drops crash diagnostics, a comment drops
stuck-blocked diagnostics. Audit events persist; the badge goes away.
API
---
``plugin_api.py``:
* ``/board`` now attaches ``diagnostics`` (full list) and
``warnings`` (compact summary with ``highest_severity``) per task.
* ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics
section auto-opens on flagged tasks.
* NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by
severity, sorted critical-first.
CLI
---
* NEW ``hermes kanban diagnostics [--severity X] [--task id]
[--json]`` — fleet view or single-task view, matches dashboard rule
output so CLI users see the same picture.
* ``hermes kanban show <id>`` now renders a Diagnostics section near
the top with severity markers + suggested actions.
Dashboard
---------
* Card badge is severity-coloured (⚠ amber warning, !! orange error,
!!! red critical) using ``warnings.highest_severity``.
* Attention strip above the toolbar counts EVERY task with active
diagnostics (not just hallucinations), severity-coloured, lists
affected tasks with Open buttons when expanded.
* Drawer's old ``RecoverySection`` replaced with generic
``DiagnosticsSection`` rendering a card per active diagnostic:
title + detail + structured data (task-id chips when payload keys
look like id lists) + action buttons. Reassign profile picker is
inline per-diagnostic. Clipboard fallback uses ``.catch()`` for
environments where writeText rejects.
* Three-rung severity palette; amber for warning, orange for error,
red for critical. Uses CSS variables so theming is straightforward.
Tests
-----
* NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests
covering each rule's positive/negative/threshold paths, severity
sorting, broken-rule isolation, and sqlite3.Row integration.
* Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty,
populated, severity-filtered), ``/board`` exposes both diagnostic
list and compact summary with ``highest_severity``.
* Existing hallucination-specific test (``test_board_surfaces_
warnings_field_for_hallucinated_completions``) updated to reflect
the new contract: warning summary keys by diagnostic kind
(``hallucinated_cards``) not event kind.
379 kanban-suite tests pass (+16 net from this PR).
Live verification
-----------------
Seeded all 5 diagnostic kinds + one clean + one plain-running task
(7 total) into an isolated HERMES_HOME, spun up the dashboard, and
verified:
* Attention strip: shows ``!! 5 tasks need attention`` in the
error-severity orange; Show expands to a list of 5 rows ordered
critical > error > warning.
* Card badges: error tasks render ``!!`` orange, warning tasks
render ``⚠`` amber, clean and plain-running tasks render no badge.
* Each of the 5 rules opens a correctly-coloured, correctly-styled
diagnostic card in the drawer with its specific suggested action.
* Live reassign from a diagnostic card flipped
``broken-ml-worker → alice`` and the drawer refreshed with the
new assignee + the same diagnostic still firing (correct:
spawn_failures counter hasn't reset yet).
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order;
``--severity error`` narrows to 3; ``kanban show <id>`` includes
the Diagnostics block at the top with suggested action hint.
Migration note
--------------
The old ``warnings`` shape (``{count, kinds, latest_at}``) is
preserved on the API but ``kinds`` now keys by diagnostic kind
(``hallucinated_cards``) instead of event kind
(``completion_blocked_hallucination``). ``highest_severity`` is a
new required field. The dashboard was the only consumer and has
been updated in the same commit; external API consumers of the
``warnings`` field will need to update their kind-match logic.
* feat(kanban/diagnostics): lead titles with the actual error text
The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn
N times' titles buried the actual cause in the data section. Operators
had to open logs or expand the diagnostic to see WHY the worker is
stuck — rate-limit vs insufficient quota vs bad auth vs context
overflow vs network blip all looked identical at a glance.
New titles:
Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached
Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance
Agent crashed 3x: provider auth error: 401 Unauthorized
Agent spawn failed 4x: insufficient_quota: You exceeded your current
Detail keeps the full error snippet (capped at 500 chars + ellipsis
for tracebacks). Title takes the first line capped at 160 chars.
Fallback title if no error recorded stays honest ('no error recorded').
Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total
pass (+4).
Live-verified on dashboard with 6 seeded scenarios
(rate-limit, billing, auth, context, network, spawn-billing) —
each card title leads with the actionable error text.
Subscribe overlay components to computed theme/session selectors instead of the full UI store so unrelated UI state updates trigger fewer overlay renders.
PR #12473 (merged 2026-04-19) added a new --deliver-only flag to
`hermes webhook subscribe` for zero-LLM direct delivery, but
website/docs/reference/cli-commands.md options table did not
reference it. Add the row so CLI users can discover the flag from
the reference page instead of having to read the source.
Mirrors the AGENTS.md #20226 additions (Toolsets / Delegation / Curator /
Cron / Kanban) into the user-facing hermes-agent skill, and closes the
drift in the in-session slash command list.
User report (wxrrior in Discord): the skill did not mention /goal, so a
brand-new session answering "/hermes-agent do you have any info on /goal"
confidently said it did not exist. Cross-check against the CommandDef
registry found 16 commands missing from the static list: /goal, /agents,
/busy, /copy, /curator, /debug, /footer, /gquota, /indicator, /kanban,
/redraw, /reload, /reload-skills, /snapshot, /steer, /topic.
Changes:
- Slash Commands header now tells the reader to run /help or check the
live docs reference as the source of truth, and names the registry
of record (hermes_cli/commands.py) so future drift gets flagged
honestly instead of answered confidently wrong.
- Added all 16 missing commands, slotted into existing subsections
(/goal and /steer in Session; /busy + /indicator + /footer in
Configuration; /curator + /kanban + /reload-skills + /reload in
Tools & Skills; /topic in Gateway; /copy in Utility; /gquota +
/debug in Info).
- Toolsets table updated to the authoritative 30-key list from
toolsets.py (added kanban, yuanbao, spotify, safe, debugging, video,
feishu_doc, feishu_drive, discord, discord_admin, clarify; previously
stopped at 20 keys).
- New "Durable & Background Systems" section before Troubleshooting
covers Delegation, Cron, Curator, Kanban - each with a short rundown
of CLI verbs, key invariants, and a pointer to the user-facing docs.
Mirrors AGENTS.md #20226 but in the skill's user-facing register.
- Bumped version 2.0.0 -> 2.1.0.
PR #13743 replaced the global MAX_TEXT_LENGTH=4000 with a per-provider
table and a user-override 'max_text_length:' key, but the user-guide
TTS page documented no length behaviour at all. Users hitting truncation
had no way to discover the new caps or the override.
Add an 'Input length limits' subsection after the existing Configuration
YAML block: provider default caps (Edge 5000 / OpenAI 4096 / xAI 15000 /
MiniMax 10000 / Mistral 4000 / Gemini 5000 / ElevenLabs model-aware /
NeuTTS,KittenTTS 2000), ElevenLabs model_id -> cap table (5k-40k), an
override example, and the validation rules (non-positive / non-integer /
boolean values fall through to the provider default).