hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-08 03:01:47 +00:00

Teknium f67063ba81 feat(kanban): generic diagnostics engine for task distress signals (#20332)	2026-05-05 13:32:42 -07:00

* feat(kanban): generic diagnostics engine for task distress signals

Replaces the hallucination-specific ``warnings`` / ``RecoverySection`` surface (shipped in PR #20232) with a reusable diagnostic-rule engine that covers five distress kinds in v1 and can be extended without touching UI code. The "something's wrong with this task" signal is no longer limited to phantom card ids.

Closes the follow-up from #20232 discussion.

New module
----------
``hermes_cli/kanban_diagnostics.py``: stateless, no-side-effect rule engine. Each rule is a pure function of ``(task, events, runs, now, config) -> list[Diagnostic]``. Registry is a simple list; adding a new distress kind is one function + one import, no UI or API changes required.

v1 rule set
-----------
* ``hallucinated_cards`` (error): folds the existing ``completion_blocked_hallucination`` event into the new surface.
* ``prose_phantom_refs`` (warning): folds ``suspected_hallucinated_references``.
* ``repeated_spawn_failures`` (error → critical at 2x threshold): fires when ``tasks.spawn_failures >= 3``; suggests ``hermes -p <profile> doctor`` / ``auth``.
* ``repeated_crashes`` (error → critical): fires after N consecutive ``crashed`` run outcomes with no successful completion between; suggests ``hermes kanban log <id>``.
* ``stuck_in_blocked`` (warning): fires after 24h in ``blocked`` state with no comments / unblock attempts; suggests commenting.

Every diagnostic carries structured ``actions`` (reclaim, reassign, unblock, cli_hint, comment, open_docs) that render consistently in both CLI and dashboard. Suggested actions are highlighted; generic recovery actions (reclaim / reassign) are available on every kind as fallbacks.

Diagnostics auto-clear when the underlying failure resolves: a clean ``completed``/``edited`` event drops hallucination diagnostics, a successful run drops crash diagnostics, a comment drops stuck-blocked diagnostics. Audit events persist; the badge goes away.

API
---
``plugin_api.py``:
* ``/board`` now attaches ``diagnostics`` (full list) and ``warnings`` (compact summary with ``highest_severity``) per task.
* ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics section auto-opens on flagged tasks.
* NEW ``/diagnostics`` endpoint: fleet-wide listing, filterable by severity, sorted critical-first.

CLI
---
* NEW ``hermes kanban diagnostics [--severity X] [--task id] [--json]``: fleet view or single-task view, matches dashboard rule output so CLI users see the same picture.
* ``hermes kanban show <id>`` now renders a Diagnostics section near the top with severity markers + suggested actions.

Dashboard
---------
* Card badge is severity-coloured (⚠ amber warning, !! orange error, !!! red critical) using ``warnings.highest_severity``.
* Attention strip above the toolbar counts EVERY task with active diagnostics (not just hallucinations), severity-coloured, lists affected tasks with Open buttons when expanded.
* Drawer's old ``RecoverySection`` replaced with generic ``DiagnosticsSection`` rendering a card per active diagnostic: title + detail + structured data (task-id chips when payload keys look like id lists) + action buttons. Reassign profile picker is inline per-diagnostic. Clipboard fallback uses ``.catch()`` for environments where writeText rejects.
* Three-rung severity palette; amber for warning, orange for error, red for critical. Uses CSS variables so theming is straightforward.

Tests
-----
* NEW ``tests/hermes_cli/test_kanban_diagnostics.py``: 14 unit tests covering each rule's positive/negative/threshold paths, severity sorting, broken-rule isolation, and sqlite3.Row integration.
* Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty, populated, severity-filtered), ``/board`` exposes both diagnostic list and compact summary with ``highest_severity``.
* Existing hallucination-specific test (``test_board_surfaces_warnings_field_for_hallucinated_completions``) updated to reflect the new contract: warning summary keys by diagnostic kind (``hallucinated_cards``) not event kind.

379 kanban-suite tests pass (+16 net from this PR).

Live verification
-----------------
Seeded all 5 diagnostic kinds + one clean + one plain-running task (7 total) into an isolated HERMES_HOME, spun up the dashboard, and verified:
* Attention strip: shows ``!! 5 tasks need attention`` in the error-severity orange; Show expands to a list of 5 rows ordered critical > error > warning.
* Card badges: error tasks render ``!!`` orange, warning tasks render ``⚠`` amber, clean and plain-running tasks render no badge.
* Each of the 5 rules opens a correctly-coloured, correctly-styled diagnostic card in the drawer with its specific suggested action.
* Live reassign from a diagnostic card flipped ``broken-ml-worker → alice`` and the drawer refreshed with the new assignee + the same diagnostic still firing (correct: spawn_failures counter hasn't reset yet).
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order; ``--severity error`` narrows to 3; ``kanban show <id>`` includes the Diagnostics block at the top with suggested action hint.

Migration note
--------------
The old ``warnings`` shape (``{count, kinds, latest_at}``) is preserved on the API but ``kinds`` now keys by diagnostic kind (``hallucinated_cards``) instead of event kind (``completion_blocked_hallucination``). ``highest_severity`` is a new required field. The dashboard was the only consumer and has been updated in the same commit; external API consumers of the ``warnings`` field will need to update their kind-match logic.

* feat(kanban/diagnostics): lead titles with the actual error text

The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn N times' titles buried the actual cause in the data section. Operators had to open logs or expand the diagnostic to see WHY the worker is stuck: rate-limit vs insufficient quota vs bad auth vs context overflow vs network blip all looked identical at a glance.

New titles:
    Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached
    Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance
    Agent crashed 3x: provider auth error: 401 Unauthorized
    Agent spawn failed 4x: insufficient_quota: You exceeded your current

Detail keeps the full error snippet (capped at 500 chars + ellipsis for tracebacks). Title takes the first line capped at 160 chars. Fallback title if no error recorded stays honest ('no error recorded').

Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total pass (+4). Live-verified on dashboard with 6 seeded scenarios (rate-limit, billing, auth, context, network, spawn-billing); each card title leads with the actionable error text.
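
The rule-engine shape described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in ``hermes_cli/kanban_diagnostics.py``: the ``Diagnostic`` fields, the dict-shaped task and events, the ``blocked_at`` / ``at`` / ``status`` keys, and the config key names are all assumptions made for the example. Only the rule signature, the ``spawn_failures >= 3`` threshold, the 2x escalation to critical, the 24h blocked window, broken-rule isolation, and critical-first ordering come from the message itself.

```python
from dataclasses import dataclass, field
from typing import Callable

# Lower rank sorts first, so critical diagnostics lead the list.
SEVERITY_RANK = {"critical": 0, "error": 1, "warning": 2}


@dataclass
class Diagnostic:
    # Hypothetical shape; the real class may carry more fields.
    kind: str
    severity: str
    title: str
    actions: list = field(default_factory=list)


# A rule is a pure function of (task, events, runs, now, config).
Rule = Callable[[dict, list, list, float, dict], list]


def repeated_spawn_failures(task, events, runs, now, config):
    threshold = config.get("spawn_failure_threshold", 3)  # assumed config key
    failures = task.get("spawn_failures", 0)
    if failures < threshold:
        return []
    # Escalate from error to critical at twice the threshold.
    severity = "critical" if failures >= 2 * threshold else "error"
    title = f"Agent spawn failed {failures}x"
    return [Diagnostic("repeated_spawn_failures", severity, title,
                       ["cli_hint", "reclaim", "reassign"])]


def stuck_in_blocked(task, events, runs, now, config):
    if task.get("status") != "blocked":
        return []
    limit = config.get("blocked_seconds", 24 * 3600)  # assumed config key
    blocked_at = task.get("blocked_at", now)  # assumed timestamp field
    # A comment made after the task was blocked clears the diagnostic.
    commented = any(e.get("kind") == "comment" and e.get("at", 0) >= blocked_at
                    for e in events)
    if commented or now - blocked_at < limit:
        return []
    return [Diagnostic("stuck_in_blocked", "warning",
                       "Task blocked for over 24h", ["comment", "unblock"])]


# The registry is a plain list; a new distress kind is one more entry.
RULES = [repeated_spawn_failures, stuck_in_blocked]


def run_diagnostics(task, events, runs, now, config, rules=RULES):
    found = []
    for rule in rules:
        try:
            found.extend(rule(task, events, runs, now, config))
        except Exception:
            # Broken-rule isolation: one failing rule must not hide the others.
            continue
    # Unknown severities sort after the known ones.
    return sorted(found,
                  key=lambda d: SEVERITY_RANK.get(d.severity, len(SEVERITY_RANK)))
```

Because every rule is stateless, auto-clearing falls out of re-running the rules against current data: once a comment event appears or the failure counter drops, the rule simply returns an empty list.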
__init__.py	fix(windows): enforce UTF-8 stdout/stderr to prevent UnicodeEncodeError crash	2026-05-03 16:58:25 -07:00
_parser.py	refactor(cli): derive relaunch flag table from argparse introspection	2026-04-29 20:33:29 -07:00
auth.py	feat(nous): persist Nous OAuth across profiles via shared token store (#19712)	2026-05-04 04:54:55 -07:00
auth_commands.py	feat(nous): persist Nous OAuth across profiles via shared token store (#19712)	2026-05-04 04:54:55 -07:00
azure_detect.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010)	2026-04-28 06:46:45 -07:00
backup.py	fix(backup): floor pre-update backup_keep to 1 so the new backup survives	2026-05-04 05:07:13 -07:00
banner.py	fix(banner): show correct update status on nix-built hermes (#17550)	2026-04-30 07:03:00 +05:30
browser_connect.py	fix(browser): address Copilot review on /browser connect	2026-04-28 22:11:10 -07:00
callbacks.py	fix: ESC cancels secret/sudo prompts, clearer skip messaging (#9902)	2026-04-14 16:11:37 -07:00
claw.py	fix(claw): handle missing dir in _scan_workspace_state	2026-05-05 06:08:14 -07:00
cli_output.py	refactor: remove dead code - 1,784 lines across 77 files (#9180)	2026-04-13 16:32:04 -07:00
clipboard.py	feat: fix img pasting in new ink plus newline after tools	2026-04-11 13:14:32 -05:00
codex_models.py	feat(codex): add gpt-5.5 and wire live model discovery into picker (#14720)	2026-04-23 13:32:43 -07:00
colors.py	feat: respect NO_COLOR env var and TERM=dumb (#4079)	2026-03-30 17:07:21 -07:00
commands.py	feat(telegram): /topic off + help + auth gate + screenshot debounce	2026-05-04 12:07:17 -07:00
completion.py	fix: preserve profile name completion in dynamic shell completion	2026-04-14 10:45:42 -07:00
config.py	feat(i18n): add display.language for static message translation (zh/ja/de/es) (#20231)	2026-05-05 08:03:07 -07:00
copilot_auth.py	fix(copilot): exchange raw GitHub token for Copilot API JWT	2026-04-24 05:09:08 -07:00
cron.py	feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709)	2026-05-04 12:31:01 -07:00
curator.py	feat(curator): add archive and prune subcommands (#20200)	2026-05-05 05:15:54 -07:00
curses_ui.py	fix: treat ctrl-c as curses cancel	2026-05-04 01:36:44 -07:00
debug.py	fix(debug): redact log content at upload time in hermes debug share	2026-05-03 11:42:20 -07:00
default_soul.py	fix: reset default SOUL.md to baseline identity text (#3159)	2026-03-26 01:34:27 -07:00
dingtalk_auth.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010)	2026-04-28 06:46:45 -07:00
doctor.py	refactor(env): use shared Hermes dotenv loader	2026-05-05 10:13:13 -07:00
dump.py	refactor(env): use shared Hermes dotenv loader	2026-05-05 10:13:13 -07:00
env_loader.py	refactor: consolidate symlink-safe atomic replace into shared helper	2026-04-28 04:58:22 -07:00
fallback_cmd.py	feat(cli): add 'hermes fallback' command to manage fallback providers (#16052)	2026-04-26 06:19:04 -07:00
gateway.py	fix(gateway): handle planned service stops	2026-05-04 16:00:49 -07:00
goals.py	feat: /goal - persistent cross-turn goals (Ralph loop) (#18262)	2026-04-30 23:10:20 -07:00
hooks.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010)	2026-04-28 06:46:45 -07:00
kanban.py	feat(kanban): generic diagnostics engine for task distress signals (#20332)	2026-05-05 13:32:42 -07:00
kanban_db.py	feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232)	2026-05-05 08:06:55 -07:00
kanban_diagnostics.py	feat(kanban): generic diagnostics engine for task distress signals (#20332)	2026-05-05 13:32:42 -07:00
logs.py	feat: component-separated logging with session context and filtering (#7991)	2026-04-11 17:23:36 -07:00
main.py	fix(cli): pin HERMES_KANBAN_BOARD at chat boot to stop subprocess board drift	2026-05-05 04:37:47 -07:00
mcp_config.py	refactor(config): migrate remaining 33 cfg_get call sites (#17311)	2026-04-29 04:03:03 -07:00
memory_setup.py	fix(cli): decode .env as UTF-8 to avoid GBK crash on Windows	2026-05-02 01:40:31 -07:00
model_catalog.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010)	2026-04-28 06:46:45 -07:00
model_normalize.py	feat(minimax-oauth): full integration with peer OAuth providers	2026-04-29 09:53:42 -07:00
model_switch.py	feat(cli): add list_picker_providers for credential-filtered picker	2026-05-05 10:18:58 -07:00
models.py	fix(models): strip :cloud/-cloud suffix from models.dev Ollama Cloud IDs	2026-05-04 12:38:15 -07:00
nous_subscription.py	fix(cli): coerce use_gateway config flags in tool routing	2026-04-26 19:02:55 -07:00
oneshot.py	fix(tui): honor launch toolsets (#17623)	2026-04-29 16:55:27 -07:00
pairing.py	fix(pairing): handle null user_name in pairing list display	2026-04-23 02:34:11 -07:00
platforms.py	feat: complete plugin platform parity — all 12 integration points	2026-04-29 21:56:51 -07:00
plugins.py	fix(plugins): bound async plugin command await with 30s timeout	2026-04-30 19:56:18 -07:00
plugins_cmd.py	feat(dashboard): add Plugins page with enable/disable, auth status, install/remove	2026-04-30 20:29:37 -04:00
profiles.py	fix(profiles): keep validate_profile_name strict; callers normalize first	2026-05-04 04:44:37 -07:00
providers.py	fix: prevent bare 'custom' slug in model.provider (#17478)	2026-04-30 04:32:11 -07:00
pty_bridge.py	fix(pty): default TERM for resize probes	2026-05-04 02:38:54 -07:00
relaunch.py	remove relaunch_chat	2026-04-29 20:33:29 -07:00
runtime_provider.py	fix(fallback): let custom_providers shadow built-in aliases	2026-04-30 20:18:44 -07:00
setup.py	fix(cli): sanitize bracketed paste markers during setup	2026-05-05 06:12:42 -07:00
skills_config.py	refactor(config): migrate remaining 33 cfg_get call sites (#17311)	2026-04-29 04:03:03 -07:00
skills_hub.py	feat(skills): install skills from a direct HTTP(S) URL (#16323)	2026-04-26 20:57:10 -07:00
skin_engine.py	fix(tui): restore macOS copy behavior and theme polish (#17131)	2026-04-28 18:47:14 -05:00
slack_cli.py	fix(paths): route achievements plugin + profile-tui through HERMES_HOME	2026-04-30 23:21:54 -07:00
status.py	fix(status): add missing popular provider API keys to hermes status display	2026-05-04 05:14:13 -07:00
timeouts.py	refactor(timeouts): drop redundant ImportError in except clause	2026-04-26 20:48:20 -07:00
tips.py	feat(tips): add 100 new CLI startup tips (#20168)	2026-05-05 04:15:58 -07:00
tools_config.py	fix(cli): sync use_gateway in _reconfigure_provider for tts, browser, and web	2026-05-04 02:33:55 -07:00
uninstall.py	feat(uninstall): offer to remove named profiles when uninstalling from default	2026-04-18 19:18:13 -07:00
vercel_auth.py	feat: add Vercel Sandbox backend	2026-04-29 07:22:33 -07:00
voice.py	fix(tui): respect voice.record_key config (supersedes #19028, #19339) (#19835)	2026-05-04 15:49:28 -07:00
web_server.py	feat(docker): launch dashboard as side-process via HERMES_DASHBOARD=1	2026-05-04 15:37:27 +10:00
webhook.py	refactor(config): migrate remaining 33 cfg_get call sites (#17311)	2026-04-29 04:03:03 -07:00