hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-04 12:33:08 +00:00

Teknium 404640a2b7 feat(goals): /goal checklist + /subgoal user controls (#23456)

* feat(goals): /goal checklist + /subgoal user controls

Two-phase judge for /goal: Phase A decomposes the goal into a detailed checklist on first turn; Phase B evaluates each pending item harshly against the agent's most recent response. The goal completes only when every item is in a terminal status (completed or impossible). Adds /subgoal so the user can append, complete, mark impossible, undo, remove, or clear items the judge missed or got wrong.

Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward more items not fewer. Falls through to legacy freeform judge when decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json. A bounded read_file tool (max 5 calls per turn, restricted to that one file) lets the judge inspect history when the snippet is ambiguous. Stickiness in code: terminal items are frozen, only the user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty; reverts to old prompt when empty.
- Status line shows M/N done counts.

CLI + gateway + TUI gateway all pass the agent reference into evaluate_after_turn so the dump can be written. Gateway-side /subgoal is allowed mid-run since it only modifies the checklist the judge consults at turn boundaries.

Tests: 24 new cases covering backcompat round-trip, Phase A decompose, Phase B updates + new_items + stickiness, user override flows, conversation dump (incl. unsafe-sid sanitization), judge read_file restriction. Existing freeform-mode tests updated to patch the renamed `judge_goal_freeform` and skip Phase A explicitly.

* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning

Three live-test findings from running /goal end-to-end against gemini-3-flash-preview as the judge:

1. Off-by-one bug: the judge sees the checklist rendered with 1-based indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed state.checklist as 0-based. Result: every judge update landed on the wrong item, evidence got attached to neighbouring rows, and the genuine 'first pending' item (usually #1) never got marked. Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the user prompt to call out the 1-based scheme explicitly. New tests cover the parser conversion + an end-to-end fake-judge round-trip.
2. Conversation dump never happened: _extract_agent_messages tried common AIAgent attribute names (.messages, .conversation_history, etc.) but AIAgent doesn't expose the message list as an instance attribute; it lives inside run_conversation()'s scope. Result: the judge's read_file tool always saw history_path=unavailable. Fix: added an explicit messages= kwarg to evaluate_after_turn that all three call sites (CLI, gateway, TUI gateway) now pass directly. Agent-attribute extraction kept as back-compat fallback.
3. Prompt was too harsh on simple goals. The original 'be HARSH, default to leaving items pending' wording made the judge refuse to mark 'file exists' completed even after the agent ran ls, test -f, os.path.isfile, and find, burning the entire 8-turn budget on a fizzbuzz task. Softened to 'strict but not absurd' with explicit guidance on what counts as evidence and a directive not to require re-proving items already established earlier.

Re-tested live with the same fizzbuzz goal: now terminates in 2 turns with all 8 checklist items correctly attributed to their own evidence. /subgoal user-action flow (add / complete / undo / impossible) verified live as well.

2026-05-10 16:56:51 -07:00
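
The index conversion and the "terminal items are frozen" rule described in the commit message can be illustrated with a minimal sketch. This is not the code in goals.py: `ChecklistItem`, `to_zero_based`, `apply_judge_updates`, and `goal_done` are hypothetical names chosen for illustration, and the update tuple shape is an assumption. Only the 1-based rendering, the 0-based storage, and the two terminal statuses come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Terminal statuses named in the commit message.
TERMINAL = {"completed", "impossible"}


@dataclass
class ChecklistItem:
    # Hypothetical stand-in for one checklist entry.
    text: str
    status: str = "pending"
    evidence: str = ""


def to_zero_based(judge_index: int, size: int) -> Optional[int]:
    """Map the 1-based index shown to the judge onto a 0-based list index."""
    idx = judge_index - 1
    return idx if 0 <= idx < size else None


def apply_judge_updates(
    checklist: List[ChecklistItem],
    updates: List[Tuple[int, str, str]],
) -> int:
    """Apply (1-based index, status, evidence) updates; return how many landed."""
    applied = 0
    for judge_index, status, evidence in updates:
        idx = to_zero_based(judge_index, len(checklist))
        if idx is None:
            continue  # index outside the checklist: ignore the update
        item = checklist[idx]
        if item.status in TERMINAL:
            continue  # terminal items are frozen for the judge
        item.status = status
        item.evidence = evidence
        applied += 1
    return applied


def goal_done(checklist: List[ChecklistItem]) -> bool:
    """True when every item is terminal (assumption: an empty list is not done)."""
    return bool(checklist) and all(i.status in TERMINAL for i in checklist)
```

With two items, an update addressed to index 1 lands on the first item rather than the second, and a later judge update to that same item is skipped because it is already terminal.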
__init__.py	chore: release v0.13.0 (2026.5.7) (#21406)	2026-05-07 09:22:48 -07:00
_parser.py	fix: add dashboard to CLI help epilogue and Docker CI smoke test	2026-05-07 06:16:23 -07:00
_subprocess_compat.py	feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags	2026-05-08 14:27:40 -07:00
auth.py	feat(cross-platform): psutil for PID/process management + Windows footgun checker	2026-05-08 14:27:40 -07:00
auth_commands.py	auth: use get_default_hermes_root() for shared nous_auth.json path	2026-05-08 14:27:40 -07:00
azure_detect.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010)	2026-04-28 06:46:45 -07:00
backup.py	codebase: add encoding='utf-8' to all bare open() calls (PLW1514)	2026-05-08 14:27:40 -07:00
banner.py	fix(banner): resolve update-check repo from running code, not profile-scoped path	2026-05-09 04:10:35 -07:00
browser_connect.py	fix(browser): address Copilot review on /browser connect	2026-04-28 22:11:10 -07:00
callbacks.py	fix: ESC cancels secret/sudo prompts, clearer skip messaging (#9902)	2026-04-14 16:11:37 -07:00
checkpoints.py	feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)	2026-05-06 05:44:35 -07:00
claw.py	Merge origin/main and resolve conflict in nix/tui.nix	2026-05-07 22:56:19 +00:00
cli_output.py	refactor: remove dead code — 1,784 lines across 77 files (#9180)	2026-04-13 16:32:04 -07:00
clipboard.py	feat: fix img pasting in new ink plus newline after tools	2026-04-11 13:14:32 -05:00
codex_models.py	docs(codex-spark): document ChatGPT Pro entitlement gating	2026-05-09 23:17:25 -07:00
colors.py	feat: respect NO_COLOR env var and TERM=dumb (#4079)	2026-03-30 17:07:21 -07:00
commands.py	feat(goals): /goal checklist + /subgoal user controls (#23456)	2026-05-10 16:56:51 -07:00
completion.py	fix(completion): use valid zsh _arguments exclusion-group syntax	2026-05-09 13:36:44 -07:00
config.py	fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV	2026-05-09 17:53:35 -07:00
copilot_auth.py	fix(oauth,gateway): monotonic deadlines for polling/timeout loops	2026-05-07 05:09:39 -07:00
cron.py	feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709)	2026-05-04 12:31:01 -07:00
curator.py	feat(curator): show rename map in user-visible summary (#22910)	2026-05-09 18:43:40 -07:00
curses_ui.py	fix: treat ctrl-c as curses cancel	2026-05-04 01:36:44 -07:00
debug.py	fix(debug): redact log content at upload time in hermes debug share	2026-05-03 11:42:20 -07:00
default_soul.py	fix: reset default SOUL.md to baseline identity text (#3159)	2026-03-26 01:34:27 -07:00
dingtalk_auth.py	chore: remove unused imports and dead locals (ruff F401, F841) (#17010)	2026-04-28 06:46:45 -07:00
doctor.py	fix(doctor): normalize provider name and aliases before dedicated-skip check	2026-05-09 13:36:33 -07:00
dump.py	refactor(env): use shared Hermes dotenv loader	2026-05-05 10:13:13 -07:00
env_loader.py	feat(cross-platform): psutil for PID/process management + Windows footgun checker	2026-05-08 14:27:40 -07:00
fallback_cmd.py	feat(cli): add 'hermes fallback' command to manage fallback providers (#16052)	2026-04-26 06:19:04 -07:00
gateway.py	fix(gateway): detect gateway process via /proc in Docker without procps	2026-05-09 17:54:17 -07:00
gateway_windows.py	fix(gateway): preserve Ctrl+C for Windows foreground runs	2026-05-09 14:34:18 -07:00
goals.py	feat(goals): /goal checklist + /subgoal user controls (#23456)	2026-05-10 16:56:51 -07:00
hooks.py	codebase: add encoding='utf-8' to all bare open() calls (PLW1514)	2026-05-08 14:27:40 -07:00
kanban.py	fix(kanban): /kanban slash command emits argparse garbage instead of help	2026-05-09 22:49:29 -07:00
kanban_db.py	fix(kanban): extend stale claim instead of killing live worker	2026-05-10 15:23:04 -07:00
kanban_diagnostics.py	fix(kanban): unify failure counter across spawn/timeout/crash outcomes (#20410)	2026-05-05 13:55:37 -07:00
kanban_specify.py	feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks (#21435)	2026-05-07 13:04:41 -07:00
logs.py	feat: component-separated logging with session context and filtering (#7991)	2026-04-11 17:23:36 -07:00
main.py	fix(windows): unbreak install + update on Windows (#23394)	2026-05-10 13:07:08 -07:00
mcp_config.py	feat(mcp): add codex preset for built-in MCP server discovery	2026-05-09 11:11:28 -07:00
memory_setup.py	codebase: add encoding='utf-8' to all bare open() calls (PLW1514)	2026-05-08 14:27:40 -07:00
model_catalog.py	codebase: add encoding='utf-8' to all bare open() calls (PLW1514)	2026-05-08 14:27:40 -07:00
model_normalize.py	fix(opencode-go): keep users on opencode-go instead of hijacking to native providers (#20802)	2026-05-06 09:08:33 -07:00
model_switch.py	docs(codex-spark): document ChatGPT Pro entitlement gating	2026-05-09 23:17:25 -07:00
models.py	fix(xai): drop models being retired May 15, 2026 from pickers (#23291)	2026-05-10 12:12:55 -07:00
nous_subscription.py	feat(web): add SearXNG as a native search-only backend	2026-05-06 10:05:29 -07:00
oneshot.py	fix: make session search initialize session db	2026-05-09 14:36:58 -07:00
pairing.py	fix(pairing): enforce lockout on approve_code, not just generate_code (#10195) (#21325)	2026-05-07 07:18:21 -07:00
platforms.py	feat: complete plugin platform parity — all 12 integration points	2026-04-29 21:56:51 -07:00
plugins.py	feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194)	2026-05-10 07:09:28 -07:00
plugins_cmd.py	fix(plugins): resolve Git binary for installs under minimal PATH	2026-05-09 11:10:04 -07:00
profile_distribution.py	feat(profile): shareable profile distributions via git (#20831)	2026-05-08 10:04:32 -07:00
profiles.py	fix(profiles): exclude infrastructure artifacts when cloning with --clone-all	2026-05-09 04:10:35 -07:00
providers.py	fix: prevent bare 'custom' slug in model.provider (#17478)	2026-04-30 04:32:11 -07:00
pt_input_extras.py	fix(cli): make Ctrl+Enter insert newline on WSL/SSH/Windows Terminal (#22777)	2026-05-09 12:48:14 -07:00
pty_bridge.py	feat(cross-platform): psutil for PID/process management + Windows footgun checker	2026-05-08 14:27:40 -07:00
relaunch.py	fix(windows): prefer npm.cmd over npm.ps1, skip .py argv0 in relaunch	2026-05-08 14:27:40 -07:00
runtime_provider.py	fix: use credential_pool for custom endpoint model listing probes	2026-05-09 17:54:58 -07:00
setup.py	fix(xai): drop models being retired May 15, 2026 from pickers (#23291)	2026-05-10 12:12:55 -07:00
skills_config.py	refactor(config): migrate remaining 33 cfg_get call sites (#17311)	2026-04-29 04:03:03 -07:00
skills_hub.py	codebase: add encoding='utf-8' to all bare open() calls (PLW1514)	2026-05-08 14:27:40 -07:00
skin_engine.py	fix(tui): honor skin highlight colors (#20895)	2026-05-06 14:01:56 -07:00
slack_cli.py	fix(slack): enable writable app home DMs in manifest	2026-05-08 17:01:12 -07:00
status.py	fix(status): add missing popular provider API keys to hermes status display	2026-05-04 05:14:13 -07:00
stdio.py	fix(windows): quote cache paths in bash + augment PATH so rg/bash resolve on first launch	2026-05-08 14:27:40 -07:00
timeouts.py	refactor(timeouts): drop redundant ImportError in except clause	2026-04-26 20:48:20 -07:00
tips.py	feat: Ctrl+Enter inserts newline on Windows Terminal	2026-05-08 14:27:40 -07:00
tools_config.py	fix(windows): unbreak install + update on Windows (#23394)	2026-05-10 13:07:08 -07:00
uninstall.py	feat(windows uninstall): clean up User env, PATH, Scheduled Task, and portable tooling	2026-05-08 14:27:40 -07:00
vercel_auth.py	feat: add Vercel Sandbox backend	2026-04-29 07:22:33 -07:00
voice.py	fix(tui): restore voice push-to-talk parity (#20897)	2026-05-06 15:49:59 -07:00
web_server.py	fix(security): require dashboard auth for plugin API routes	2026-05-10 07:04:18 -07:00
webhook.py	refactor(config): migrate remaining 33 cfg_get call sites (#17311)	2026-04-29 04:03:03 -07:00