hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-24 16:54:43 +00:00

Author	SHA1	Message	Date
Gille	039fbb41fc	fix(desktop): show newly configured model providers (#41545 )	2026-06-08 01:39:37 -07:00
floory	15c99b437f	fix(cli): set PYTHON env for node-gyp native builds on NixOS (#40690 ) * fix(cli): set PYTHON env for node-gyp native builds on NixOS node-gyp (triggered by node-pty during npm ci) looks for python3 on PATH, which fails on NixOS because python3 lives in the nix store and is not on the system PATH. Add _nixos_build_env() — a two-tier helper that detects NixOS and: 1. Fast path: hermes venv python3 (~0s) 2. Fallback: nix-shell which python3 (~2-5s) Wire it into _run_npm_install_deterministic() via a new env= parameter, then pass it through cmd_gui() and _update_node_dependencies(). Non-NixOS systems: _nixos_build_env() returns None, behavior unchanged. * fix(cli): merge _nixos_build_env() with os.environ, fix NixOS detection, add explicit return None - Critical fix: both Tier 1 (venv) and Tier 2 (nix-shell) now return {**os.environ, "PYTHON": ...} instead of {"PYTHON": ...} — subprocess.run with env= replaces the entire environment, so the old code wiped PATH and broke npm/node on NixOS entirely. - Uses re.search(r"^ID=nixos$", ...) for anchored NixOS detection instead of unanchored substring match (could match ID_LIKE=...nixos). - Removes redundant Path.exists() guard before read_text(); just catches OSError (one filesystem read instead of two). - Adds explicit return None at end of function for type-hint consistency.	2026-06-08 13:57:37 +05:30
teknium1	7a5827c8b0	test: repoint percentage-clamp source guard to gateway/slash_commands.py test_gateway_run_clamped read gateway/run.py asserting the /usage stats handler clamps pct with min(100, ...). That handler moved to gateway/slash_commands.py in this PR's extraction; repoint the guard so it still fires on clamp removal. tests/run_agent/ + tests/gateway/ 8024 passed / 0 failed.	2026-06-08 01:25:35 -07:00
teknium1	de5fe2fa7d	test(gateway): repoint slash-command mocks after mixin extraction Tests for the extracted handlers mocked symbols at gateway.run.*; the handlers now resolve top-level-imported deps (atomic_json_write, fetch_account_usage, render_account_usage_lines) and __file__ from gateway.slash_commands. Repoint those mocks. run.py-resident methods (_increment_restart_failure_counts, _clear_restart_failure_count) keep their gateway.run.atomic_json_write mock — only the moved handlers' mocks change. tests/gateway/ 6415 passed / 0 failed.	2026-06-08 01:25:35 -07:00
teknium1	619bd78273	refactor(gateway): extract 42 slash-command handlers into GatewaySlashCommandsMixin (god-file Phase 3b) The in-session slash commands (/model, /reset, /usage, /compress, /voice, ...) — 42 _handle_*_command handlers, ~3,200 LOC — move out of gateway/run.py into a mixin GatewayRunner inherits. self._handle_*_command dispatch + all test references resolve unchanged via the MRO. Neutral deps (MessageEvent, EphemeralReply, Platform, t, cfg_get, atomic_*_write, account-usage helpers, stdlib) imported at the mixin top level. The ~10 run.py-internal helpers (_hermes_home, _load_gateway_config, _resolve_gateway_model, _AGENT_PENDING_SENTINEL, ...) imported lazily inside the handlers that need them to avoid an import cycle. gateway/run.py 19157 -> 15870 LOC; GatewayRunner direct methods 214 -> 172. Behavior-neutral: voice/update/model/compress command test suites pass; all 42 resolve to the mixin via MRO.	2026-06-08 01:25:35 -07:00
teknium1	02a4d66951	fix(auxiliary): retry transient transport error once before fallback (#16587 ) A one-off transient transport failure (streaming-close / incomplete chunked read / 5xx / 408) on an auxiliary LLM call escalated straight to provider/model fallback (or, for context compression, dropped the summary and entered cooldown), even when an immediate retry on the same provider would have succeeded. Add a single same-target retry at the top of call_llm() and async_call_llm() — before the existing except-chain — gated on a new _is_transient_transport_error() that reuses the canonical _is_connection_error() detector plus a 5xx/408 status check.
A second failure (or any non-transient error: auth, other 4xx, malformed payload) falls through to first_err and the existing fallback handling unchanged. This lives in call_llm so every auxiliary task (compression, memory flush, title generation, session search, vision) shares one transient-retry surface, rather than each caller re-implementing it. The context compressor needs no change — it calls call_llm and inherits the retry; its existing fallback-to-main path (#18458) now composes naturally (retry the aux model once, then fall back to main only if the retry also fails). Co-authored-by: ARegalado1 <alberto.regalado@ymail.com>	2026-06-08 01:05:45 -07:00
kshitij	4107076128	Merge pull request #41155 from kshitijk4poor/fix/cli-modal-direct-invalidate-41098 fix(cli): paint approval/clarify/sudo/secret modal prompts directly, not via the throttle (#41098)	2026-06-08 01:01:51 -07:00
Teknium	4d18717b6c	fix(gateway): drop --replace from systemd unit templates (#41892 ) Under systemd's Restart=always, --replace turns every restart into a self-kill loop: the new instance reads gateway.pid, kills the previous process, writes its own PID, and on the next restart the cycle repeats. A process supervisor owns the lifecycle — --replace is for manual one-shot takeovers and fights the supervisor. Remove --replace from both the system-level and user-level systemd ExecStart lines. The --replace flag stays available for manual 'hermes gateway run --replace' and on the macOS launchd fallback path (#23387), which is a deliberate manual takeover, not a supervised unit. Also drop RestartMaxDelaySec / RestartSteps from the templates — they require systemd v255+ and are silently ignored on older versions. The _strip_optional_systemd_directives normalizer stays so existing installs whose on-disk unit still carries those directives aren't flagged as outdated. Credit: reported and diagnosed by @Skippy-the-Magnificent-one (PR #37145); reimplemented here under project authorship because the original commit was authored under a non-existent email.	2026-06-08 00:20:08 -07:00
Siddharth Balyan	d02a59b679	fix(nix): cold npm builds + fix-lockfiles real-build verification + auto-fix workflow (#41867 ) * fix(nix): fix-lockfiles real-build verification + point auto-fix at nix/lib.nix Two related fixes to the npm lockfile-hash tooling that, together, let a broken nix build slip onto main and stay there: 1. fix-lockfiles trusted prefetch-npm-deps. It computes the hash from the lockfile contents and early-exited "ok" whenever that matched the pin, never running the real fetchNpmDeps + npmConfigHook build. Those two can disagree (the --apply path already works around it), so `--check` reported "ok" while a cold build was actually broken (e.g. lockfile engines/os/cpu fields the pinned nixpkgs strips from the deps cache, tripping npmConfigHook's consistency diff). Now, when prefetch says the hash matches, confirm with `nix build .#<attr>` before believing it: adopt the real fetchNpmDeps hash if nix reports a 'got:' mismatch, surface non-hash failures honestly (exit 1) instead of claiming "ok", and keep the transient-cache-failure skip. 2. nix-lockfile-fix.yml's auto-fix-main (and the PR-fix job) whitelisted and staged nix/tui.nix + nix/web.nix, but the single npmDepsHash moved to nix/lib.nix. So fix-lockfiles --apply edited nix/lib.nix, the guard flagged it as an "unexpected modified file", and the job exited without committing — the auto-healer could never push a fix. Point the guard regex and both `git add` lines at nix/lib.nix. * fix(nix): fix cold npm builds — adopt the deps-cache lockfile in patchPhase hermes-tui/hermes-agent could not be built from source on the pinned nixpkgs: prefetch-npm-deps strips advisory lockfile fields (engines/os/cpu/funding/ bin/…) that newer npm writes into package-lock.json, then npmConfigHook byte-compares the source lockfile against the cache's stripped copy and fails on the difference. 
CI only stayed green because it substitutes the prebuilt hermes-tui from Cachix and never cold-builds it; anyone building cold (e.g. a local path: input, or a cache miss) hit the failure. mkNpmPassthru's patchPhase now copies the cache's own normalized package-lock.json over the source before npmConfigHook runs, so the consistency check is trivially satisfied. The resolved dependency set (version/resolved/integrity/dependencies) is identical — fetchNpmDeps derived the cache from this very lockfile — so `npm ci` installs the same tree; only advisory metadata is dropped. Genuine drift is still caught by the fixed-output npmDepsHash check, which runs before this phase. Verified by cold-building .#tui and .#default (full hermes-agent) from scratch on the pinned nixpkgs (6201e2) — both succeed where they previously failed at npmConfigHook.	2026-06-08 12:41:37 +05:30
Teknium	e45b745835	fix(file-tools): reject sentinel TERMINAL_CWD; anchor worktree edits before live cwd exists (#41861 ) Completes the worktree-misroute fix from #35399, which made misroutes visible (resolved_path) but did not prevent them: its divergence warning only fired once a terminal command had populated the live cwd registry. A fresh worktree session (registry still empty) with a stale TERMINAL_CWD='.' got neither a worktree anchor nor a warning, so a relative write_file/patch silently landed in the MAIN checkout. Two changes in tools/file_tools.py: - Treat sentinel TERMINAL_CWD values ('', '.', './', 'auto', 'cwd') and any relative value as UNSET rather than a literal anchor. Previously '.' was joined onto the process cwd, silently routing edits to wherever the process happened to be (the main repo, in a worktree session). The gateway already sanitizes the same set at import time; the file-tool layer now matches. - New _authoritative_workspace_root(): prefers the live terminal cwd, else a sentinel-free absolute TERMINAL_CWD (the worktree path cli.py/main.py set for -w). _resolve_base_dir() and _path_resolution_warning() both use it, so a worktree session resolves into — and warns about escaping — the worktree from the very first write, before any cd has run. Validation: 11 new/parametrized tests (sentinel handling, empty-registry anchoring, early divergence warning, live-cwd precedence). 32/32 pass under scripts/run_tests.sh. Live E2E: relative write in an empty-registry worktree session lands in the worktree, main untouched.	2026-06-07 23:58:47 -07:00
LeonSGP43	e02f4c03c3	fix(gateway): abort --replace when old PID survives SIGKILL When --replace force-kills an unresponsive old gateway, SIGKILL can fail to reap it (uninterruptible sleep, zombie-reaping parent, etc.). The old code unconditionally cleared the PID file and scoped locks and started a fresh instance anyway, leaving two live gateways fighting over the same bot token — a duplicate-gateway failure mode of #19471. Re-verify the process is actually gone (via the Windows-safe _pid_exists helper) after the force-kill; if it still appears alive, clear the takeover marker and abort the replacement instead of duplicating. Co-authored-by: Hermes <noreply@nousresearch.com>	2026-06-07 23:57:32 -07:00
konsisumer	3714caa1b9	fix(session): follow compression continuations for transcript reads	2026-06-07 23:57:20 -07:00
teknium1	329c33dac3	fix(terminal): read cwd overrides under raw task_id after container collapse PR #41822 collapsed CWD-only overrides to the shared 'default' container via _resolve_container_task_id, but three call sites kept routing the env/override lookup through that collapsed id: - the foreground exec path read _task_env_overrides[effective_task_id], yet register_task_env_overrides writes under the raw task_id, so a CWD-only override's cwd was silently dropped (env spun up at the wrong root, exit 126); - the get-or-create env lookup keyed solely on effective_task_id, so an env cached under the raw task_id was missed and duplicated; - register_task_env_overrides synced the new cwd onto the env under the collapsed id, missing a live env cached under the raw task_id. Container identity still collapses to 'default' (sharing preserved); only the per-session env/override lookup now prefers the raw task_id and falls back to the collapsed id. Fixes the 3 regressions in test_terminal_task_cwd.py left red by #41822.	2026-06-07 23:44:04 -07:00
teknium1	d759c13c09	chore(salvage): lint fix + AUTHOR_MAP for desktop source-folders PR #40272 eslint --fix (import sort + padding-line-between-statements) on sidebar/index.tsx after cherry-picking @dangelo352's commits; add release.py AUTHOR_MAP entry so CI doesn't block on the unmapped author email.	2026-06-07 23:44:04 -07:00
D'Angelo Rodriguez	694adec635	Smooth desktop sidebar drag sorting	2026-06-07 23:44:04 -07:00
D'Angelo Rodriguez	f0fcaa1e54	Preserve dragged order inside source folders	2026-06-07 23:44:04 -07:00
D'Angelo Rodriguez	0f500fc41d	Render grouped sessions when local list is empty	2026-06-07 23:44:04 -07:00
D'Angelo Rodriguez	3fc67b7333	Persist desktop sidebar drag order	2026-06-07 23:44:04 -07:00
D'Angelo Rodriguez	ede4f5a4a3	Show messaging source folders in desktop sessions	2026-06-07 23:44:04 -07:00
D'Angelo Rodriguez	9d6992ee8a	Show platform sources in desktop sessions	2026-06-07 23:44:04 -07:00
teknium1	1c68f6f81f	refactor(gateway): extract kanban watcher loops into GatewayKanbanWatchersMixin (god-file Phase 3) gateway/run.py is the largest god file (20k LOC, GatewayRunner with 220 methods). This lifts the cohesive kanban-watcher cluster — _kanban_notifier_watcher, _kanban_dispatcher_watcher, _kanban_advance/unsub/rewind, _deliver_kanban_artifacts (~1,035 LOC, 6 methods) — into gateway/kanban_watchers.py as a mixin that GatewayRunner inherits. Mixin (not free functions) because the methods use only self state: inheriting keeps every self._kanban_* call site working unchanged via the MRO, making this a behavior-neutral move. The methods' lazy imports (_kb, _decomp, _load_config, Platform) travel with them; the mixin needs only stdlib + a matching logging.getLogger('gateway.run'). run.py 20187 -> 19157 LOC; GatewayRunner direct methods 220 -> 214. Behavior-neutral: gateway test suite 6582 passed / 0 failed; start() still wires both watchers via self._kanban_*; MRO resolves all 6 to the mixin. One test (corrupt-board quarantine retry) keyed its time-travel mock on the caller's filename being gateway/run.py — updated to also accept gateway/kanban_watchers.py. Establishes the mixin-extraction pattern for further GatewayRunner decomposition (the 2406-LOC _run_agent and 1164-LOC _handle_message remain, but their callback closures need a context-object redesign — deferred).	2026-06-07 23:14:18 -07:00
liuhao1024	6459b3d991	fix(terminal): collapse CWD-only overrides to shared container When register_task_env_overrides is called with only a 'cwd' key (ACP adapter workspace tracking), the task_id should collapse to 'default' so all interactive surfaces (TUI, gateway, dashboard) share one long-lived container. Previously, any override registration — even CWD-only — caused _resolve_container_task_id to return the session key unchanged, spinning up a separate container per session. This made it impossible to authenticate into external services once and have that auth available across all surfaces. Now only overrides containing isolation keys (docker_image, modal_image, singularity_image, daytona_image, env_type) trigger per-task container isolation. Fixes #37361	2026-06-07 23:04:54 -07:00
teknium1	1a626470ca	refactor(cli): promote 9 closure handlers to top-level + extract their parsers (god-file Phase 2 follow-up) Subcommands whose handler was a closure defined inside main() — memory, acp, tools, insights, skills, pairing, plugins, mcp, claw — have their handler promoted to a top-level function and their parser block extracted into hermes_cli/subcommands/<name>.py (build_<name>_parser, injected handler). These 9 had zero closure-over-main-locals, so promotion is a pure relocation. acp/mcp parser blocks use the shared add_accept_hooks_flag helper. main() 1798 -> 954 LOC (71% below the 3297 Phase-2 starting point); add_parser calls in main.py 89 -> 28. Deferred: sessions, computer-use, secrets handlers reference <name>_parser (for a no-subcommand print_help fallback) — left in place to avoid the _self_parser indirection; minority, low value. Behavior-neutral: all 9 subcommands' --help (incl nested subactions) byte-identical to pre-extraction (diff-verified). tests/hermes_cli/ 6519 passed / 0 failed; new test_subcommands_followup.py covers the 9 builders.	2026-06-07 22:56:23 -07:00
teknium1	524453dab5	refactor(agent): consolidate inner-retry-loop recovery flags into TurnRetryState (god-file Phase 1b) run_conversation's inner retry loop tracked recovery state in ~15 scattered bare booleans (per-provider OAuth refresh guards, format-recovery guards, restart signals). They are now fields on a single TurnRetryState dataclass the loop mutates in place (_retry.<flag>), giving the recovery bookkeeping a named, testable home. Loop-control vars (retry_count, max_retries, max_compression_attempts) stay as plain locals — they're while-mechanics, not recovery bookkeeping. Behavior-neutral: pure local→attribute rewrite of 42 references; kwarg NAMES preserved (e.g. has_retried_429=_retry.has_retried_429). Live simple + tool turns OK. Validation: tests/run_agent/ 1615 passed / 0 failed under per-file process isolation; new test_turn_retry_state.py pins the field contract.	2026-06-07 22:42:05 -07:00
teknium1	4d926f248d	chore(release): add AUTHOR_MAP entry for rodboev	2026-06-07 22:39:51 -07:00
Rod Boev	648706936d	test(gateway): add compression session_id rotation integration tests (#34089 )	2026-06-07 22:39:51 -07:00
teknium1	39c4ac3af1	chore(release): add AUTHOR_MAP entry for JimStenstrom	2026-06-07 22:30:02 -07:00
JimStenstrom	cb5c24e37d	fix(agent): sync logging session context on compaction id rotation When context compaction rotates agent.session_id, it updates the gateway/tools session context (set_current_session_id -> HERMES_SESSION_ID env + ContextVar) but never updates the separate logging session context. The [session_id] tag on log lines comes from hermes_logging._session_context (set once per turn in conversation_loop.py), so post-compaction log lines in the same turn carry the STALE old id while the message/DB/gateway state carry the new one — breaking log correlation exactly at the compaction boundary. Call hermes_logging.set_session_context(agent.session_id) alongside the existing set_current_session_id, guarded so a logging failure can't regress the routing update. Logs-only; no runtime or caching impact. Refs #34089	2026-06-07 22:30:02 -07:00
Teknium	8e223b36ed	fix(curator): protect load-bearing built-in skills from archival/consolidation (#41817 ) The curator's idle-archival path (apply_automatic_transitions under prune_builtins) could archive the bundled `plan` skill, killing the /plan slash command silently — typing /plan then returned 'Unknown command' with no signal that a skill had vanished. The archived skill's hash stays in .bundled_manifest, so 'hermes update' wouldn't re-seed it. Add PROTECTED_BUILTIN_SKILLS ({plan}) enforced at the master gate is_curation_eligible() (covers archive_skill + the transition walk) and in the candidate enumerator (so the LLM consolidation pass never sees them). Immune to prune_builtins, pin state, and LLM judgment.	2026-06-07 22:23:29 -07:00
Teknium	777dc9da62	feat(acp): emit session provenance metadata for compression rotation (#41724 ) Closes #33617. Adds additive _meta.hermes.sessionProvenance to ACP session surfaces so clients can detect compression-driven internal session rotation without parsing status text, guessing from token drops, or reading state.db. Derived on demand from the existing compression chain (parent_session_id / end_reason) — no new persisted state, no schema change, no ACP protocol change. ACP session_id stays the stable client handle. - acp_adapter/provenance.py: derive provenance from SessionDB - server.py: attach _meta to new/load/resume responses; emit a session_info_update when the internal head rotates during a prompt	2026-06-07 22:22:21 -07:00
teknium1	240c5d4543	chore: map martin.alca@gmail.com -> draix in AUTHOR_MAP Salvage follow-up for PR #33221 — the cherry-picked commit is authored under martin.alca@gmail.com (not the draixagent@gmail.com already mapped), which would fail the CI author-attribution gate.	2026-06-07 22:22:01 -07:00
Martín Alcalá Rubí	132d6fe6d6	fix(volcengine): strip XML attribute fragments from tool_use.name (#33007 ) VolcEngine's api/plan endpoint occasionally leaks raw XML attribute fragments into tool_use.name when its protocol-translation layer converts the model's native XML-style tool emission to Anthropic Messages tool_use blocks, producing names like: terminal" parameter="command" string="true execute_code" parameter="code" string="true session_search" parameter="session_id" string="true The corruption happens server-side at the provider, but it breaks every tool call for affected users — no normalization rule in repair_tool_call can rescue them, so each request runs through three retries and then aborts as partial. Add an early sanitizer in agent_runtime_helpers.repair_tool_call that trims at the first ' " ', " ' ", '<', or '>' character (idx > 0 only) so the rest of the existing repair pipeline (lowercase / snake_case / fuzzy match) can resolve the cleaned name normally. Whitespace is deliberately NOT a separator — the legitimate "write file" -> write_file repair path (covered by test_space_to_underscore) must keep working. Tests: 11 new regression cases in TestVolcEngineXmlPollution covering all three observed polluted names, CamelCase + pollution mix, single-quote variants, angle-bracket variants, clean-name passthrough, and the whitespace-preservation guard. All 18 pre- existing repair tests still pass (29 total in the file).	2026-06-07 22:22:01 -07:00
teknium1	f5bd09af4b	refactor(acp): share interrupt-sentinel prefix, simplify guard Replace the ACP-local prefix/suffix matcher + helper with a single startswith() check against INTERRUPT_WAITING_FOR_MODEL_PREFIX, now defined once in conversation_loop.py where the sentinel is produced. Keeps the source of truth in one place so the guard cannot drift if the status string changes. Net -17 LOC in server.py. Also add lsaether to release.py AUTHOR_MAP.	2026-06-07 22:20:43 -07:00
lsaether	9b631e4ae1	fix(acp): suppress cancel interrupt sentinel	2026-06-07 22:20:43 -07:00
Teknium	2789bf4e25	fix(auxiliary): route Codex Responses path through shared converter (#5709 ) The auxiliary Codex adapter maintained its own chat->Responses conversion loop that forwarded every non-system message's role verbatim into Responses input[]. When flush_memories()/compression replayed session history containing assistant tool_calls + role=tool results, those tool messages leaked into the request and the Responses API rejected them with HTTP 400: Invalid value: 'tool'. Route _CodexCompletionsAdapter.create() through the same shared converter the main agent transport uses (_chat_messages_to_responses_input), so tool calls become function_call items and tool results become function_call_output items with a valid call_id. Single conversion path means no future drift. Also remove the now-dead _convert_content_for_responses() helper — its only caller was the private conversion loop this change deletes. Co-authored-by: ProgramCaiCai <techxacm@gmail.com>	2026-06-07 22:18:31 -07:00
teknium1	568e127612	refactor(cli): extract 25 more subcommand parsers into hermes_cli/subcommands/ Batch extraction of every remaining subcommand whose handler is top-level and whose parser block is pure argparse: model, setup, postinstall, whatsapp, slack, login, logout, auth, status, webhook, hooks, doctor, security, dump, debug, backup, import, config, version, update, uninstall, dashboard, gui, logs, prompt-size. Each becomes hermes_cli/subcommands/<name>.py with build_<name>_parser() and an injected handler (no main import). dashboard also injects cmd_dashboard_register for its nested 'register' action. Behavior-neutral: all 25 subcommands' --help output (and nested subaction help) diff-verified byte-identical to pre-extraction. Two RawDescriptionHelpFormatter epilogs (debug, logs) needed their multi-line string interiors preserved at column 0 — caught by the --help diff, not compile. main() 3297 -> 1798 LOC across this PR; add_parser calls in main.py 179 -> 89. Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process isolation; new test_subcommands_batch.py smoke-tests all 25 builders + the dashboard two-handler case.	2026-06-07 22:18:14 -07:00
teknium1	4da45e8727	refactor(cli): extract profile + gateway/proxy parsers into hermes_cli/subcommands/ Follow-on to the cron extraction in the same Phase 2 PR. Same pattern: per-group build_<name>_parser() functions with injected handlers, no main import. - subcommands/profile.py: build_profile_parser (190-line block out of main()). - subcommands/gateway.py: build_gateway_parser (gateway + proxy, 238-line block; they shared one inline section). Imports argparse for SUPPRESS defaults. - main(): two more inline blocks become single builder calls. Behavior-neutral: 'profile [sub] --help' and 'gateway/proxy [sub] --help' byte-identical to pre-extraction (diff-verified). main() now 2723 LOC (was 3297 at Phase 2 start); add_parser calls in main.py 179 -> 141. Validation: tests/hermes_cli/ 6476 passed / 0 failed under per-file process isolation; new builder unit tests cover subactions, aliases, dispatch, flags.	2026-06-07 22:18:14 -07:00
teknium1	b2e6053243	refactor(cli): extract hermes cron parser into hermes_cli/subcommands/ (god-file Phase 2) Phase 2 of the god-file decomposition plan. main()'s argparse tree is 179 inline add_parser calls in one 3,297-line function. This establishes the hermes_cli/subcommands/ package and extracts the first group (cron) as the proof-of-pattern: - hermes_cli/subcommands/_shared.py: shared parser helpers (add_accept_hooks_flag), re-exported from main.py for backwards compat. - hermes_cli/subcommands/cron.py: build_cron_parser(subparsers, cmd_cron=...). Handler injected so the module never imports main (cycle avoidance). - main()'s ~155-line inline cron block becomes one build_cron_parser() call. Behavior-neutral: 'hermes cron create --help' output is byte-identical to origin/main. main() 3297 -> 3143 LOC. Validation: tests/hermes_cli/ 6466 passed / 0 failed under per-file process isolation; new test_subcommands_cron.py covers subactions, aliases, options, no-agent tristate, injected dispatch, and --accept-hooks.	2026-06-07 22:18:14 -07:00
teknium1	54870847cb	refactor(agent): extract run_conversation prologue into agent/turn_context.py Phase 1 of the god-file decomposition plan. run_conversation's ~470-line once-per-turn setup block (stdio guarding, retry-counter resets, user-message sanitization, todo/nudge hydration, system-prompt restore-or-build, crash-resilience persistence, preflight compression, the pre_llm_call hook, and external-memory prefetch) is moved verbatim into build_turn_context(), which returns a TurnContext dataclass the loop unpacks. Behavior-neutral move-and-name refactor: the builder mutates `agent` exactly as the inline code did; only the locals the loop reads back are returned. - run_conversation: 4602 -> 4217 LOC (-385) - agent/conversation_loop.py: 4965 -> ~4580 LOC - new agent/turn_context.py: focused, dependency-injected, unit-tested in isolation Tests: tests/run_agent/ 1570 passed / 0 failed under per-file process isolation. Relocation follow-ups: 413_compression mocks now patch both module references; nudge/on_turn_start source-inspection guards point at the extracted module.	2026-06-07 22:17:35 -07:00
Teknium	86c537d209	fix(memory): instruct in-turn consolidation + retry on overflow (#41755 ) * fix(memory): make overflow errors instruct in-turn consolidation + retry When bounded memory is full, the add/replace overflow errors now explicitly tell the model to consolidate (merge/remove/shorten) and retry the write in the same turn, matching the documented behavior. The replace-overflow path now also echoes current_entries + usage for parity with add-overflow, so the model has the same context to act on. Closes #23378 (working-as-documented; this sharpens runtime to match docs). * fix(memory): broaden overflow remediation hint beyond 'stale' Say 'stale or less important' — entries don't have to be stale to be the right ones to drop when making room.	2026-06-07 22:16:28 -07:00
teknium1	2a10da3a16	fix(gateway): keep /model + /reasoning overrides on topic recovery & compression splits Session-scoped /model and /reasoning overrides were silently lost on Telegram DM/forum topics and after compression session splits (#30479). Root cause: _handle_message_with_agent rewrites source.thread_id via _recover_telegram_topic_thread_id (lobby/stripped reply -> the user's bound topic) before deriving the session key. The /model and /reasoning handlers derived their override key from the raw inbound event.source, skipping that recovery, so the override was stored under one key and the next message turn read a different key. Fix: add _normalize_source_for_session_key (applies the same recovery a message turn does) and use it in both handlers before deriving the key. session_id rotation on compression was never the cause — overrides are keyed by the durable session_key; the split path preserves it. Author: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-06-07 22:10:32 -07:00
Hariharan Ayappane	b8469a81e3	fix(weixin): add rate-limit circuit breaker	2026-06-07 22:10:17 -07:00
Teknium	2e62862784	fix(telegram): use get_running_loop in polling-conflict retry reschedule (#41716 ) The conflict-retry path called asyncio.get_event_loop() to reschedule itself when a retry's start_polling raised. On Python 3.11+ (our floor) that raises 'RuntimeError: There is no current event loop in thread MainThread' when no loop is attached to the thread, which is what happens when PTB dispatches this error callback. The retry never gets scheduled, the adapter goes silent-but-alive, and gateway --replace keeps spawning fresh instances that hit the same wall — the crash loop reported in #19471 (worse under multi-profile, where two bots hold the same conflict open). We are inside a coroutine here, so asyncio.get_running_loop() is the correct, guaranteed-valid replacement. Only get_event_loop() call in any platform adapter, so no sibling sites. Fixes #19471	2026-06-07 22:10:03 -07:00
teknium1	b5f7a1f299	chore(release): add basilalshukaili to AUTHOR_MAP	2026-06-07 22:09:45 -07:00
dusterbloom	cca3b77a4b	fix(compression): clear _previous_summary on session end (defense-in-depth) ContextCompressor inherited a no-op on_session_end() from ContextEngine, so per-session iterative-summary state (_previous_summary) survived a real session boundary on a reused compressor instance. Override it to clear the summary the moment the owning session ends, complementing the point-of-use guard in compress(). Closes the cross-session contamination path in #38788. Co-authored-by: dusterbloom <32869278+dusterbloom@users.noreply.github.com>	2026-06-07 22:09:45 -07:00
Basil Al Shukaili	8513a6aec7	fix(compression): guard against cross-session stale _previous_summary contamination When a cron or background session compacts, it sets _previous_summary for iterative updates. If that session ends without /new or /reset (which calls on_session_reset()), the stale summary survives on the ContextCompressor instance. A subsequent live messaging session's compaction then injects it as 'PREVIOUS SUMMARY:' into the summarizer prompt — contaminating the live session with unrelated content from the prior session. Add an else guard in compress(): when no handoff summary is found in the current messages but _previous_summary is non-empty, discard it so _generate_summary() starts fresh instead of iteratively updating a stale cross-session summary. Fixes #38788	2026-06-07 22:09:45 -07:00
Teknium	ad8e57793d	fix(hermes_time): implement reset_cache() referenced in docstrings (#41728) The module docstring and get_timezone()/cache comments documented a reset_cache() helper for forcing tz re-resolution after config changes, but the function was never defined, so doc-followers calling it hit AttributeError. Adds the helper to clear the cached tz state. Surfaced in #32043.	2026-06-07 22:08:01 -07:00
Teknium	5408013369	fix(gateway): isolate DM sessions on user_id when chat_id is absent (#41764) build_session_key collapsed every DM that arrived without a chat_id into one shared 'agent:main:<platform>:dm' key. A single cached AIAgent then served multiple users' conversations, bleeding history across senders. DMs now fall back to the sender's user_id_alt/user_id (mirroring the group-path participant precedence and the telegram auth-path fallback) before the bare per-platform sink. Telegram's normal event path always sets chat_id, so this hardens the synthetic-source / non-standard-adapter paths that don't.	2026-06-07 22:07:07 -07:00
Teknium	a77bc2c08d	fix(compression): disable compression on background-review fork to prevent cross-turn stale-parent fork (#41708) The per-session compression lock prevents same-window concurrent forks but not cross-turn ones: the background-review fork shares the parent's session_id, so if it won a compression race its new child session was never adopted by the gateway (the fork is single-lifecycle). The next foreground turn then started from the stale parent and compressed it again, leaving the same parent with two sibling children. Set review_agent.compression_enabled = False so the fork never triggers compression. Both trigger sites in conversation_loop.py gate on compression_enabled before calling _compress_context, so the fork can never rotate the shared parent. Review needs full context anyway; compressing would degrade the memory/skill summary. The per-session lock is kept as defense-in-depth for any future shared-session path. Adds a regression test that fails without the flag and passes with it. Closes #38727	2026-06-07 22:06:48 -07:00
Teknium	48ae8029aa	fix(delegate): resolve custom-endpoint subagent pools by endpoint identity (#41730) Subagents delegated to a custom endpoint were misrouted when the parent ran on a different custom endpoint. Both runtimes collapse to provider="custom", so _resolve_child_credential_pool() treated them as interchangeable and handed the child the parent's pool. Leasing from it then overwrote the child's delegated base_url with the parent's endpoint via _swap_credential(), so the child sent the delegated model name to the wrong endpoint. Custom runtimes now resolve by endpoint identity (the custom:<name> pool key derived from base_url). The parent pool is reused only when both parent and child resolve to the same custom endpoint; unregistered raw endpoints return None so the child keeps its fixed delegated credential. Non-custom provider paths are unchanged. Fixes #7833.	2026-06-07 22:05:14 -07:00
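
Illustration for 2e62862784 (not part of the commit): a minimal sketch of rescheduling a retry from inside a coroutine with asyncio.get_running_loop(). The names schedule_retry, run_demo and running_loop_outside_coroutine_fails are hypothetical and do not come from the hermes-agent code; only the asyncio calls are real.

```python
import asyncio


async def schedule_retry(attempts: list, delay: float = 0.01) -> None:
    """Hypothetical stand-in for a conflict-retry handler (not hermes-agent code)."""
    # Inside a coroutine a loop is always running, so this call cannot fail.
    loop = asyncio.get_running_loop()
    finished = loop.create_future()

    def _retry() -> None:
        # Record the retry and let the awaiting coroutine continue.
        attempts.append("retry")
        finished.set_result(None)

    # Reschedule the retry on the loop that is actually running this coroutine.
    loop.call_later(delay, _retry)
    await finished


def run_demo() -> list:
    attempts: list = []
    asyncio.run(schedule_retry(attempts))
    return attempts


def running_loop_outside_coroutine_fails() -> bool:
    # With no running loop, get_running_loop() raises RuntimeError.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return True
    return False
```

get_running_loop() either returns the loop that is executing the caller or raises immediately, so a misuse outside a coroutine surfaces at the call site rather than as a retry that is silently never scheduled.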

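Illustration for 5408013369 (not part of the commit): a rough sketch of the DM fallback order described in that message. The function name build_dm_session_key, its signature, and the ':<id>' suffix layout are assumptions made for this example; only the bare 'agent:main:<platform>:dm' key and the user_id_alt/user_id precedence come from the commit text.

```python
from typing import Optional


def build_dm_session_key(
    platform: str,
    chat_id: Optional[str] = None,
    user_id_alt: Optional[str] = None,
    user_id: Optional[str] = None,
) -> str:
    """Hypothetical helper; the real build_session_key may differ."""
    base = f"agent:main:{platform}:dm"
    # Prefer chat_id, then the sender's user_id_alt, then user_id.
    for candidate in (chat_id, user_id_alt, user_id):
        if candidate:
            # Assumed layout: append the identifier to the per-platform DM key.
            return f"{base}:{candidate}"
    # Last resort: the shared per-platform sink.
    return base
```

With this ordering, two senders whose DMs arrive without a chat_id get distinct keys, and the shared sink is used only when no identifier is available at all.
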

10990 commits