hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-30 06:41:51 +00:00

Author	SHA1	Message	Date
teknium1	9b5dae17a5	feat(context-engine): host contract for external context engines Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373 into a minimal generic host contract that external context engine plugins (e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that duplicated existing infrastructure or had marginal value. Five concrete changes: 1. `_transition_context_engine_session()` on AIAgent — generic lifecycle helper that fires on_session_end → on_session_reset → on_session_start → optional carry_over_new_session_context. Engines implement only the hooks they need; missing hooks are skipped. Built-in compressor keeps its existing reset-only behavior because callers default to no metadata. `reset_session_state()` now optionally accepts previous_messages / old_session_id / carry_over_context and delegates to the transition helper when provided. (#16453) 2. `conversation_id` passed to `on_session_start()` — both the agent-init call site and the compression-boundary call site now forward `self._gateway_session_key` so plugin engines have a stable conversation identity that survives session_id rotation (compression splits, /new, resume). The key already existed on AIAgent; it just wasn't reaching engines. (#16453) 3. Canonical cache buckets forwarded to engines — the usage dict passed to `update_from_response()` now includes input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of the legacy prompt/completion/total keys. Engines can make decisions on cache-hit ratios and reasoning costs instead of only aggregates. ABC docstring updated. (#17453) 4. Plugin-registered context engines visible in the picker — `_discover_context_engines()` in plugins_cmd.py now also includes engines registered via `ctx.register_context_engine()` from plugin manifests, deduplicating by name so repo-shipped descriptions win on collision. (#16451) 5. `_EngineCollector.register_command()` — context engines using the standard `register(ctx)` pattern can now expose slash commands (e.g. `/lcm`). Routes to the global plugin command registry with the same conflict-rejection policy regular plugins use (no shadowing built-ins, no clobbering other plugins). Previously these calls hit a no-op and the slash commands silently never appeared. (#17600) Dropped from the original 5 PRs: - Compression boundary signal (`boundary_reason="compression"`) from #16453 — already on main at `agent/conversation_compression.py:412-424`, landed via the bg-review extraction. - `discover_plugins()` before fallback in run_agent.py from #16451 — redundant: `get_plugin_context_engine()` already routes through `_ensure_plugins_discovered()` which is idempotent. - Runtime identity diagnostics method + helpers from #13373 (+251 LOC) — operators can already read engine state via `engine.get_status()`; the diagnostics view added marginal value relative to its surface area. - The 553-LOC slash-command machinery from #17600 — replaced with a 20-LOC `register_command` method on the collector that reuses the existing plugin command registry instead of building a parallel one. Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs ~1,176 LOC across the original 5 PRs. Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com> Closes #16453. Closes #17453. Closes #16451. Closes #17600. Closes #13373. Related: stephenschoettler/hermes-lcm#68.	2026-05-28 01:45:30 -07:00
novax635	86871ee25a	fix(cli): synchronize HERMES_SESSION_ID across environment and contextvar during session switches	2026-05-23 17:46:55 -07:00
yoniebans	cebd480818	refactor(session-log): drop branch/compress re-point of session_log_file The attribute no longer exists; nothing to re-point.	2026-05-20 11:44:10 -07:00
Teknium	6cb9917c73	perf(compression): defer feasibility check to first compression attempt (#28957 ) `AIAgent.__init__` was eagerly calling `_check_compression_model_feasibility()` which probes the auxiliary provider chain and runs `get_model_context_length()` (potentially network-bound) to decide whether the configured auxiliary model can fit a full compression-threshold window. That cost ~440ms cold on every agent construction. Most `chat -q` invocations finish in 1-5 seconds and never accumulate enough context to trip the compression threshold, so the feasibility check is pure overhead. The result is also only consumed when compression actually fires (the function adjusts the live threshold downward if the aux model can't fit; absent that mutation, the gate in `conversation_loop.py:442` would never fire anyway). Defer to first `compress_context()` call via `agent._compression_feasibility_checked` sentinel. Runs at most once per agent lifetime, just before the first compression pass. The warning storage (`_compression_warning`) and gateway replay machinery is unchanged — it still emits to status_callback on the first turn that actually needs compression. E2E timing (chat -q 'hi', 3 runs each): BEFORE AFTER delta median wall 2.03s 1.86s -8% (-169ms) min wall 1.92s 1.63s -15% (-293ms) Real cold-start observation (synthetic 31-turn agent loop): identical behavior since feasibility check fires once on first compression and caches. No semantic difference for sessions that DO compress. UX trade-off: users with broken auxiliary-provider config no longer see the warning at session start. They see it when compression first fires — which is exactly when it matters. For users with working config (the vast majority), the warning never fires anyway, so the deferral is invisible. Tests: - tests/run_agent/test_compression_feasibility.py — 16/16 pass (the one test that asserted call-at-init was updated to drive the lazy check explicitly via agent._check_compression_model_feasibility()) - Live tmux session: 2-turn conversation + tool call completes clean, zero errors in agent.log	2026-05-19 17:27:17 -07:00
Teknium	1634397ddb	fix(compress): abort instead of dropping messages when summary LLM fails (#28102 ) When auxiliary compression's summary generation returns None (aux model errored, returned non-JSON, timed out, etc.) the compressor previously still dropped every middle message between compress_start..compress_end and replaced them with a static 'Summary generation was unavailable' placeholder. The session kept going but the user silently lost N turns of context for nothing. New behavior: on summary failure, compress() aborts entirely — returns the input messages unchanged and sets _last_compress_aborted=True. The existing _summary_failure_cooldown_until gate (30-60s) keeps the aux model from being burned on every turn. Auto-compress callers detect the no-op (len(after) == len(before)) and stop looping. The chat is 'frozen' at its current size until the next /compress or /new. Manual /compress (CLI + gateway) now passes force=True which clears the cooldown so users can retry immediately after an auto-abort. If the manual retry also fails, the user gets a visible warning telling them nothing was dropped and how to retry. - agent/context_compressor.py: compress() gains force= kwarg; failure branch sets _last_compress_aborted and returns messages unchanged instead of inserting placeholder. - run_agent.py: _compress_context() detects abort, surfaces warning, skips session-rotation entirely, returns messages unchanged. - cli.py + gateway/run.py: manual /compress paths pass force=True. - gateway/run.py: hygiene + /compress handlers detect _last_compress_aborted and emit the new 'Compression aborted' warning (gateway.compress.aborted) instead of the old 'N historical messages were removed' message. - locales/*.yaml: new gateway.compress.aborted key in all 16 locales. - tests: updated to assert the abort contract (messages preserved, compression_count not incremented, abort flag set, no placeholder leaked). New test_force_true_bypasses_failure_cooldown covers the manual-retry path.	2026-05-18 10:19:40 -07:00
glennc	9df9816dab	feat(azure-foundry): add Microsoft Entra ID auth Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-18 10:14:38 -07:00
teknium1	f885be030c	fix(auxiliary): resolve xai oauth compression from pool — port to conversation_compression Original commit `97a32afdc` by helix4u targeted _check_compression_model_feasibility in pre-refactor run_agent.py. The function body now lives in agent/conversation_compression.py — re-applied the configured-but-unavailable provider message there. Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>	2026-05-16 23:33:59 -07:00
teknium1	5311d9959e	refactor(run_agent): extract context compression to agent/conversation_compression.py Move four compression-related methods to a dedicated module: * check_compression_model_feasibility — startup probe + auto-lowered threshold + hard floor * replay_compression_warning — re-emit stored warning through gateway status_callback * compress_context — run compressor, split SQLite session, notify plugins+memory * try_shrink_image_parts_in_messages — image-too-large recovery via re-encode AIAgent keeps thin forwarder methods so existing call sites and tests that patch run_agent.AIAgent methods keep working. tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure as before). run_agent.py: 15013 -> 14535 lines (-478).	2026-05-16 18:09:33 -07:00

8 commits