hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-30 06:41:51 +00:00

Author	SHA1	Message	Date
Teknium	e77f1ed5f7	fix(agent): widen toolset gate to context engine tools (#5544 sibling) The memory-provider gate added in the prior commit closes one of two blind-injection sites in agent_init.py. The context engine block (lines ~1445) follows the identical pattern: agent.context_compressor.get_tool_schemas() (lcm_grep, lcm_describe, lcm_expand) was appended to agent.tools unconditionally, ignoring enabled_toolsets. Same bug class, same local-model latency penalty, same one-line gate — using 'context_engine' as the toolset name (matches the existing plugin-system convention in plugins.py, plugins_cmd.py, etc.). Also adds Lempkey to scripts/release.py AUTHOR_MAP for the prior commit's authorship.	2026-05-21 23:18:37 -07:00
lempkey	4c61fb6cf6	fix(agent): gate memory tool injection on enabled_toolsets (#5544) MemoryManager.get_all_tool_schemas() output was appended to AIAgent.tools unconditionally, bypassing the enabled_toolsets / platform_toolsets filter. Setting `platform_toolsets: telegram: []` had no effect: fact_store and other memory provider tools still leaked into the tool surface on every session. Impact on local models (per @thundercat49's benchmarks on Qwen3-30B-A3B Q4_K_M / RTX 3090): tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain text. With 8 memory tool schemas injected, a simple 'hello' on Telegram took ~42s instead of ~1.7s. Small models also entered tool-call loops when memory tools were the only tools present. Gate condition (matches the natural meaning of enabled_toolsets): (1) None → no filter, inject (backward compat); (2) contains 'memory' → user opted in, inject; (3) otherwise (including []) → skip injection. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>	2026-05-21 23:18:37 -07:00
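
The gate condition described in the two toolset-gate commits above can be sketched roughly as follows. This is a minimal illustration, not the code in agent/agent_init.py: the helper names `should_inject` and `build_tools` are hypothetical, and only the three-way rule (None, contains the toolset name, otherwise) comes from the commit messages.

```python
from typing import Dict, List, Optional, Sequence


def should_inject(toolset: str, enabled_toolsets: Optional[Sequence[str]]) -> bool:
    """Hypothetical helper: apply the three-way gate from the commit message."""
    if enabled_toolsets is None:
        # No filter configured: inject (backward compatible).
        return True
    # Explicit list, including the empty list: inject only on opt-in.
    return toolset in enabled_toolsets


def build_tools(
    base_tools: List[Dict],
    memory_schemas: List[Dict],
    context_engine_schemas: List[Dict],
    enabled_toolsets: Optional[Sequence[str]],
) -> List[Dict]:
    """Hypothetical assembly step: append provider schemas only when gated in."""
    tools = list(base_tools)
    if should_inject("memory", enabled_toolsets):
        tools.extend(memory_schemas)
    if should_inject("context_engine", enabled_toolsets):
        tools.extend(context_engine_schemas)
    return tools
```

Under this sketch an empty list behaves differently from None, which is the distinction the fix relies on: `platform_toolsets: telegram: []` yields no injected memory or context engine tools.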
helix4u	ba9964ff0d	fix(custom): pass custom provider extra body Allow custom OpenAI-compatible providers declared under `custom_providers:` to set provider-specific `extra_body` fields and have Hermes merge them into chat-completions requests when the matching custom endpoint is active. This is a manual per-provider override rather than a model-name heuristic. OpenAI-compatible Gemma thinking support is real, but the on-wire payload shape is backend-specific: some servers want top-level `enable_thinking`, while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`. A per-provider override is safer than picking one assumed payload. Example config: ```yaml custom_providers: - name: gemma-local base_url: http://localhost:8080/v1 model: google/gemma-4-31b-it extra_body: enable_thinking: true reasoning_effort: high ``` For vLLM Gemma or NIM-style endpoints, use the nested shape those servers expect: ```yaml extra_body: chat_template_kwargs: enable_thinking: true ``` Changes: - `hermes_cli/config.py`: preserve `extra_body` in normalized `custom_providers:` entries and allow it in the validated field set. - `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body` as `request_overrides.extra_body` for named custom runtime resolution, including credential-pool paths. - `agent/agent_init.py`: at agent init, locate the matching custom-provider entry by `base_url` (+ optional model) and merge its `extra_body` into `AIAgent.request_overrides`, with caller-provided overrides winning on conflicting top-level keys. - `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile behavior (Ollama `num_ctx`, `think=False` when reasoning disabled); user-configured `extra_body` flows through `request_overrides`. - `website/docs/integrations/providers.md`: document the explicit `extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant. 
- Tests cover config normalization, runtime propagation, model matching, trailing-slash equivalence, fallback when no `model` field is set, and caller-override merging precedence. Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`: configured `extra_body` reaches `kwargs.extra_body` on the wire request, and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`) without clobber. Salvaged from #29022 onto current `main`. Cosmetic typing edit in `plugins/model-providers/custom/__init__.py` and a stale-base docs revert in `providers.md` were dropped during cherry-pick. Closes #29022	2026-05-21 07:48:53 -07:00
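
The lookup and merge behavior listed for this commit (match by `base_url` with trailing-slash equivalence, optional `model`, caller overrides winning on conflicting top-level keys) might look something like the sketch below. The function names and the exact matching rules are assumptions made for illustration; the commit message does not show the real implementation.

```python
from typing import Any, Dict, List, Optional


def _normalize_url(url: str) -> str:
    # Treat "http://host/v1" and "http://host/v1/" as the same endpoint.
    return url.rstrip("/")


def find_custom_provider(
    providers: List[Dict[str, Any]], base_url: str, model: Optional[str] = None
) -> Optional[Dict[str, Any]]:
    """Hypothetical lookup: first entry whose base_url (and model, if set) matches."""
    for entry in providers:
        if _normalize_url(entry.get("base_url", "")) != _normalize_url(base_url):
            continue
        entry_model = entry.get("model")
        # Assumption: an entry without a `model` field matches any model.
        if entry_model and model and entry_model != model:
            continue
        return entry
    return None


def merge_extra_body(
    provider_extra: Optional[Dict[str, Any]], caller_extra: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Shallow merge; caller-provided top-level keys win on conflict."""
    merged = dict(provider_extra or {})
    merged.update(caller_extra or {})
    return merged
```

The merge is deliberately shallow, matching the "conflicting top-level keys" wording: a caller that passes its own `chat_template_kwargs` replaces the configured one wholesale rather than merging inside it.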
Teknium	eeb747de25	feat(sessions): opt-in per-session JSON snapshot writer PR #29182 deleted the per-session JSON snapshot writer outright because state.db is canonical and the snapshots had no in-tree consumer. Some users have external tooling that reads `~/.hermes/sessions/session_{sid}.json` directly, so reintroduce the writer behind a config flag that defaults to off. - Add `sessions.write_json_snapshots` (default False) to DEFAULT_CONFIG - Restore `AIAgent._save_session_log` + `_clean_session_content` as gated methods. When the flag is off the call is a fast no-op; when on, the writer behaves as before (atomic write, truncation guard preserved, REASONING_SCRATCHPAD → think tag normalization) - Re-derive the target path from `agent.session_id` on each call so `/branch` and `/compress` re-points happen automatically — no need to restore the explicit re-point bookkeeping at call sites - Wire the single call site in `_persist_session` (the cleanup-on-exit hook). Did NOT restore the 7 intra-turn calls the original PR deleted — those were redundant writes within the same turn that doubled disk I/O without adding any persistence guarantee `_persist_session` does not already provide - Read the flag once at agent init via `load_config()`, cache as `agent._session_json_enabled` - Update `TestNoSessionJsonSnapshot` → `TestSessionJsonSnapshotOptIn` to pin behavior: default off (no file), opt-in true (file written), no-op method on default agents, logs_dir retained unconditionally - Update CONTRIBUTING.md and the bundled `hermes-agent` skill to document the flag and its default	2026-05-20 11:44:10 -07:00
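
A rough sketch of the opt-in writer described above: a no-op when the flag is off, an atomic write when on, and the target path re-derived from the session id on every call. The function name `save_session_snapshot` and the payload shape are illustrative assumptions; only the `session_{sid}.json` file name pattern and the default-off flag come from the commit message.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def save_session_snapshot(
    sessions_dir: str,
    session_id: str,
    messages: List[Dict[str, Any]],
    enabled: bool = False,
) -> Optional[Path]:
    """Hypothetical writer: returns the written path, or None when disabled."""
    if not enabled:
        return None  # flag off: fast no-op
    directory = Path(sessions_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Re-derive the target from the current session id on each call.
    target = directory / f"session_{session_id}.json"
    # Write to a temp file in the same directory, then rename over the target,
    # so readers never observe a partially written snapshot.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"session_id": session_id, "messages": messages}, handle)
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Because the path is computed from the session id each time, a changed id simply produces a new file on the next call, with no extra bookkeeping at the call site.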
yoniebans	c547392fd4	refactor(session-log): stop initializing session_log_file attribute	2026-05-20 11:44:10 -07:00
Teknium	6cb9917c73	perf(compression): defer feasibility check to first compression attempt (#28957) `AIAgent.__init__` was eagerly calling `_check_compression_model_feasibility()` which probes the auxiliary provider chain and runs `get_model_context_length()` (potentially network-bound) to decide whether the configured auxiliary model can fit a full compression-threshold window. That cost ~440ms cold on every agent construction. Most `chat -q` invocations finish in 1-5 seconds and never accumulate enough context to trip the compression threshold, so the feasibility check is pure overhead. The result is also only consumed when compression actually fires (the function adjusts the live threshold downward if the aux model can't fit; absent that mutation, the gate in `conversation_loop.py:442` would never fire anyway). Defer to first `compress_context()` call via `agent._compression_feasibility_checked` sentinel. Runs at most once per agent lifetime, just before the first compression pass. The warning storage (`_compression_warning`) and gateway replay machinery are unchanged: the warning still emits to status_callback on the first turn that actually needs compression. E2E timing (chat -q 'hi', 3 runs each): median wall 2.03s before vs 1.86s after, -8% (-169ms); min wall 1.92s before vs 1.63s after, -15% (-293ms). Real cold-start observation (synthetic 31-turn agent loop): identical behavior since feasibility check fires once on first compression and caches. No semantic difference for sessions that DO compress. UX trade-off: users with broken auxiliary-provider config no longer see the warning at session start. They see it when compression first fires, which is exactly when it matters. For users with working config (the vast majority), the warning never fires anyway, so the deferral is invisible. 
Tests: - tests/run_agent/test_compression_feasibility.py — 16/16 pass (the one test that asserted call-at-init was updated to drive the lazy check explicitly via agent._check_compression_model_feasibility()) - Live tmux session: 2-turn conversation + tool call completes clean, zero errors in agent.log	2026-05-19 17:27:17 -07:00
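
The deferral pattern in this commit (run an expensive probe at most once, just before the first compression pass, guarded by a sentinel attribute) can be illustrated with a toy class. The attribute and method names mirror those quoted in the commit message, but the class body, the `probe` callable, and the placeholder compression are assumptions, not the real AIAgent code.

```python
from typing import Callable, List, Optional


class Agent:
    """Toy stand-in for AIAgent; only the lazy-check wiring is modeled."""

    def __init__(self, probe: Callable[[], Optional[str]]) -> None:
        # The probe is stored but NOT called here, so construction stays cheap.
        self._probe = probe
        self._compression_feasibility_checked = False
        self._compression_warning: Optional[str] = None

    def _check_compression_model_feasibility(self) -> None:
        # Stands in for the potentially network-bound check; keeps any warning.
        self._compression_warning = self._probe()

    def compress_context(self, messages: List[str]) -> List[str]:
        if not self._compression_feasibility_checked:
            # Set the sentinel first so the check runs at most once per agent.
            self._compression_feasibility_checked = True
            self._check_compression_model_feasibility()
        return messages[-2:]  # placeholder for the real compression pass
```

Sessions that never reach `compress_context()` never pay for the probe, which is the saving the timing figures above measure.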
RyanRana	206f595f66	perf(prompt): cache kanban worker guidance at session init Salvages #24402 by @RyanRana. The KANBAN_GUIDANCE block (~835 tokens) is session-static — the dispatcher decides at spawn time whether the process is a kanban worker via the kanban_show tool's check_fn (gated on HERMES_KANBAN_TASK env var). Re-checking 'kanban_show' in valid_tool_names and re-loading the reference on every system-prompt rebuild (init + each context compression) is wasted work. Caches the resolved string on agent._kanban_worker_guidance once in agent_init and consumes it in system_prompt.build_system_prompt(), with a getattr fallback for code paths that bypass agent_init.	2026-05-18 20:56:44 -07:00
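
The caching described here (resolve the guidance once at init, read it back during prompt builds, fall back via getattr for agents that skipped init) could be sketched as below. `KANBAN_GUIDANCE` is a short placeholder string, and the function signatures are simplified assumptions rather than the real `agent_init` / `system_prompt` interfaces.

```python
from types import SimpleNamespace
from typing import Iterable

KANBAN_GUIDANCE = "<kanban worker guidance>"  # placeholder for the ~835-token block


def resolve_kanban_guidance(valid_tool_names: Iterable[str]) -> str:
    # Session-static decision: worker processes expose the kanban_show tool.
    return KANBAN_GUIDANCE if "kanban_show" in set(valid_tool_names) else ""


def init_agent(agent: SimpleNamespace, valid_tool_names: Iterable[str]) -> None:
    # Resolve once and cache on the agent.
    agent._kanban_worker_guidance = resolve_kanban_guidance(valid_tool_names)


def build_system_prompt(agent: SimpleNamespace, valid_tool_names: Iterable[str]) -> str:
    guidance = getattr(agent, "_kanban_worker_guidance", None)
    if guidance is None:
        # Fallback for code paths that bypassed init_agent.
        guidance = resolve_kanban_guidance(valid_tool_names)
    return "base prompt" + ("\n" + guidance if guidance else "")
```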
Teknium	9aae59feab	fix(compress): make abort-on-summary-failure opt-in via config flag (#28117) PR #28102 made the summary-failure abort path the unconditional default, changing established behavior. Gate it behind config.yaml flag `compression.abort_on_summary_failure` (default False = historical fallback-placeholder behavior). - hermes_cli/config.py: new `compression.abort_on_summary_failure` key, default False, documented inline. - agent/agent_init.py: read the flag from compression config and pass to ContextCompressor. - agent/context_compressor.py: `__init__` accepts `abort_on_summary_failure` (default False). `compress()` failure branch gates the abort on the flag; when False, falls through to the restored legacy fallback path (static "summary unavailable" placeholder + drop middle window). - tests: restore original fallback expectations as default; add new TestAbortOnSummaryFailure class for the opt-in mode. Gateway/CLI plumbing (force=True on /compress, hygiene/handler abort detection, locale `gateway.compress.aborted` key) from PR #28102 stays intact; those paths only fire when `_last_compress_aborted` is True, which now only happens when the flag is enabled.	2026-05-18 10:28:20 -07:00
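
The two failure behaviors this commit distinguishes can be shown with a simplified compressor. This is a sketch under stated assumptions: messages are plain strings, the head/tail window sizes and the return shape are invented for the example, and only the flag name, its default, and the placeholder-plus-drop-middle fallback come from the commit message.

```python
from typing import Callable, List, Tuple

PLACEHOLDER = "[summary unavailable]"  # stand-in for the static placeholder text


def compress(
    messages: List[str],
    summarize: Callable[[List[str]], str],
    abort_on_summary_failure: bool = False,
    keep_head: int = 1,
    keep_tail: int = 2,
) -> Tuple[List[str], bool]:
    """Return (new_messages, aborted). Hypothetical, simplified signature."""
    if len(messages) <= keep_head + keep_tail:
        return list(messages), False  # nothing in the middle to compress
    head = messages[:keep_head]
    middle = messages[keep_head:len(messages) - keep_tail]
    tail = messages[len(messages) - keep_tail:]
    try:
        summary = summarize(middle)
    except Exception:
        if abort_on_summary_failure:
            # Opt-in mode: leave the history untouched and report the abort.
            return list(messages), True
        # Default (legacy) mode: drop the middle window, keep a placeholder.
        summary = PLACEHOLDER
    return head + [summary] + tail, False
```

With the default flag a failed summary still shrinks the context at the cost of losing the middle window; with the flag enabled the caller keeps the full history and can surface the abort instead.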
glennc	9df9816dab	feat(azure-foundry): add Microsoft Entra ID auth Use azure-identity DefaultAzureCredential for keyless Foundry auth. Preserve refreshable callable credentials through OpenAI and Anthropic client paths. Add setup, doctor, auth status, docs, and tests for Entra auth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-18 10:14:38 -07:00
teknium1	36ad8336f9	fix(run_agent): guard memory provider init against empty/whitespace string Original commit `8d756a421` by austrian_guy targeted __init__ in pre-refactor run_agent.py. The body now lives in agent/agent_init.init_agent — re-applied there. Co-authored-by: austrian_guy <33156212+ether-btc@users.noreply.github.com>	2026-05-16 23:43:09 -07:00
teknium1	27df249564	feat(nvidia): add NIM billing origin header — port to extracted modules Original commit `13c3d4b4e` by kchantharuan touched __init__ and _apply_client_headers_for_base_url in pre-refactor run_agent.py. Re-applied to: - __init__: agent/agent_init.py (3 hunks — NVIDIA branch + _custom_headers fallback in routed-client and fallback-client paths) - _apply_client_headers_for_base_url: still in run_agent.py (1 hunk) build_nvidia_nim_headers was already present in agent/auxiliary_client.py from the prior merge — no additional port needed. Co-authored-by: kchantharuan <kchantharuan@nvidia.com>	2026-05-16 23:25:11 -07:00
teknium1	b07524e53a	feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider — port to extracted modules Original commit `b62c99797` by Jaaneek targeted six locations in pre-refactor run_agent.py. Re-applied to the extracted post-PR locations: - api_mode dispatch → agent/agent_init.py - is_xai_responses build_api_kwargs → agent/chat_completion_helpers.py - codex_auth_retry block + 401 hint → agent/conversation_loop.py - _try_refresh_codex_client_credentials body → run_agent.py (kept) The non-run_agent.py portions of the commit (auxiliary_client, codex transport, hermes_cli/auth, tools/xai_http, tests, docs) merged cleanly from main via the prior merge commit. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-05-16 23:23:38 -07:00
teknium1	9f408989c4	refactor(run_agent): extract __init__ (1,381 LOC) to agent/agent_init.py The largest method left on AIAgent (60+ parameters, the entire startup sequence — credential resolution, provider auto-detection, context engine bootstrap, memory store hydration, plugin lifecycle hooks) moves into agent/agent_init.py. AIAgent.__init__ is now a thin wrapper that calls agent.agent_init.init_agent(self, ...) with the original full parameter list preserved. Module-level run_agent names referenced in the body (_openrouter_prewarm_done, _qwen_portal_headers, _routermint_headers, _hermes_home, OpenAI, get_tool_definitions, check_toolset_requirements) are resolved through _ra() so test patches on those names keep working. agent_init's logger warnings are routed via _ra().logger so tests patching run_agent.logger capture them (TestStringKSuffixContextLengthWarns, TestCustomProvidersInvalidContextLengthWarns). Live E2E reconfirmed on three model paths (openai/gpt-5.4, anthropic/claude-sonnet-4.6, moonshotai/kimi-k2-thinking). tests/run_agent/ + tests/agent/: 4313 passed (same pre-existing test_auxiliary_client failure). run_agent.py: 5944 -> 4564 lines (-1380). Total reduction since baseline: 16083 -> 4564 (-11519, 72%).	2026-05-16 19:43:38 -07:00

13 commits