teknium1 hit ModuleNotFoundError: No module named 'hermes_bootstrap' after
a code update, on both his Windows machine AND his Linux workstation. The
failure mode is real and affects every user who updates hermes by any path
OTHER than a fully-successful ``hermes update``.
## What happens
hermes_bootstrap.py is a top-level module registered via pyproject.toml's
``py-modules`` list (added by Brooklyn's Windows UTF-8 stdio work). It
must be registered in the venv's editable-install .pth file before Python
can find it as a bare ``import hermes_bootstrap``.
``hermes update`` handles this correctly: (1) git reset --hard, (2) clear
__pycache__, (3) uv pip install -e . (re-registers the package including
the new py-modules list), (4) restart.
BUT if any step AFTER (1) fails — network blip during pip install, PEP 668
on a system Python, venv locked, uv not in PATH, a crash mid-update — the
user is left with new code that references hermes_bootstrap and a venv
that doesn't know about it. Every hermes invocation after that crashes
with ModuleNotFoundError, including ``hermes update`` itself. No recovery
path without manual ``uv pip install -e .``.
Also affects users who ``git pull`` the repo directly without running
hermes update — relatively common for developers.
## Fix
Wrap ``import hermes_bootstrap`` in a try/except ModuleNotFoundError
across all 6 entry points (hermes_cli/main, run_agent, gateway/run,
acp_adapter/entry, cli, batch_runner). On Windows, missing bootstrap
means the UTF-8 stdio setup doesn't run — degraded behavior (Unicode
chars may fail to print) but NOT a crash. POSIX is unaffected either way
since the bootstrap is a no-op there.
Once hermes is running again, the user can ``hermes update`` to fully
recover.
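For reference, the guarded import looks roughly like this (illustrative sketch, not the exact entry-point code):

```python
# Illustrative sketch of the guarded import; the real entry points may differ.
try:
    import hermes_bootstrap  # noqa: F401  (UTF-8 stdio setup on Windows)
except ModuleNotFoundError:
    # Stale venv after an interrupted `hermes update` or a plain `git pull`:
    # skip the bootstrap instead of crashing. Worst case on Windows is that
    # Unicode output may mis-print until `hermes update` re-registers the
    # module; on POSIX the bootstrap is a no-op anyway.
    pass
```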
## Test update
tests/test_hermes_bootstrap.py::test_entry_point_imports_bootstrap
scans for the first top-level import in each entry point and asserts it
is hermes_bootstrap. Extended the check to accept a Try block whose body
is a lone Import of hermes_bootstrap — that's the recovery-friendly form
we just introduced.
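Roughly, the extended check does something like this (illustrative sketch; the shipped test's helpers and parametrization differ):

```python
# Sketch of the extended AST check; helper name is a stand-in.
import ast

def first_import_target(source: str):
    """Return the module name of the first top-level import, treating a
    Try block whose body is a lone Import as that import (the
    recovery-friendly form introduced above)."""
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Try) and len(node.body) == 1
                and isinstance(node.body[0], ast.Import)):
            node = node.body[0]
        if isinstance(node, ast.Import):
            return node.names[0].name
        if isinstance(node, ast.ImportFrom):
            return node.module
    return None

# The parametrized test then asserts, per entry point:
#   assert first_import_target(entry_point_source) == "hermes_bootstrap"
```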
Verified the recovery path by ``mv hermes_bootstrap.py hermes_bootstrap.py.bak``
and confirming that ``python -c "import hermes_cli.main"`` still succeeds. 82/82
tests pass (hermes_bootstrap + windows-native + windows-compat).
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.
Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs). That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.
After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly. Works identically on every platform
and every locale, no surprise behavior.
Mechanical sweep via:
ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills,skills,tinker-atropos,plugins' .
All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing
else changed. Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).
Scope notes:
- tests/ excluded: test fixtures can use locale encoding intentionally
(exercising edge cases). If we want to tighten tests later that's
a separate PR.
- plugins/ excluded: plugin-specific conventions may differ; plugin
authors own their code.
- optional-skills/ and skills/ excluded: skill scripts are user-authored
and we don't want to mass-edit them.
- website/ and tinker-atropos/ excluded: vendored / generated content.
46 files touched, 89 +/- lines (symmetric replacement). No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
Codebase-wide fix for Python-on-Windows UTF-8 footguns, complementing
the earlier execute_code sandbox fixes (which remain load-bearing for
when the sandbox explicitly scrubs child env).
Problem: Python on Windows has two long-standing text-encoding pitfalls:
1. sys.stdout/stderr are bound to the console code page (cp1252 on
US-locale installs) — print('café') crashes with UnicodeEncodeError.
2. Subprocess children don't know to use UTF-8 unless PYTHONUTF8 and/or
PYTHONIOENCODING are set in their env — so any Python we spawn
(linters, sandbox children, delegation workers) hits the same bug.
Solution: A tiny bootstrap module (hermes_bootstrap.py) imported as the
first statement of every Hermes entry point:
- hermes_cli/main.py (hermes / hermes-agent console_script)
- run_agent.py (hermes-agent direct)
- acp_adapter/entry.py (hermes-acp)
- gateway/run.py (messaging gateway)
- batch_runner.py (parallel batch mode)
- cli.py (legacy direct-launch CLI)
On Windows, the bootstrap:
- os.environ.setdefault('PYTHONUTF8', '1') (PEP 540 UTF-8 mode)
- os.environ.setdefault('PYTHONIOENCODING', 'utf-8')
- sys.stdout/stderr/stdin.reconfigure(encoding='utf-8', errors='replace')
Children inherit the env vars → they run in UTF-8 mode.
Current process's stdio is reconfigured → print('café') works now.
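Condensed sketch of the module (the shipped version carries extra guards, e.g. an idempotence flag, that are elided here):

```python
# hermes_bootstrap.py, condensed sketch.
import os
import sys

if sys.platform == "win32":
    # setdefault: an explicit PYTHONUTF8=0 / PYTHONIOENCODING=cp1252 from the
    # user wins; children we spawn inherit these and start in UTF-8 mode.
    os.environ.setdefault("PYTHONUTF8", "1")
    os.environ.setdefault("PYTHONIOENCODING", "utf-8")
    for stream in (sys.stdout, sys.stderr, sys.stdin):
        try:
            stream.reconfigure(encoding="utf-8", errors="replace")
        except (AttributeError, ValueError):
            pass  # non-reconfigurable stream (e.g. replaced by a test harness)
```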
On POSIX (Linux/macOS), the bootstrap is a complete no-op. We don't
touch LANG, LC_*, or anything else — users who have intentionally
configured a non-UTF-8 locale aren't affected. POSIX systems are
already UTF-8 by default in 99% of modern setups, so there's nothing
to fix.
setdefault() (not overwrite) means users who explicitly set PYTHONUTF8=0
or PYTHONIOENCODING=cp1252 in their environment are respected.
What this does NOT fix: bare open(path, 'w') calls in the *parent*
process still default to locale encoding because PYTHONUTF8 is only
read at interpreter init. A ruff PLW1514 sweep (separate follow-up)
will add explicit encoding='utf-8' at those ~219 call sites for
belt-and-suspenders.
Tests (17): 16 passed, 1 skipped on Windows.
- Windows: env vars set, stdio reconfigured, child inherits UTF-8 mode
- POSIX: complete no-op (verified on fake POSIX + skipped on real
POSIX since we don't have a Linux box in this session)
- Idempotence: multiple calls safe
- Graceful degradation: non-reconfigurable streams don't crash
- User opt-out: explicit PYTHONUTF8=0 is respected
- Load order: every entry point's FIRST top-level import is
hermes_bootstrap, enforced by an AST-level parametrized test
pyproject.toml: added hermes_bootstrap to py-modules so it ships with
pip installs.
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.
_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but for tool-role messages it breaks the provider contract — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.
Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.
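Sketch of the new stripping rule (helper shape assumed, not the exact code):

```python
def strip_images_from_messages(messages):
    """Sketch only: drop image parts everywhere, but never drop a role:tool
    message. An emptied tool message gets a plaintext placeholder so the
    tool_call_id pairing with the preceding assistant message survives."""
    kept = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            content = [part for part in content if part.get("type") != "image_url"]
        if content:
            kept.append({**msg, "content": content})
        elif msg.get("role") == "tool":
            kept.append({**msg, "content": "[image omitted: endpoint does not accept image input]"})
        # non-tool messages whose content became empty are dropped
    return kept
```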
Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.
AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
Tool handlers (e.g. computer_use capture) return a _multimodal envelope
dict when a screenshot is attached. The tool-message builder was passing
this raw dict as the `content` field of role:tool messages, which is an
illegal format — OpenAI-compatible APIs expect a string or a content-parts
list, not a plain Python dict, and would reject it with a 400/422 error.
Fix: unwrap _multimodal results to their `content` list
([{type:text,...},{type:image_url,...}]) in both the parallel and
sequential tool-call paths. The Anthropic adapter already handles content
lists natively; vision-capable OpenAI-compatible servers (mlx-vlm,
GPT-4o, etc.) accept image_url parts in tool messages directly.
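Illustrative shape of the unwrap in the tool-message builder (function_result and tool_call_id are assumed to be in scope):

```python
# Sketch of the unwrap before building the role:tool message.
if isinstance(function_result, dict) and function_result.get("_multimodal"):
    # [{"type": "text", ...}, {"type": "image_url", ...}] is a legal
    # content-parts list for OpenAI-compatible APIs and the Anthropic adapter.
    tool_content = function_result["content"]
else:
    tool_content = function_result if isinstance(function_result, str) else str(function_result)

messages.append({"role": "tool", "tool_call_id": tool_call_id, "content": tool_content})
```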
Also add a _vision_supported adaptive fallback: on first image-rejection
error ("Only 'text' content type is supported." etc.) the agent strips all
image parts from the message history and retries with text only, so
text-only endpoints degrade gracefully without crashing the session.
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.
## cua_backend.py
**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.
**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than AX attribute write --
WebKit AXTextFields reject attribute writes from backgrounded processes.
**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key) -- cua-driver actual tool names.
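Simplified sketch of the combo parsing and routing:

```python
def parse_key_combo(combo: str):
    """Sketch: 'cmd+s' -> ('s', ['cmd']); 'enter' -> ('enter', [])."""
    parts = [p.strip().lower() for p in combo.split("+") if p.strip()]
    return parts[-1], parts[:-1]

# routing sketch: any modifier means the hotkey tool, otherwise press_key
key, modifiers = parse_key_combo("cmd+s")
tool_name = "hotkey" if modifiers else "press_key"
```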
**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup which closes immediately when the app is
non-frontmost; set_value AX-presses the matching child option directly
(no menu required, no focus steal).
**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.
**list_apps**: fixed tool name from listApps to list_apps; handles plain-text
response via regex when structured data is absent.
**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.
**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.
## schema.py
Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.
## tool.py
Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png)
rather than hardcoding image/png.
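Sketch of the detection (the '/9j/' prefix is the base64 form of the JPEG magic bytes FF D8 FF):

```python
def detect_image_mime(b64_data: str) -> str:
    """Sketch: base64-encoded JPEG data starts with '/9j/'; anything else
    from cua-driver is treated as PNG."""
    return "image/jpeg" if b64_data.startswith("/9j/") else "image/png"
```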
## agent/display.py + run_agent.py
Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.
Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.
- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
Actions: capture (som/vision/ax), click, double_click, right_click,
middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
`content: [text, image_url]` parts) that flows through
handle_function_call into the tool message. Anthropic adapter converts
into native `tool_result` image blocks; OpenAI-compatible providers
get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
recent screenshots carry real image data; older ones become text
placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
instead of its base64 char length (~1MB would have registered as
~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
(curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
entries.
44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.
469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.
- `model_tools.py` provider-gating: the tool is available to every
provider. Providers without multi-part tool message support will see
text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
client-side eviction + compressor pruning cover the same cost ceiling
without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs
(SLEventPostToPid, SLPSPostEventRecordTo,
_AXObserverAddNotificationAndCheckRemote) that can break on any macOS
update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
prints the Settings path.
Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
When switching from a custom local provider (e.g. ollama-launch) to a
cloud provider, three bugs caused the CLI to misbehave:
1. _explicit_api_key/_explicit_base_url were only updated when the switch
result had non-empty values (guarded by `if result.api_key:` etc.).
If the previous provider set these to Ollama values ("ollama",
"http://127.0.0.1:11434/v1"), those stale values leaked into the next
turn's _ensure_runtime_credentials() call and were forwarded to the
new provider's API endpoint, causing authentication/routing failures.
Fix: unconditionally write result.api_key/base_url into the explicit
fields after every successful switch. An empty string is the correct
sentinel — it tells _ensure_runtime_credentials to re-resolve from the
auth store / config rather than forwarding a stale override.
2. In AIAgent.switch_model(), `self.base_url = base_url or self.base_url`
kept the old Ollama localhost URL whenever the incoming base_url was an
empty string. For providers that use a native SDK (not an OpenAI-compat
endpoint), the caller passes base_url="" and expects the agent to clear
the field — not silently inherit Ollama's address.
Fix: only update self.base_url when base_url is truthy.
3. _handle_model_picker_selection() was called from the prompt_toolkit
Enter key binding without any exception guard. Any unexpected error
in the model-selection code path propagated through prompt_toolkit's
key-binding dispatcher and caused the entire TUI to exit — which the
user sees as "the terminal exits when I switch providers".
Fix: wrap the call in try/except and close the picker on failure.
- Add pricing entries for Claude Opus 4.5/4.6/4.7, Sonnet 4.5/4.6, and
Haiku 4.5 with updated source URLs (platform.claude.com)
- Add _normalize_anthropic_model_name() to handle dot-notation variants
(e.g. claude-opus-4.7 → claude-opus-4-7) for pricing lookups
- Fix silent token loss: ensure session row exists before UPDATE in both
run_agent.py and hermes_state.py (INSERT OR IGNORE is idempotent)
- Log token persistence failures at DEBUG level instead of swallowing
them silently — makes undercounted analytics diagnosable
- Surface reasoning tokens in CLI /usage and TUI usage panel
- Add 'reasoning' and 'cost_status' fields to TUI Usage type
When empty-response terminal scaffolding fires on a tool-result turn,
_drop_trailing_empty_response_scaffolding left the live history ending at
a bare 'tool' message. The next user input then landed as [...tool, user],
a protocol-invalid sequence that OpenRouter/Opus and other providers
silently fail on (returning empty content). That retriggered the empty-retry
recovery every turn, and recovery flags never hit SQLite (no column for
them), so history kept looking broken on every reload.
Two fixes:
1. Scaffolding strip rewinds the orphan assistant(tool_calls)+tool pair
after popping sentinels. Only fires when scaffolding flags were
actually present, so mid-iteration tool loops are untouched.
2. _repair_message_sequence runs right before every API call as a
defensive belt: drops stray tool messages with unknown tool_call_ids,
merges consecutive user messages so no user input is lost. Does NOT
rewind assistant(tool_calls)+tool+user — that pattern is valid when
the user redirected before the model got its continuation turn.
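Simplified sketch of the repair pass (OpenAI-style message dicts assumed; the real method also coordinates with the scaffolding-flag bookkeeping in fix 1):

```python
def repair_message_sequence(messages):
    """Sketch of the defensive pre-call repair described above."""
    known_ids = {
        tc.get("id")
        for m in messages if m.get("role") == "assistant"
        for tc in (m.get("tool_calls") or [])
    }
    repaired = []
    for msg in messages:
        if msg.get("role") == "tool" and msg.get("tool_call_id") not in known_ids:
            continue  # stray tool result with an unknown id: providers reject it
        if msg.get("role") == "user" and repaired and repaired[-1].get("role") == "user":
            merged = f"{repaired[-1].get('content') or ''}\n\n{msg.get('content') or ''}"
            repaired[-1] = {**repaired[-1], "content": merged}
            continue  # merge so no user input is lost
        repaired.append(msg)
    return repaired
```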
Repro: session 20260507_044111_fa7e65. Opus-4.7/OpenRouter returned
content-less response after a 42KB execute_code output, nudge+retry
chain exhausted (no fallback configured), terminal sentinel appended,
scaffolding stripped leaving bare tool tail, user typed 'wtf happened..'
and landed as tool→user violation. Every subsequent turn collapsed in
<50ms with the same 3-retry empty chain because the API request itself
was malformed.
Verified live via HTTP mock: pre-fix reproduced 5 api_calls/0.15s exit
'empty_response_exhausted'; post-fix 1 api_call/0.10s exit
'text_response(finish_reason=stop)'. Three-turn session flows cleanly
through the scenario. Full run_agent suite: 1242 passed (0 regressions,
2 pre-existing concurrent_interrupt failures unrelated).
Enables plugins to transform LLM output text after generation,
useful for vocabulary/personality transformation without burning
inference tokens.
Follows same pattern as transform_tool_result and transform_terminal_output:
- First non-empty string result wins
- Fail-open: exceptions logged as warnings, agent continues
- Signature: (response_text, session_id, model, platform)
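Sketch of the dispatch contract; the hook name and plugin-manager shape shown here are placeholders, not the actual API:

```python
import logging

logger = logging.getLogger(__name__)

def apply_response_transforms(plugins, response_text, session_id, model, platform):
    """Sketch: first non-empty string result wins, failures are logged and skipped."""
    for plugin in plugins:
        hook = getattr(plugin, "transform_llm_response", None)  # placeholder hook name
        if hook is None:
            continue
        try:
            result = hook(response_text, session_id, model, platform)
        except Exception as exc:
            # fail-open: log a warning and keep going with the original text
            logger.warning("response transform failed in %r: %s", plugin, exc)
            continue
        if isinstance(result, str) and result.strip():
            return result  # first non-empty string result wins
    return response_text
```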
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
Introduces providers/ package — single source of truth for every
inference provider. Adding a simple api-key provider now requires one
providers/<name>.py file with zero edits anywhere else.
What this PR ships:
- providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes)
- ProviderProfile declarative fields: name, api_mode, aliases, display_name,
env_vars, base_url, models_url, auth_type, fallback_models, hostname,
default_headers, fixed_temperature, default_max_tokens, default_aux_model
- 4 overridable hooks: prepare_messages, build_extra_body,
build_api_kwargs_extras, fetch_models
- chat_completions.build_kwargs: profile path via _build_kwargs_from_profile,
legacy flag path retained for lmstudio/tencent-tokenhub (which have
session-aware reasoning probing that doesn't map cleanly to hooks yet)
- run_agent.py: profile path for all registered providers; legacy path
variable scoping fixed (all flags defined before branching)
- Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS,
doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER
- GeminiProfile: thinking_config translation (native + openai-compat nested)
- New tests/providers/ (79 tests covering profile declarations, transport
parity, hook overrides, e2e kwargs assembly)
Deltas vs original PR (salvaged onto current main):
- Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth
(were added to main since original PR)
- Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their
reasoning_effort probing has no clean hook equivalent yet)
- Removed lmstudio alias from custom profile (it's a separate provider now)
- Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension
(resolve_provider special-cases them; adding breaks runtime resolution)
- runtime_provider: profile.api_mode only as fallback when URL detection
finds nothing (was breaking minimax /v1 override)
- Preserved main's legacy-path improvements: deepseek reasoning_content
preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M
beta recovery, etc.
- Kept agent/copilot_acp_client.py in place (rejected PR's relocation —
main has 7 fixes landed since; relocation would revert them)
- _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing
test imports
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #14418
Each auxiliary model must be resolved with its own provider so that
provider-specific paths (e.g. Bedrock static table, OpenRouter API)
are invoked for the correct client, not inherited from the main model.
When the main model is Bedrock, passing self.provider unconditionally
to get_model_context_length() for the aux model caused the Bedrock
static table hard-intercept (step 1b) to fire for non-Bedrock models,
returning BEDROCK_DEFAULT_CONTEXT_LENGTH=128K instead of the model's
real context window — triggering a false compression warning every session.
Fix: pass _aux_cfg_provider when explicitly set, falling back to
self.provider only when the aux provider is unset or "auto".
Closes #12977
Related: #13807, #17460
The reasoning-box extraction loop in run_conversation() walked backwards
through the entire message history looking for any assistant message
with a non-empty 'reasoning' field. When the current turn produced
no reasoning (e.g. the provider returned reasoning_content=null for a
trivial response), the loop walked past the current turn and showed
reasoning from a prior turn — stale text from minutes or hours ago
displayed as if it belonged to the current reply.
Fix: stop the walk at the user message that started the current turn.
That picks the most recent reasoning WITHIN the turn (correct for
tool-calling turns where reasoning lands on the tool-call step and
the final-answer step has reasoning=None — common on Claude thinking,
DeepSeek v4, Codex Responses), and returns None cleanly when the
current turn genuinely had no reasoning.
Co-authored-by: happy5318 <happy5318@users.noreply.github.com>
* revert(gateway): remove stale-code self-check and auto-restart
Removes the _detect_stale_code / _trigger_stale_code_restart mechanism
introduced in #17648 and iterated in #19740. On every incoming message
the gateway compared the boot-time git HEAD SHA to the current SHA on
disk, and if they differed it would reply with
Gateway code was updated in the background --
restarting this gateway so your next message runs
on the new code. Please retry in a moment.
and then kick off a graceful restart. This is unwanted behaviour:
users who run a long-lived gateway and do their own ad-hoc git
operations on the checkout end up with their chat interrupted and
the current message dropped every time HEAD moves, with no way to
opt out.
If an operator really needs the old protection against stale
sys.modules after "hermes update", the SIGKILL-survivor sweep in
hermes update (hermes_cli/main.py, also tagged #17648) already
handles the supervisor-respawn case on its own.
Removed:
gateway/run.py:
- _STALE_CODE_SENTINELS, _GIT_SHA_CACHE_TTL_SECS
- _read_git_head_sha(), _compute_repo_mtime() module helpers
- class-level _boot_wall_time / _boot_repo_mtime / _boot_git_sha /
_stale_code_restart_triggered defaults
- __init__ boot-snapshot block (_boot_*, _cached_current_sha*,
_repo_root_for_staleness, _stale_code_notified)
- _current_git_sha_cached(), _detect_stale_code(),
_trigger_stale_code_restart() methods
- stale-code check + user-facing restart notice at the top of
_handle_message()
tests/gateway/test_stale_code_self_check.py (deleted, 412 lines)
No new logic added. Zero remaining references to any removed
symbol. Gateway test suite passes the same 4589 tests it passed
before; the 3 pre-existing unrelated failures (discord free-channel,
feishu bot admission, teams typing) are unchanged by this commit.
* fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924)
Per-delta _strip_think_blocks ran at _fire_stream_delta and destroyed
downstream state. When MiniMax-M2.7 / DeepSeek / Qwen3 streamed a tag
split across deltas (delta1='<think>', delta2='Let me check'), the
regex case-2 match erased delta1 entirely, so CLI/gateway state
machines never learned a block was open and leaked delta2 as content.
Raw consumers (ACP, api_server, TTS) had no downstream defense at all.
Replace the per-delta regex with a stateful StreamingThinkScrubber
that survives delta boundaries:
- Closed <tag>X</tag> pairs always stripped (matches _strip_think_blocks
case 1).
- Unterminated open at block boundary enters a block; content
discarded until close tag arrives. At end-of-stream, held
content is dropped.
- Orphan close tags stripped without boundary gating.
- Partial tags at delta boundaries held back until resolved.
- Block-boundary rule (start-of-stream, after \n, or
whitespace-only since last \n) preserves prose that mentions
tag names.
Reset at turn start alongside the existing context scrubber; flush at
turn end so a benign '<' held back at end-of-stream reaches the UI.
E2E-verified on live OpenRouter->MiniMax-m2 streams: closed pairs
strip cleanly, first word of post-block content is preserved, pure
content passes through unchanged. Stefan's screenshot case (#17924)
— 'Let me check' getting chopped to ' me check' — no longer happens.
Final _strip_think_blocks calls on completed strings (final_response,
replay, compression) are preserved; only the streaming per-delta call
site switched to the scrubber.
MCP servers commonly emit JSON Schema `pattern` (e.g. `\\d{4}-\\d{2}-\\d{2}`
for date-time params) and `format` keywords. llama.cpp's
`json-schema-to-grammar` converter rejects regex escape classes
(\\d/\\w/\\s) and most format values, returning HTTP 400
"parse: error parsing grammar: unknown escape at \\d" — the whole request
fails.
Cloud providers (OpenAI, Anthropic, OpenRouter, Gemini) accept these
keywords fine and use them as prompting hints. Stripping unconditionally
loses useful hints for every cloud user to fix a llama.cpp-only bug.
Approach: classify the llama.cpp grammar-parse 400 in the error
classifier, and on match do a one-shot in-place strip of pattern/format
from `self.tools`, then retry. Follows the existing
`thinking_signature` recovery pattern. Cloud users hit zero overhead;
llama.cpp users pay one failed request per session.
Changes
- agent/error_classifier.py: new `FailoverReason.llama_cpp_grammar_pattern`
+ narrow HTTP-400 branch matching "error parsing grammar",
"json-schema-to-grammar", or "unable to generate parser ... template".
- tools/schema_sanitizer.py: new `strip_pattern_and_format()` helper —
reactive, walks schema nodes, skips property names (search_files.pattern
survives). Returns strip count for logging (see the sketch after this list).
- run_agent.py: new one-shot recovery block in the retry loop. Strips,
logs, continues. Falls through to normal retry if nothing to strip.
- tests: 4 classifier tests (3 variants + 1 non-400 negative), 7 strip
tests including the property-name preservation and idempotency checks.
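Sketch of strip_pattern_and_format() under the constraints above (assumed structure, not the shipped helper):

```python
def strip_pattern_and_format(schema):
    """Sketch: walk a JSON-Schema tree removing `pattern`/`format` keywords,
    but never touch keys inside a `properties` map, so a property literally
    named "pattern" (e.g. search_files.pattern) survives. Returns the number
    of keywords removed, for logging."""
    removed = 0

    def walk(node, inside_properties=False):
        nonlocal removed
        if isinstance(node, dict):
            if not inside_properties:
                for keyword in ("pattern", "format"):
                    if isinstance(node.get(keyword), str):
                        node.pop(keyword)
                        removed += 1
            for key, value in node.items():
                walk(value, inside_properties=(key == "properties"))
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return removed
```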
Co-authored-by: Chris Danis <cdanis@gmail.com>
The `used` property was reading `self._used` without holding the lock,
while `consume()`, `refund()`, and `remaining` all properly acquire
`self._lock` before accessing `_used`. This means a concurrent call to
`used` during `consume()` or `refund()` could observe a partially-
updated value, leading to incorrect iteration budget metrics reported
to the gateway, or in extreme cases a ValueError from CPython's list
implementation when the internal array resizes during iteration.
Fix: acquire the lock in `used` just like `remaining` does.
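Minimal sketch of the fix (class and attribute names follow the description; the real class has more methods):

```python
import threading

class IterationBudget:
    """Sketch; the real class name and surrounding methods may differ."""

    def __init__(self, limit: int):
        self._lock = threading.Lock()
        self._limit = limit
        self._used = 0

    @property
    def used(self):
        with self._lock:  # previously read self._used without the lock
            return self._used

    @property
    def remaining(self):
        with self._lock:
            return self._limit - self._used

    def consume(self, n: int = 1):
        with self._lock:
            self._used += n
```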
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Gemini's OpenAI-compatibility endpoint strictly requires the `name` field
on `role: tool` messages — it returns HTTP 400 ("Request contains an
invalid argument") when the function name is missing. OpenAI/Anthropic/
ollama tolerate the absence, so the gap stays invisible until the
conversation accumulates a tool turn and the user routes it through Gemini
(direct API or via ollama-cloud proxy).
Fix: add a `_get_tool_call_name_static()` helper alongside the existing
`_get_tool_call_id_static()`, and populate `name` at every site that
constructs a `role: tool` message — the pre-call sanitizer stub, the
tool-call args repair marker, both interrupt-skip paths, both
result-append paths (parallel + sequential), the invalid-tool-name
recovery, the invalid-JSON-args recovery, and the exception fallback.
Each call site was already in scope of the function name (`function_name`,
`skipped_name`, `name`, or a dict tool_call), so the change is local —
no new lookups, no behavior change for providers that already worked.
Fixes #16478
The background memory/skill review fork had two user-visible issues:
1. max_iterations=8 was too tight for multi-step reviews. A review that
needs to skill_view one or two candidate skills, add a memory entry,
and patch a skill routinely blew the budget — surfacing an 'Iteration
budget exhausted (8/8)' warning to the user and leaving the review
half-finished.
2. Mid-review lifecycle messages leaked into the user's terminal past the
existing quiet_mode + redirect_stdout/stderr guards. _emit_status and
_emit_warning route through _vprint(force=True) -> _print_fn /
status_callback, which bypass sys.stdout entirely. The stdout redirect
only catches raw print() calls.
Changes:
- Bump the review fork's max_iterations from 8 to 16.
- Set review_agent.suppress_status_output = True on the fork. This
short-circuits _vprint unconditionally so _emit_status/_emit_warning
emissions (iteration-budget warnings, rate-limit retries, compression
messages) never reach the user. The only user-visible output remains
the compact final summary line ('💾 Self-improvement review: ...')
which is printed via self._safe_print on the *main* agent (outside
the fork's redirect/suppress scope).
Summarizer filter is already correct — _summarize_background_review_actions
only surfaces tool calls whose data.get('success') is truthy, so failed
attempts and reasoning text never reach the summary line.
Add 'xiaomi' to the _anthropic_preserve_dots() provider whitelist and
'xiaomimimo.com' to the URL-based fallback check. Without this,
normalize_model_name() converts mimo-v2.5 to mimo-v2-5, which the
Xiaomi API rejects with HTTP 400.
Fixes #16156
Ollama serves Qwen3 thinking inside the content field as <think>...</think>
blocks rather than in the API-level reasoning_content field. This means
_has_structured was False for these responses, so an empty-looking reply
after a tool call triggered the nudge instead of the prefill continuation,
causing a double-response loop.
Fix: detect <think>/<thinking>/<reasoning> in final_response and:
1. Skip the nudge when thinking is present (model is still reasoning)
2. Include _has_inline_thinking in _has_structured so prefill kicks in
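Sketch of the detection (the regex and helper name are assumptions):

```python
import re

INLINE_THINKING_RE = re.compile(r"<\s*(think|thinking|reasoning)\b", re.IGNORECASE)

def has_inline_thinking(final_response: str) -> bool:
    """Sketch: Ollama serves Qwen3 reasoning as <think>...</think> inside
    content, so an 'empty-looking' reply after a tool call may still be
    mid-thought."""
    return bool(final_response) and bool(INLINE_THINKING_RE.search(final_response))

# decision sketch:
#   skip the nudge when has_inline_thinking(final_response) is True
#   _has_structured = bool(reasoning_content) or has_inline_thinking(final_response)
```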
Per-request OpenAI-wire clients (used by both non-streaming and
streaming chat-completions paths in _interruptible_api_call) should
not run the SDK's built-in retry loop: the agent's outer loop owns
retries with credential rotation, provider fallback, and backoff that
the SDK can't see.
Leaving SDK retries on (default 2) compounds with our outer retries
and lets a single hung provider request stretch to ~3x the per-call
timeout before our stale detector reports it.
Shared/primary clients and Anthropic / Bedrock paths are unaffected
(they don't go through here).
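Sketch of the per-request client construction (api_key, base_url, and request_timeout are assumed to be resolved by the agent before this point):

```python
from openai import OpenAI

client = OpenAI(
    api_key=api_key,
    base_url=base_url,
    timeout=request_timeout,
    max_retries=0,  # the agent's outer loop owns retries; SDK default is 2
)
```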
Salvage of #15811 core improvement — the timeout push-down in the
original PR required scaffolding that has since been refactored on
main, so only the max_retries=0 change is preserved.
Co-authored-by: QifengKuang <k2767567815@gmail.com>
Tighten the provenance semantics added in #19618: skills a user asks a
foreground agent to write via skill_manage(create) now stay invisible to
the curator. Only skills the background self-improvement review fork
sediments through skill_manage get the created_by=agent marker.
- tools/skill_provenance.py — new ContextVar module mirroring the
_approval_session_key pattern: set_current_write_origin / reset /
get / is_background_review. Default origin is 'foreground'; the
review fork sets 'background_review'.
- run_agent.py — run_conversation() binds the ContextVar from
self._memory_write_origin at the top of each call. The review fork
runs on its own thread (fresh context), so foreground and review
contexts never cross-contaminate.
- tools/skill_manager_tool.py — skill_manage(action='create') now
only calls mark_agent_created() when is_background_review(). All
other cases (foreground create, patch, edit, write_file, delete)
continue as before.
- tests: test_skill_provenance.py (6 tests covering the ContextVar
surface), split test_full_create_via_dispatcher into foreground
vs. review-fork variants, curator status tests now mark-first.
Why: the agent routinely edits existing user skills on the user's
behalf; those writes must never flip provenance. And when a user
explicitly asks the foreground agent to create a skill, that skill
belongs to the user. The curator should only be cleaning up after
its own autonomous sediment from the review nudge loop.
Preflight compression can run synchronously before the first model call when a loaded session exceeds the active context threshold. Gateway users saw no visible progress while the compression LLM call was in flight, which can look like a dropped message during long compactions.
Emit the existing lifecycle status through _emit_status before starting preflight compression so CLI, gateway, and WebUI status callbacks all get immediate feedback.
Adds a regression assertion for the preflight path.
Enable OpenRouter's response caching feature (beta) via X-OpenRouter-Cache
headers. When enabled, identical API requests return cached responses for
free (zero billing), reducing both latency and cost.
Configuration via config.yaml:
openrouter:
  response_cache: true        # default: on
  response_cache_ttl: 300     # 1-86400 seconds
Changes:
- Add openrouter config section to DEFAULT_CONFIG (response_cache + TTL)
- Add build_or_headers() in auxiliary_client.py that builds attribution
headers plus optional cache headers based on config
- Replace inline _OR_HEADERS dicts with build_or_headers() at all 5 sites:
run_agent.py __init__, _apply_client_headers_for_base_url(), and
auxiliary_client.py _try_openrouter() + _to_async_client()
- Add _check_openrouter_cache_status() method to AIAgent that reads
X-OpenRouter-Cache-Status from streaming response headers and logs
HIT/MISS status
- Document in cli-config.yaml.example
- Add 28 tests (22 unit + 6 integration)
Ref: https://openrouter.ai/docs/guides/features/response-caching
When a provider's credential pool has a single entry in 429-cooldown,
resolve_provider_client returns None and AIAgent.__init__ raises a
misleading RuntimeError suggesting the API key is missing — even when
valid fallback_providers are configured.
This patch makes __init__ iterate the fallback chain before raising,
mirroring the existing in-flight fallback logic in the request loop.
If a fallback resolves, the agent initializes against it and sets
_fallback_activated=True so _restore_primary_runtime can pick the
primary back up after cooldown.
Closes #17929
Prevents ghost sessions from accumulating in state.db when the TUI/web
dashboard is opened and closed without sending a message.
Changes:
- run_agent.py: Add _ensure_db_session() gate method, called at
run_conversation() entry. Remove eager create_session() from __init__.
Handle compression rotation flag correctly.
- tui_gateway/server.py: Remove eager db.create_session() in
_start_agent_build(). Add post-first-message pending_title re-apply.
- hermes_state.py: Extract _insert_session_row() shared helper (DRY).
Add prune_empty_ghost_sessions() for one-time migration.
- cli.py: One-time ghost session prune on startup. Fix _pending_title
to call _ensure_db_session() before set_session_title().
- hermes_cli/main.py: Guard TUI exit summary on message_count > 0.
- tests: Update test_860_dedup to call _ensure_db_session() before
direct _flush_messages_to_session_db() calls.
Closes: ghost session clutter in hermes sessions list and web dashboard.
DeepSeek V4 Pro tightened thinking-mode validation and rejects empty-string
reasoning_content with HTTP 400:
"The reasoning content in the thinking mode must be passed back to the API."
run_agent.py injected "" at three fallback sites — the tool-call pad in
_build_assistant_message and both injection branches of
_copy_reasoning_content_for_api (cross-provider poison guard + unconditional
thinking pad). All three now emit " " (single space), which satisfies the
non-empty check on V4 Pro without leaking fabricated reasoning.
Also upgrades stale empty-string placeholders on replay: sessions persisted
before this change have reasoning_content="" pinned at creation time; when
the active provider enforces thinking-mode echo, the replay path now rewrites
"" -> " " so existing users don't 400 on their first V4 Pro turn after
updating. Non-thinking providers still round-trip "" verbatim.
Updates 9 existing assertions + adds 2 regression tests (stale-placeholder
upgrade, non-thinking verbatim preservation).
Refs #15250, #17400.
Closes #17341.
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.
Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
firing on 10K+ tokens of real pressure, confusing users about why
compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
next should_compress() check compares real usage against a stale
underestimate — compression triggers late, potentially past the model's
context limit on small-context models (#14695).
Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.
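For reference, a sketch of what the request-level estimator counts (the real helper's signature may differ):

```python
import json

def estimate_request_tokens_rough(messages, system_prompt, tools):
    """Sketch of the chars/4 heuristic applied to the whole request
    (transcript plus system prompt plus tool schemas) rather than the
    transcript alone."""
    chars = len(system_prompt or "")
    chars += sum(len(str(m.get("content") or "")) for m in messages)
    chars += sum(len(json.dumps(tool)) for tool in (tools or []))
    return chars // 4
```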
Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after
Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
— no agent instance is in scope to query for system prompt/tools, and the
existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
system prompt via api_messages[0].
Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.
Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).
Closes #14695. Closes #6217.
When the self-improvement background review fires after a turn, it runs
in a bg thread and emits a ' 💾 <summary>' line to announce what it
saved to memory or skills. Two problems made this invisible to users
even when the review successfully modified a skill:
1. The print went through `_cprint` (prompt_toolkit's print_formatted_text)
on a bg thread while the CLI's PromptSession was live. Direct
print_formatted_text races with the input-area redraw and the line
can land behind/above the prompt, scrolled off without the user
seeing it.
2. The message said only '💾 Skill created.' / '💾 Memory updated'
with no indication that the self-improvement loop was the one doing
this. Users who did catch the line couldn't tell the background
review from some other agent action.
Fixes:
- `_cprint` now detects when it's called from a non-app thread with a
running prompt_toolkit Application, and routes through
`run_in_terminal` via `loop.call_soon_threadsafe`. That pauses the
input, prints the line above the prompt, and redraws — the normal
prompt_toolkit contract for bg-thread output. Direct-print fallback
preserved for the no-app / same-thread / import-error paths. Affects
every bg-thread emission, not just the review summary (curator
summaries and auxiliary failure prints benefit too).
- The summary now reads ' 💾 Self-improvement review: <summary>' in
both the CLI and the gateway `background_review_callback` path, so
the origin is unambiguous.
Tests:
- New `tests/cli/test_cprint_bg_thread.py` covers all six routing
branches (no app, app-not-running, cross-thread schedule, same-thread
direct, app-loop-attribute-error, import-error).
- New case in `tests/run_agent/test_background_review.py` asserts the
attributed prefix shows up in both `_safe_print` and
`background_review_callback`.
Live E2E: exercised _cprint from a bg thread inside a real Application
event loop; confirmed get_app_or_none() sees the app, call_soon_threadsafe
schedules run_in_terminal, and the inner _pt_print runs.
Builds on #16855 (@lsdsjy) which fixed DeepSeek v4 reasoning_content
replay via model_extra fallback + capturing tool_calls at method entry.
Kimi / Moonshot thinking mode enforces the same echo-back contract and
hits the same 400 when a tool-call turn is persisted without
reasoning_content.
- _build_assistant_message: pad branch now uses _needs_thinking_reasoning_pad()
(DeepSeek OR Kimi) instead of _needs_deepseek_tool_reasoning() alone.
- Extract _needs_thinking_reasoning_pad() and reuse it in
_copy_reasoning_content_for_api so both sites share one predicate.
- tests/run_agent/test_deepseek_reasoning_content_echo.py: add
TestBuildAssistantMessagePadsStrictProviders parametrized over DeepSeek
(attr=None, attr-absent), Kimi (attr=None), Moonshot (via base_url),
and an OpenRouter negative control that must NOT pad. Proven to fail
2/5 cases on Kimi/Moonshot without this change.
- scripts/release.py: add AUTHOR_MAP entries for lsdsjy and season179.
Refs #17400.
Co-authored-by: season179 <season.saw@gmail.com>
Four fixes bundled for curator reliability on existing installs and
broken/partial installs:
1. run_agent.py: defer `import fire` into the __main__ block. `fire` is
only used by `fire.Fire(main)` when running run_agent.py directly as
a CLI — it is NOT needed for library usage. Importing it at module
top made `from run_agent import AIAgent` from a daemon thread (e.g.
the curator's forked review agent) crash with ModuleNotFoundError
on broken/partial installs where `fire` isn't present.
2. hermes_cli/config.py: add version 22 → 23 migration that writes the
`curator` + `auxiliary.curator` sections to config.yaml with their
defaults, only filling keys the user hasn't overridden. Existing
configs from before PR #16049 / the April 2026 `auxiliary.curator`
unification had neither section on disk, so users couldn't see or
edit the settings in their config.yaml (runtime deep-merge papered
over it at read time, but the file never reflected reality).
3. hermes_cli/config.py: `ensure_hermes_home()` now pre-creates
`~/.hermes/logs/curator/` alongside cron/sessions/logs/memories on
every CLI launch. Managed-mode (NixOS) variant mkdir's it
defensively after the activation-script existence checks, since the
activation script may not know about this subpath.
4. agent/curator.py: `_reports_root()` mkdir's the dir at call time as
belt-and-suspenders for entry paths that bypass both
ensure_hermes_home() and the v23 migration (gateway-only installs,
bare library use).
E2E validated in isolated HERMES_HOME: fresh install gets full defaults
seeded; partial-override config keeps user's `enabled: false` and
custom `interval_hours` while filling the missing keys; re-running the
migration is a no-op.
Long-lived Gateway processes were sending duplicate tool names to
providers that enforce uniqueness:
- DeepSeek: 'Tool names must be unique.'
- Xiaomi MiMo: 'tools contains duplicate names: lcm_expand'
- Moonshot/Kimi: 'function name lcm_grep is duplicated'
TUI was unaffected because TUI runs with quiet_mode=False and skips the
cache entirely.
Root cause (two layered bugs)
- model_tools.get_tool_definitions(quiet_mode=True) memoizes its result
in _tool_defs_cache. The cache-hit path returned list(cached) (safe),
but the FIRST uncached call stored and returned the SAME object.
run_agent.py mutates self.tools (memory + LCM context-engine schemas)
in-place, so the very first agent init in a Gateway process
poisoned the cache, and every subsequent init appended LCM schemas
again on top of the already-polluted list.
- run_agent.py's context-engine injection (lcm_grep / lcm_describe /
lcm_expand) had no dedup, unlike the memory-tools injection right
above it which already skips already-present names.
Fix (defense in depth, per the issue's suggested fix)
- model_tools.get_tool_definitions: on the uncached branch, cache the
computed list but return list(result) to the caller. Same pattern as
the cache-hit path.
- run_agent.py: build _existing_tool_names from self.tools and skip
schemas whose names are already present, mirroring the memory-tools
block. This also defends against plugin paths that may register the
same schemas via ctx.register_tool().
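Sketch of both layers (helper names like _compute_tool_definitions and the OpenAI-style schema shape are stand-ins):

```python
# model_tools.py sketch: the uncached branch now returns a copy so the first
# caller can't poison the cache (mirrors the existing cache-hit path).
_tool_defs_cache = None

def get_tool_definitions(quiet_mode=False):
    global _tool_defs_cache
    if not quiet_mode:
        return _compute_tool_definitions()      # TUI path: no cache at all
    if _tool_defs_cache is None:
        _tool_defs_cache = _compute_tool_definitions()
    return list(_tool_defs_cache)               # fresh list on hit AND first miss

# run_agent.py sketch: skip context-engine schemas already present
# (assumes OpenAI-style {"function": {"name": ...}} schemas, self.tools in scope).
_existing_tool_names = {t["function"]["name"] for t in self.tools}
for schema in (lcm_grep_schema, lcm_describe_schema, lcm_expand_schema):
    if schema["function"]["name"] not in _existing_tool_names:
        self.tools.append(schema)
```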
Tests (tests/test_get_tool_definitions_cache_isolation.py)
- test_first_uncached_call_returns_fresh_list — pins the fix; without
it, first-call alias caused all the symptoms.
- test_cache_hit_returns_fresh_list — pre-existing behavior stays.
- test_caller_mutation_does_not_poison_cache — simulates run_agent
appending lcm_grep / lcm_expand to the returned list and asserts the
next call doesn't see them.
- test_repeated_caller_mutation_does_not_accumulate — reproduces the
long-lived Gateway accumulation pattern across 5 agent inits.
- test_non_quiet_mode_does_not_use_cache — sanity, explains why TUI
was fine.
5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).
Root cause: two separate context values existed independently:
- context_compressor.context_length = config value (e.g. 65536) ✓
- _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config
Changes:
1. Cap Ollama num_ctx to config context_length (run_agent.py)
When model.context_length is explicitly set and no explicit
ollama_num_ctx override exists, cap the auto-detected GGUF value
to the user's context_length. This is the core fix — it prevents
Ollama from allocating more VRAM than the user budgeted.
2. Pass config_context_length through all secondary call sites
Several paths called get_model_context_length() without the config
override, falling through to the 256K default fallback:
- cli.py: @-reference expansion and /model switch display
- gateway/run.py: @-reference expansion and /model switch display
- tui_gateway/server.py: @-reference expansion
- hermes_cli/model_switch.py: resolve_display_context_length()
3. Normalize root-level context_length in config (hermes_cli/config.py)
_normalize_root_model_keys() now migrates root-level context_length
into the model section, matching existing behavior for provider and
base_url. Users who wrote `context_length: 65536` at the YAML root
instead of under `model:` had it silently ignored.
4. Fix misleading comments (agent/model_metadata.py)
DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
as two comments stated.
Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).