hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-26 01:01:40 +00:00

Author	SHA1	Message	Date
bobashopcashier	b49a1b71a7	fix(agent): accept empty content with stop_reason=end_turn as valid anthropic response Anthropic's API can legitimately return content=[] with stop_reason="end_turn" when the model has nothing more to add after a turn that already delivered the user-facing text alongside a trivial tool call (e.g. memory write). The transport validator was treating that as an invalid response, triggering 3 retries that each returned the same valid-but-empty response, then failing the run with "Invalid API response after 3 retries." The downstream normalizer already handles empty content correctly (empty loop over response.content, content=None, finish_reason="stop"), so the only fix needed is at the validator boundary. Tests: - Empty content + stop_reason="end_turn" → valid (the fix) - Empty content + stop_reason="tool_use" → still invalid (regression guard) - Empty content without stop_reason → still invalid (existing behavior preserved)	2026-04-22 14:26:23 -07:00
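A minimal sketch of the validator rule this commit describes, assuming a response object with `content` and `stop_reason` attributes. The function name `is_valid_anthropic_response` is hypothetical and is not taken from the repository; only the three cases listed under "Tests" come from the commit message.

```python
from types import SimpleNamespace


def is_valid_anthropic_response(response) -> bool:
    """Hypothetical validator; the name and shape are illustrative only."""
    if response is None:
        return False
    content = getattr(response, "content", None)
    if not isinstance(content, list):
        return False
    if content:
        # Non-empty content is accepted here without further checks.
        return True
    # Empty content is only accepted when the model deliberately ended its turn.
    return getattr(response, "stop_reason", None) == "end_turn"


empty_end_turn = SimpleNamespace(content=[], stop_reason="end_turn")
empty_tool_use = SimpleNamespace(content=[], stop_reason="tool_use")
empty_no_reason = SimpleNamespace(content=[], stop_reason=None)
```

Under these assumptions only `empty_end_turn` passes, which matches the three test cases the commit lists.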
Teknium	ea67e49574	fix(streaming): silent retry when stream dies mid tool-call (#14151) When the streaming connection dropped AFTER user-visible text was delivered but a tool call was in flight, we stubbed the turn with a '⚠ Stream stalled mid tool-call; Ask me to retry' warning — costing an iteration and breaking the flow. Users report this happening increasingly often on long SSE streams through flaky provider routes. Fix: in the existing inner stream-retry loop, relax the deltas_were_sent short-circuit. If a tool call was in flight (partial_tool_names populated) AND the error is a transient connection error (timeout, RemoteProtocolError, SSE 'connection lost', etc.), silently retry instead of bailing out. Fire a brief 'Connection dropped mid tool-call; reconnecting…' marker so the user understands the preamble is about to be re-streamed. Researched how Claude Code (tombstone + non-streaming fallback), OpenCode (blind Effect.retry wrapping whole stream), and Clawdbot (4-way gate: stopReason==error + output==0 + !hadPotentialSideEffects) handle this. Chose the narrow Clawdbot-style gate: retry only when (a) a tool call was actually in flight (otherwise the existing stub-with-recovered-text is correct for pure-text stalls) and (b) the error is transient. Side-effect safety is automatic — no tool has been dispatched within this single API call yet. UX trade-off: user sees preamble text twice on retry (OpenCode-style). Strictly better than a lost action with a 'retry manually' message. If retries exhaust, falls through to the existing stub-with-warning path so the user isn't left with zero signal. Tests: 3 new tests in TestSilentRetryMidToolCall covering (1) silent retry recovers tool call; (2) exhausted retries fall back to stub; (3) text-only stalls don't trigger retry. 30/30 pass.	2026-04-22 13:47:33 -07:00
Teknium	88564ad8bc	fix(skins): don't inherit status_bar_* into light-mode skins The salvaged status-bar skin keys were seeded on the default skin, but _build_skin_config merges default.colors into every skin — so daylight and warm-lightmode silently inherited silver status_bar_text (#C0C0C0) on their light backgrounds, rendering as low-contrast gray on gray. Drop the seven status_bar_{text,strong,dim,good,warn,bad,critical} entries from the default skin's colors and let get_prompt_toolkit_style_overrides fall back to banner_text / banner_title / banner_dim / ui_ok / ui_warn / ui_error. Dark skins keep their explicit overrides and render identically; light skins now inherit their own dark banner colors for readable status-bar text.	2026-04-22 13:20:02 -07:00
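The retry gate described in this commit can be sketched as a single predicate. This is an illustration under stated assumptions, not the project's code: `should_silently_retry`, `is_transient`, and the `TRANSIENT_MARKERS` tuple are hypothetical names, and the branch for "no deltas sent yet" assumes the pre-existing loop already retried in that case.

```python
# Assumed substrings for transient connection errors; the commit names
# timeout, RemoteProtocolError and SSE 'connection lost' as examples.
TRANSIENT_MARKERS = ("timeout", "timed out", "remoteprotocolerror", "connection lost")


def is_transient(error: Exception) -> bool:
    """Hypothetical classifier for transient connection errors."""
    if isinstance(error, (TimeoutError, ConnectionError)):
        return True
    text = f"{type(error).__name__}: {error}".lower()
    return any(marker in text for marker in TRANSIENT_MARKERS)


def should_silently_retry(deltas_were_sent, partial_tool_names, error,
                          attempt, max_retries) -> bool:
    """Return True when the stream should be retried without stubbing the turn."""
    if attempt >= max_retries:
        return False  # exhausted: fall through to the stub-with-warning path
    if not is_transient(error):
        return False
    if not deltas_were_sent:
        return True  # assumption: nothing was shown yet, so a retry is invisible
    # Text was already shown: only retry when a tool call was in flight.
    return bool(partial_tool_names)
```

A text-only stall (`partial_tool_names` empty after deltas were sent) returns `False`, so the existing stub-with-recovered-text path still handles it.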
kshitij	81a504a4a0	fix: align status bar skin tests with upstream main Drop rebased test assumptions about theme-mode helpers removed on main and keep the status bar skin integration aligned with the current skin engine model.	2026-04-22 13:20:02 -07:00
kshitij	c323217188	fix: make CLI status bar skin-aware Route prompt_toolkit status bar colors through the skin engine so /skin updates the status bar alongside the rest of the interactive TUI. Add regression coverage for the new status bar style override keys and CLI style composition.	2026-04-22 13:20:02 -07:00
kshitijk4poor	de849c410d	refactor(debug): remove dead _read_log_tail/_read_full_log wrappers These thin wrappers around _capture_log_snapshot had zero production callers after the snapshot refactor — run_debug_share uses snapshots directly and collect_debug_report captures internally. The wrappers also caused a performance regression: _read_log_tail read up to 512KB and built full_text just to return tail_text. Remove both wrappers and migrate TestReadFullLog → TestCaptureLogSnapshot to test _capture_log_snapshot directly. Same coverage, tests the real API instead of dead indirection.	2026-04-22 11:59:39 -07:00
kshitijk4poor	8dc936f10e	chore: add taosiyuan163 to AUTHOR_MAP, add truncation boundary tests Add missing AUTHOR_MAP entry for taosiyuan163 whose truncation boundary fix was adapted into _capture_log_snapshot(). Add regression tests proving: line-boundary truncation keeps the full first line, mid-line truncation correctly drops the partial fragment.	2026-04-22 11:59:39 -07:00
Junass1	61d0a99c11	fix(debug): sweep expired pending pastes on slash debug paths	2026-04-22 11:59:39 -07:00
kshitijk4poor	921133cfa5	fix(debug): preserve full line at truncation boundary and cap memory Adapt the byte-boundary-safe truncation fix from PR #14040 by taosiyuan163 into the new _capture_log_snapshot() code path: when the truncation cut lands exactly on a line boundary, keep the first retained line instead of unconditionally dropping it. Also add a 2x max_bytes safety cap to the backward-reading loop to prevent unbounded memory consumption when log files contain very long lines (e.g. JSON blobs) with few newlines. Based on #14040 by @taosiyuan163.	2026-04-22 11:59:39 -07:00
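A sketch of the line-boundary rule on an in-memory buffer. The real `_capture_log_snapshot()` reads the file backward with a 2x `max_bytes` cap; this simplified `tail_bytes` helper is hypothetical and only illustrates when the first retained line is kept or dropped. Returning the raw tail when it contains no newline at all is an assumption.

```python
def tail_bytes(data: bytes, max_bytes: int) -> bytes:
    """Keep at most max_bytes from the end without emitting a partial first line."""
    if len(data) <= max_bytes:
        return data
    cut = len(data) - max_bytes
    tail = data[cut:]
    if data[cut - 1:cut] == b"\n":
        # The cut landed exactly on a line boundary: the first line is complete.
        return tail
    newline = tail.find(b"\n")
    if newline == -1:
        return tail  # assumption: a single over-long line is returned as-is
    # Mid-line cut: drop the partial fragment up to and including the newline.
    return tail[newline + 1:]


log = b"aaa\nbbb\nccc\n"
```

With this 12-byte `log`, a budget of 8 bytes cuts on a boundary and keeps `bbb` whole, while a budget of 6 bytes cuts inside `bbb` and drops the fragment.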
helix4u	fc3862bdd6	fix(debug): snapshot logs once for debug share	2026-04-22 11:59:39 -07:00
brooklyn!	bc5da42b2c	Merge pull request #14045 from NousResearch/bb/subagent-observability feat(tui): subagent spawn observability overlay	2026-04-22 12:21:25 -05:00
Brooklyn Nicholson	5b0741e986	refactor(tui): consolidate agents overlay — share duration/root helpers via lib Pull duplicated rules into ui-tui/src/lib/subagentTree so the live overlay, disk snapshot label, and diff pane all speak one dialect: - export fmtDuration(seconds) — was a private helper in subagentTree; agentsOverlay's local secLabel/fmtDur/fmtElapsedLabel now wrap the same core (with UI-only empty-string policy). - export topLevelSubagents(items) — matches buildSubagentTree's orphan semantics (no parent OR parent not in snapshot). Replaces three hand-rolled copies across createGatewayEventHandler (disk label), agentsOverlay DiffPane, and prior inline filters. Also collapse agentsOverlay boilerplate: - replace IIFE title + inner `delta` helper with straight expressions; - introduce module-level diffMetricLine for replay-diff rows; - tighten OverlayScrollbar (single thumbColor expression, vBar/thumbBody). Adds unit coverage for the new exports (fmtDuration + topLevelSubagents). No behaviour change; 221 tests pass.	2026-04-22 12:10:21 -05:00
Brooklyn Nicholson	9e1f606f7f	fix: scroll in agents detail view	2026-04-22 12:03:14 -05:00
Brooklyn Nicholson	7eae504d15	fix(tui): address Copilot round-2 on #14045 - delegate_task: use shared tool_error() for the paused-spawn early return so the error envelope matches the rest of the tool. - Disk snapshot label: treat orphaned nodes (parentId missing from the snapshot) as top-level, matching buildSubagentTree / summarizeLabel.	2026-04-22 11:54:19 -05:00
Brooklyn Nicholson	eda400d8a5	chore: uptick	2026-04-22 11:32:17 -05:00
Brooklyn Nicholson	82197a87dc	style(tui): breathing room around status glyphs in agents overlay - List rows: pad the status dot with space before (heat-marker gap or matching 2-space filler) and after (3 spaces to goal) so `●` / `○` / `✓` / `■` / `✗` don't read glued to the heat bar or the goal text. - Gantt rows: bump id→bar separator from 1 to 2 spaces; widen the id gutter from 4 to 5 cols and re-align the ruler lead to match.	2026-04-22 11:01:22 -05:00
Brooklyn Nicholson	dee51c1607	fix(tui): address Copilot review on #14045 Four real issues Copilot flagged: 1. delegate_tool: `_build_child_agent` never passed `toolsets` to the progress callback, so the event payload's `toolsets` field (wired through every layer) was always empty and the overlay's toolsets row never populated. Thread `child_toolsets` through. 2. event handler: the race-protection on subagent.spawn_requested / subagent.start only preserved `completed`, so a late-arriving queued event could clobber `failed` / `interrupted` too. Preserve any terminal status (`completed | failed | interrupted`). 3. SpawnHud: comment claimed concurrency was approximated by "widest level in the tree" but code used `totals.activeCount` (total across all parents). `max_concurrent_children` is a per-parent cap, so activeCount over-warns for multi-orchestrator runs. Switch to `max(widthByDepth(tree))`; the label now reads `⚡W/cap+extra` where W is the widest level (drives the ratio) and `+extra` is the rest. 4. spawn_tree.list: comment said "peek header without parsing full list" but the code json.loads()'d every snapshot. Adds a per-session `_index.jsonl` sidecar written on save; list() reads only the index (with a full-scan fallback for pre-index sessions). O(1) per snapshot now vs O(file-size).	2026-04-22 10:56:32 -05:00
kshitijk4poor	5e8262da26	chore: add rnijhara to AUTHOR_MAP	2026-04-22 08:49:24 -07:00
kshitijk4poor	1f216ecbb4	feat(gateway/slack): add SLACK_REACTIONS env toggle for reaction lifecycle Adds _reactions_enabled() gating to match Discord (DISCORD_REACTIONS) and Telegram (TELEGRAM_REACTIONS) pattern. Defaults to true to preserve existing behavior. Gates at three levels: - _handle_slack_message: skips _reacting_message_ids registration - on_processing_start: early return - on_processing_complete: early return Also adds config.yaml bridge (slack.reactions) and two new tests.	2026-04-22 08:49:24 -07:00
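A sketch of an environment toggle that defaults to true, as this commit describes for `SLACK_REACTIONS`. The set of values treated as "off" is an assumption, and `reactions_enabled` is a hypothetical stand-in for the commit's `_reactions_enabled()`; the config.yaml bridge is not shown.

```python
import os

# Assumed spellings for "disabled"; the commit only states the default is true.
_FALSEY = {"0", "false", "no", "off"}


def reactions_enabled(env=None) -> bool:
    """Return True unless SLACK_REACTIONS is explicitly set to a falsey value."""
    env = os.environ if env is None else env
    raw = env.get("SLACK_REACTIONS")
    if raw is None or not raw.strip():
        return True  # unset or blank preserves the existing behaviour
    return raw.strip().lower() not in _FALSEY
```

Each of the three gates the commit lists (message registration, `on_processing_start`, `on_processing_complete`) could then return early when this predicate is false.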
Roopak Nijhara	70a33708e7	fix(gateway/slack): align reaction lifecycle with Discord/Telegram pattern Slack reactions were placed around handle_message(), which returns immediately after spawning a background task. This caused the 👀 → ✅ swap to happen before any real work began. Fix: implement on_processing_start / on_processing_complete callbacks (matching Discord/Telegram) so reactions bracket actual _message_handler work driven by the base class. Also fixes missing stop_typing() for Slack's assistant thread status indicator, which left 'is thinking...' stuck in the UI after processing completed. - Add _reacting_message_ids set for DM/@mention-only gating - Add _active_status_threads dict for stop_typing lookup - Update test_reactions_in_message_flow for new callback pattern - Add test_reactions_failure_outcome and test_reactions_skipped_for_non_dm_non_mention	2026-04-22 08:49:24 -07:00
Brooklyn Nicholson	f06adcc1ae	chore(tui): drop unreachable return + prettier pass - createGatewayEventHandler: remove dead `return` after a block that always returns (tool.complete case). The inner block exits via both branches so the outer statement was never reachable. Was pre-existing on main; fixed here because it was the only thing blocking `npm run fix` on this branch. - agentsOverlay + ops: prettier reformatting. `npm run fix` / `npm run type-check` / `npm test` all clean.	2026-04-22 10:43:59 -05:00
Brooklyn Nicholson	06ebe34b40	fix(tui): repair useInput handler in agents overlay The Write tool that wrote the cleaned overlay split the `if` keyword across two lines in 9 places (` i\nf (cond) {`), which silently passed one typecheck run but actually left the handler as broken JS — every keystroke threw. Input froze in the /agents overlay (j/k/arrows/q/etc. all no-ops) while the 500ms now-tick kept rendering, so the UI looked "frozen but the timeline moves". Reflows the handler as-intended with no behaviour change.	2026-04-22 10:41:13 -05:00
Brooklyn Nicholson	7785654ad5	feat(tui): subagent spawn observability overlay Adds a live + post-hoc audit surface for recursive delegate_task fan-out. None of cc/oc/oclaw tackle nested subagent trees inside an Ink overlay; this ships a view-switched dashboard that handles arbitrary depth + width. Python - delegate_tool: every subagent event now carries subagent_id, parent_id, depth, model, tool_count; subagent.complete also ships input/output/ reasoning tokens, cost, api_calls, files_read/files_written, and a tail of tool-call outputs - delegate_tool: new subagent.spawn_requested event + _active_subagents registry so the overlay can kill a branch by id and pause new spawns - tui_gateway: new RPCs delegation.status, delegation.pause, subagent.interrupt, spawn_tree.save/list/load (disk under $HERMES_HOME/spawn-trees/<session>/<ts>.json) TUI - /agents overlay: full-width list mode (gantt strip + row picker) and Enter-to-drill full-width scrollable detail mode; inverse+amber selection, heat-coloured branch markers, wall-clock gantt with tick ruler, per-branch rollups - Detail pane: collapsible accordions (Budget, Files, Tool calls, Output, Progress, Summary); open-state persists across agents + mode switches via a shared atom - /replay [N|last|list|load <path>] for in-memory + disk history; /replay-diff <a> <b> for side-by-side tree comparison - Status-bar SpawnHud warns as depth/concurrency approaches caps; overlay auto-follows the just-finished turn onto history[1] - Theme: bump DARK dim #B8860B → #CC9B1F for readable secondary text globally; keep LIGHT untouched Tests: +29 new subagentTree unit tests; 215/215 passing.	2026-04-22 10:38:17 -05:00
kshitijk4poor	04e039f687	fix: Kimi /coding thinking block survival + empty reasoning_content + block ordering Follow-up to the cherry-picked PR #13897 fix. Three issues found: 1. CRITICAL: The thinking block synthesised from reasoning_content was immediately stripped by the third-party signature management code (Kimi is classified as _is_third_party_anthropic_endpoint). Added a Kimi-specific carve-out that preserves unsigned thinking blocks while still stripping Anthropic-signed blocks Kimi can't validate. 2. Empty-string reasoning_content was silently dropped because the truthiness check ('if reasoning_content and ...') evaluates to False for ''. Changed to 'isinstance(reasoning_content, str)' so the tier-3 fallback from _copy_reasoning_content_for_api (which injects '' for Kimi tool-call messages with no reasoning) actually produces a thinking block. 3. The thinking block was appended AFTER tool_use blocks. Anthropic protocol requires thinking -> text -> tool_use ordering. Changed to blocks.insert(0, ...) to prepend.	2026-04-22 08:21:23 -07:00
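Issues 2 and 3 in this commit (the truthiness check and the block ordering) can be illustrated with a small sketch. `build_assistant_blocks` and the message dict shape are hypothetical; the `{"type": "thinking", "thinking": ...}` field name follows the Anthropic content-block shape and is an assumption about what the adapter emits.

```python
def build_assistant_blocks(message: dict) -> list:
    """Hypothetical converter from an OpenAI-style assistant message to blocks."""
    blocks = []
    text = message.get("content")
    if text:
        blocks.append({"type": "text", "text": text})
    tool_calls = message.get("tool_calls") or []
    for call in tool_calls:
        blocks.append({"type": "tool_use", "id": call["id"],
                       "name": call["name"], "input": call["input"]})
    reasoning = message.get("reasoning_content")
    # isinstance (not truthiness) so an empty string still yields a block.
    if tool_calls and isinstance(reasoning, str):
        # Prepend so the order is thinking -> text -> tool_use.
        blocks.insert(0, {"type": "thinking", "thinking": reasoning})
    return blocks


msg = {
    "content": "Saving that now.",
    "tool_calls": [{"id": "call_1", "name": "memory", "input": {"note": "x"}}],
    "reasoning_content": "",
}
order = [block["type"] for block in build_assistant_blocks(msg)]
```

With a truthiness check (`if reasoning and ...`) the empty string in `msg` would produce no thinking block, which is the failure the commit describes.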
Jerome	97a536057d	chore(release): add hiddenpuppy to AUTHOR_MAP Map tsuijinglei@gmail.com → hiddenpuppy.	2026-04-22 08:21:23 -07:00
Jerome	2efb0eea21	fix(anthropic_adapter): preserve reasoning_content on assistant tool-call messages for Kimi /coding Fixes NousResearch/hermes-agent#13848 Kimi's /coding endpoint speaks the Anthropic Messages protocol but has its own thinking semantics: when thinking is enabled, Kimi validates message history and requires every prior assistant tool-call message to carry OpenAI-style reasoning_content. The Anthropic path never populated that field, and convert_messages_to_anthropic strips all Anthropic thinking blocks on third-party endpoints — so the request failed with HTTP 400: "thinking is enabled but reasoning_content is missing in assistant tool call message at index N" Now, when an assistant message contains tool_calls and a reasoning_content string, we append a {"type": "thinking", ...} block to the Anthropic content so Kimi can validate the history. This only affects assistant messages with tool_calls + reasoning_content; plain text assistant messages are unchanged.	2026-04-22 08:21:23 -07:00
Teknium	77e04a29d5	fix(error_classifier): don't classify generic 404 as model_not_found (#14013) The 404 branch in _classify_by_status had dead code: the generic fallback below the _MODEL_NOT_FOUND_PATTERNS check returned the exact same classification (model_not_found + should_fallback=True), so every 404 — regardless of message — was treated as a missing model. This bites local-endpoint users (llama.cpp, Ollama, vLLM) whose 404s usually mean a wrong endpoint path, proxy routing glitch, or transient backend issue — not a missing model. Claiming 'model not found' misleads the next turn and silently falls back to another provider when the real problem was a URL typo the user should see. Fix: only classify 404 as model_not_found when the message actually matches _MODEL_NOT_FOUND_PATTERNS ("invalid model", "model not found", etc.). Otherwise fall through as unknown (retryable) so the real error surfaces in the retry loop. Test updated to match the new behavior. 103 error_classifier tests pass.	2026-04-22 06:11:47 -07:00
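A sketch of the corrected 404 branch. Only the two patterns quoted in the commit are included, the `Classification` dataclass and `classify_404` are hypothetical names, and the `retryable` / `should_fallback` values on each branch other than `model_not_found + should_fallback=True` and `unknown (retryable)` are assumptions.

```python
from dataclasses import dataclass

# Only the two patterns quoted in the commit; the real list is longer ("etc.").
MODEL_NOT_FOUND_PATTERNS = ("invalid model", "model not found")


@dataclass(frozen=True)
class Classification:
    reason: str
    retryable: bool
    should_fallback: bool


def classify_404(message: str) -> Classification:
    """Classify an HTTP 404 by its message instead of by status alone."""
    lowered = message.lower()
    if any(pattern in lowered for pattern in MODEL_NOT_FOUND_PATTERNS):
        return Classification("model_not_found", retryable=False, should_fallback=True)
    # Generic 404 (wrong path, proxy glitch): surface it through the retry loop.
    return Classification("unknown", retryable=True, should_fallback=False)
```

A local endpoint answering `404 page not found` for a mistyped path is then reported as `unknown` rather than triggering a provider fallback.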
Yukipukii1	40619b393f	tools: normalize file tool pagination bounds	2026-04-22 06:11:41 -07:00
Teknium	3e652f75b2	fix(plugins+nous): auto-coerce memory plugins; actionable Nous 401 diagnostic (#14005) * fix(plugins): auto-coerce user-installed memory plugins to kind=exclusive User-installed memory provider plugins at $HERMES_HOME/plugins/<name>/ were being dispatched to the general PluginManager, which has no register_memory_provider method on PluginContext. Every startup logged: Failed to load plugin 'mempalace': 'PluginContext' object has no attribute 'register_memory_provider' Bundled memory providers were already skipped via skip_names={memory, context_engine} in discover_and_load, but user-installed ones weren't. Fix: _parse_manifest now scans the plugin's __init__.py source for 'register_memory_provider' or 'MemoryProvider' (same heuristic as plugins/memory/__init__.py:_is_memory_provider_dir) and auto-coerces kind to 'exclusive' when the manifest didn't declare one explicitly. This routes the plugin to plugins/memory discovery instead of the general loader. The escape hatch: if a manifest explicitly declares kind: standalone, the heuristic doesn't override it. Reported by Uncle HODL on Discord. * fix(nous): actionable CLI message when Nous 401 refresh fails Mirrors the Anthropic 401 diagnostic pattern. When Nous returns 401 and the credential refresh (_try_refresh_nous_client_credentials) also fails, the user used to see only the raw APIError. Now prints: 🔐 Nous 401 — Portal authentication failed. Response: <truncated body> Most likely: Portal OAuth expired, account out of credits, or agent key revoked. Troubleshooting: • Re-authenticate: hermes login --provider nous • Check credits / billing: https://portal.nousresearch.com • Verify stored credentials: $HERMES_HOME/auth.json • Switch providers temporarily: /model <model> --provider openrouter Addresses the common 'my hermes model hangs' pattern where the user's Portal OAuth expired and the CLI gave no hint about the next step.	2026-04-22 05:54:11 -07:00
kshitijk4poor	5fb143169b	feat(dashboard): track real API call count per session Adds schema v7 'api_call_count' column. run_agent.py increments it by 1 per LLM API call, web_server analytics SQL aggregates it, frontend uses the real counter instead of summing sessions. The 'API Calls' card on the analytics dashboard previously displayed COUNT(*) from the sessions table — the number of conversations, not LLM requests. Each session makes 10-90 API calls through the tool loop, so the reported number was ~30x lower than real. Salvaged from PR #10140 (@kshitijk4poor). The cache-token accuracy portions of the original PR were deferred — per-provider analytics is the better path there, since cache_write_tokens and actual_cost_usd are only reliably available from a subset of providers (Anthropic native, Codex Responses, OpenRouter with usage.include). Tests: - schema_version v7 assertion - migration v2 -> v7 adds api_call_count column with default 0 - update_token_counts increments api_call_count by provided delta - absolute=True sets api_call_count directly - /api/analytics/usage exposes total_api_calls in totals	2026-04-22 05:51:58 -07:00
teknium1	be11a75eae	chore(release): map hharry11 email to GitHub handle	2026-04-22 05:51:44 -07:00
hharry11	83cb9a03ee	fix(cli): ensure project .env is sanitized before loading	2026-04-22 05:51:44 -07:00
WideLee	cf55c738e7	refactor(qqbot): migrate qr onboard flow to sync + consolidate into onboard.py - Replace async create_bind_task/poll_bind_result with synchronous httpx.Client equivalents, eliminating manual event loop management - Move _render_qr and full qr_register() entry-point into onboard.py, mirroring the Feishu onboarding pattern - Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines); call site becomes a single qr_register() import - Fix potential segfault: previous code called loop.close() in the EXPIRED branch and again in the finally block (double-close crashed under uvloop)	2026-04-22 05:50:21 -07:00
Teknium	ba7e8b0df9	chore(release): map Abner email to Abnertheforeman	2026-04-22 05:27:10 -07:00
Abner	b66644f0ec	feat(hindsight): richer session-scoped retain metadata - Add configurable retain_tags / retain_source / retain_user_prefix / retain_assistant_prefix knobs for native Hindsight. - Thread gateway session identity (user_name, chat_id, chat_name, chat_type, thread_id) through AIAgent and MemoryManager into MemoryProvider.initialize kwargs so providers can scope and tag retained memories. - Hindsight attaches the new identity fields as retain metadata, merges per-call tool tags with configured default tags, and uses the configurable transcript labels for auto-retained turns. Co-authored-by: Abner <abner.the.foreman@agentmail.to>	2026-04-22 05:27:10 -07:00
Teknium	b8663813b6	feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861) * feat(state): auto-prune old sessions + VACUUM state.db at startup state.db accumulates every session, message, and FTS5 index entry forever. A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM brought it to 43MB. The prune command and VACUUM are not wired to run automatically anywhere — sessions grew unbounded until users noticed. Changes: - hermes_state.py: new state_meta key/value table, vacuum() method, and maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in state_meta so it only actually executes once per min_interval_hours across all Hermes processes for a given HERMES_HOME. Never raises. - hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG (auto_prune=True, retention_days=90, vacuum_after_prune=True, min_interval_hours=24). Added to _KNOWN_ROOT_KEYS. - cli.py: call maintenance once at HermesCLI init (shared helper _run_state_db_auto_maintenance reads config and delegates to DB). - gateway/run.py: call maintenance once at GatewayRunner init. - Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section. Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays bloated forever. VACUUM only runs when the prune actually removed rows, so tight DBs don't pay the I/O cost. Tests: 10 new tests in tests/test_hermes_state.py covering state_meta, vacuum, idempotency, interval skipping, VACUUM-only-when-needed, corrupt-marker recovery. All 246 existing state/config/gateway tests still pass. Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG exposes the new block, load_config() returns it for fresh installs, first call prunes+vacuums, second call within min_interval_hours skips, and the state_meta marker persists across connection close/reopen. 
* sessions.auto_prune defaults to false (opt-in) Session history powers session_search recall across past conversations, so silently pruning on startup could surprise users. Ship the machinery disabled and let users opt in when they notice state.db is hurting performance. - DEFAULT_CONFIG.sessions.auto_prune: True → False - Call-site fallbacks in cli.py and gateway/run.py match the new default (so unmigrated configs still see off) - Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff	2026-04-22 05:21:49 -07:00
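A sketch of the interval-gated prune plus conditional VACUUM described in this commit, using the standard-library `sqlite3` module. The `sessions(started_at)` schema, the `last_auto_prune` marker key, and the boolean return value are assumptions for illustration; the real `maybe_auto_prune_and_vacuum()` also swallows errors ("Never raises"), which is omitted here.

```python
import sqlite3
import time


def maybe_auto_prune_and_vacuum(conn, retention_days=90, min_interval_hours=24, now=None):
    """Prune old sessions at most once per interval; VACUUM only if rows went away."""
    now = time.time() if now is None else now
    conn.execute("CREATE TABLE IF NOT EXISTS state_meta (key TEXT PRIMARY KEY, value TEXT)")
    row = conn.execute(
        "SELECT value FROM state_meta WHERE key = 'last_auto_prune'"  # assumed key name
    ).fetchone()
    try:
        last_run = float(row[0]) if row else 0.0
    except (TypeError, ValueError):
        last_run = 0.0  # corrupt marker: behave as if maintenance never ran
    if now - last_run < min_interval_hours * 3600:
        return False  # another call already ran within the interval
    cutoff = now - retention_days * 86400
    deleted = conn.execute(
        "DELETE FROM sessions WHERE started_at < ?", (cutoff,)  # assumed schema
    ).rowcount
    conn.execute(
        "INSERT OR REPLACE INTO state_meta (key, value) VALUES ('last_auto_prune', ?)",
        (str(now),),
    )
    conn.commit()
    if deleted > 0:
        # VACUUM cannot run inside a transaction, hence the commit above.
        conn.execute("VACUUM")
    return True
```

Storing the marker in the database itself is what lets separate processes sharing one `state.db` skip redundant runs.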
Teknium	b43524ecab	fix(wecom): visible poll progress + clearer no-bot-info failure + docstring note Follow-ups on top of salvaged #13923 (@keifergu): - Print QR poll dot every 3s instead of every 18s so "Fetching configuration results..." doesn't look hung. - On "status=success but no bot_info" from the WeCom query endpoint, log the full payload at WARNING and tell the user we're falling back to manual entry (was previously a single opaque line). - Document in the qr_scan_for_bot_info() docstring that the work.weixin.qq.com/ai/qc/* endpoints are the admin-console web-UI flow, not the public developer API, and may change without notice. Also add keifergu@tencent.com to scripts/release.py AUTHOR_MAP so release notes attribute the feature correctly.	2026-04-22 05:15:32 -07:00
keifergu	3f60a907e1	docs(wecom): document QR scan-to-create setup flow	2026-04-22 05:15:32 -07:00
keifergu	8bcd77a9c2	feat(wecom): add QR scan flow and interactive setup wizard for bot credentials	2026-04-22 05:15:32 -07:00
Teknium	d166716c65	feat(optional-skills): add page-agent skill under new web-development category (#13976) Adds an optional skill that walks users through installing and using alibaba/page-agent — a pure-JS in-page GUI agent that web developers embed into their own webapps so end users can drive the UI with natural language. Three install paths: CDN demo (30s, no install), npm install into an existing app with provider config table (Qwen/OpenAI/Ollama/OpenRouter), and clone-from-source for dev/contributor workflow. Clear use-case framing up front (embed AI copilot in SaaS/admin/B2B, modernize legacy UIs, accessibility via natural language) and an explicit NOT-for list that points users wanting server-side browser automation back to Hermes' built-in browser tool. Live-verified: repo builds on Node 22.22 + npm 10.9, dev:demo serves at localhost:5174, API surface (new PageAgent{...}, panel.show(), execute(task)) matches what the skill documents. Also verified discovery end-to-end via OptionalSkillSource with isolated HERMES_HOME — search/inspect/fetch all resolve official/web-development/page-agent correctly. New category directory: optional-skills/web-development/ with a DESCRIPTION.md explaining the distinction from Hermes' own browser automation (outside-in vs inside-out).	2026-04-22 04:54:26 -07:00
helix4u	a7d78d3bfd	fix: preserve reasoning_content on Kimi replay	2026-04-22 04:31:59 -07:00
kshitijk4poor	30ec12970b	fix(packaging): include agent.* sub-packages in pyproject.toml The transport refactor (PRs #13862 ff.) added agent/transports/ as a sub-package but the setuptools packages.find include list only had "agent" (top-level files), not "agent.*" (sub-packages). pip install / Nix builds therefore ship run_agent.py (which now imports from agent.transports on every API call) but omit the transports directory entirely, causing: ModuleNotFoundError: No module named 'agent.transports' on every LLM call for packaged installs. Adds "agent.*" to match the existing pattern used by tools, gateway, tui_gateway, and plugins.	2026-04-22 03:35:37 -07:00
hengm3467	c6b1ef4e58	feat: add Step Plan provider support (salvage #6005) Adds a first-class 'stepfun' API-key provider surfaced as Step Plan: - Support Step Plan setup for both International and China regions - Discover Step Plan models live from /step_plan/v1/models, with a small coding-focused fallback catalog when discovery is unavailable - Thread StepFun through provider metadata, setup persistence, status and doctor output, auxiliary routing, and model normalization - Add tests for provider resolution, model validation, metadata mapping, and StepFun region/model persistence Based on #6005 by @hengm3467. Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>	2026-04-22 02:59:58 -07:00
Teknium	ff9752410a	feat(plugins): pluggable image_gen backends + OpenAI provider (#13799) * feat(plugins): pluggable image_gen backends + OpenAI provider Adds an ImageGenProvider ABC so image generation backends register as bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner gains three primitives to make this work generically: - `kind:` manifest field (`standalone` | `backend` | `exclusive`). Bundled `kind: backend` plugins auto-load — no `plugins.enabled` incantation. User-installed backends stay opt-in. - Path-derived keys: `plugins/image_gen/openai/` gets key `image_gen/openai`, so a future `tts/openai` cannot collide. - Depth-2 recursion into category namespaces (parent dirs without a `plugin.yaml` of their own). Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5 default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64 responses save to `$HERMES_HOME/cache/images/`; URL responses pass through. FAL stays in-tree for this PR — a follow-up ports it into `plugins/image_gen/fal/` so the in-tree `image_generation_tool.py` slims down. The dispatch shim in `_handle_image_generate` only fires when `image_gen.provider` is explicitly set to a non-FAL value, so existing FAL setups are untouched. - 41 unit tests (scanner recursion, kind parsing, gate logic, registry, OpenAI payload shapes) - E2E smoke verified: bundled plugin autoloads, registers, and `_handle_image_generate` routes to OpenAI when configured * fix(image_gen/openai): don't send response_format to gpt-image-* The live API rejects it: 'Unknown parameter: response_format' (verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return b64_json unconditionally, so the parameter was both unnecessary and actively broken. 
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21) and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 / dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward (dall-e-2 squares only). Trim the catalog down to a single model. Live-verified end-to-end: landscape 1536x1024 render of a Moog-style synth matches prompt exactly, 2.4MB PNG saved to cache. * feat(image_gen/openai): expose gpt-image-2 as three quality tiers Users pick speed/fidelity via the normal model picker instead of a hidden quality knob. All three tier IDs resolve to the single underlying gpt-image-2 API model with a different quality parameter: gpt-image-2-low ~15s fast iteration gpt-image-2-medium ~40s default gpt-image-2-high ~2min highest fidelity Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the same 1024x1024 prompt. Config: image_gen.openai.model: gpt-image-2-high # or image_gen.model: gpt-image-2-low # or env var for scripts/tests OPENAI_IMAGE_MODEL=gpt-image-2-medium Live-verified end-to-end with the low tier: 18.8s landscape render of a golden retriever in wildflowers, vision-confirmed exact match. * feat(tools_config): plugin image_gen providers inject themselves into picker 'hermes tools' → Image Generation now shows plugin-registered backends alongside Nous Subscription and FAL.ai without tools_config.py needing to know about them. OpenAI appears as a third option today; future backends appear automatically as they're added. Mechanism: - ImageGenProvider gains an optional get_setup_schema() hook (name, badge, tag, env_vars). Default derived from display_name. - tools_config._plugin_image_gen_providers() pulls the schemas from every registered non-FAL plugin provider. - _visible_providers() appends those rows when rendering the Image Generation category. 
- _configure_provider() handles the new image_gen_plugin_name marker: writes image_gen.provider and routes to the plugin's list_models() catalog for the model picker. - _toolset_needs_configuration_prompt('image_gen') stops demanding a FAL key when any plugin provider reports is_available(). FAL is skipped in the plugin path because it already has hardcoded TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up PR the hardcoded rows go away and it surfaces through the same path as OpenAI. Verified live: picker shows Nous Subscription / FAL.ai / OpenAI. Picking OpenAI prompts for OPENAI_API_KEY, then shows the gpt-image-2-low/medium/high model picker sourced from the plugin. 397 tests pass across plugins/, tools_config, registry, and picker. * fix(image_gen): close final gaps for plugin-backend parity with FAL Two small places that still hardcoded FAL: - hermes_cli/setup.py status line: an OpenAI-only setup showed 'Image Generation: missing FAL_KEY'. Now probes plugin providers and reports '(OpenAI)' when one is_available() — or falls back to 'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured. - image_generate tool schema description: said 'using FAL.ai, default FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are user-configured' — and notes the 'image' field can be a URL or an absolute path, which the gateway delivers either way via extract_local_files().	2026-04-21 21:30:10 -07:00
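The three quality tiers described in this commit all resolve to one API model with a different quality parameter. A minimal sketch of that mapping follows; `resolve_tier` and the returned dict shape are hypothetical, and the literal quality values `low` / `medium` / `high` are assumed from the tier names.

```python
# Tier IDs from the commit message, mapped to an assumed `quality` value.
QUALITY_TIERS = {
    "gpt-image-2-low": "low",
    "gpt-image-2-medium": "medium",
    "gpt-image-2-high": "high",
}


def resolve_tier(model_id: str) -> dict:
    """Translate a picker-facing tier ID into request parameters."""
    quality = QUALITY_TIERS.get(model_id)
    if quality is None:
        raise ValueError(f"unknown image model tier: {model_id!r}")
    # Every tier targets the same underlying model; only quality differs.
    return {"model": "gpt-image-2", "quality": quality}
```

Exposing tiers as model IDs keeps the choice in the normal model picker instead of a separate quality setting, which is the rationale the commit gives.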
Teknium	d1acf17773	feat(models): add minimax/minimax-m2.5:free to OpenRouter catalog (#13836) Surfaces the free variant alongside the paid minimax-m2.5 entry in both the OPENROUTER_MODELS fallback snapshot and the nous/openrouter provider model list.	2026-04-21 21:27:40 -07:00
Teknium	410f33a728	fix(kimi): don't send Anthropic thinking to api.kimi.com/coding (#13826)

Kimi's /coding endpoint speaks the Anthropic Messages protocol but has its own thinking semantics: when thinking.enabled is sent, Kimi validates the history and requires every prior assistant tool-call message to carry OpenAI-style reasoning_content. The Anthropic path never populates that field, and convert_messages_to_anthropic strips Anthropic thinking blocks on third-party endpoints, so after one tool-calling turn the next request fails with:

    HTTP 400: thinking is enabled but reasoning_content is missing in assistant tool call message at index N

Kimi on chat_completions handles thinking via extra_body in ChatCompletionsTransport (#13503). On the Anthropic route, drop the parameter entirely and let Kimi drive reasoning server-side. build_anthropic_kwargs now gates the reasoning_config -> thinking block on not _is_kimi_coding_endpoint(base_url).

Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic, /coding/ (trailing slash), explicit disabled, other third-party endpoints still getting thinking (MiniMax), native Anthropic unaffected, and the non-/coding Kimi root route.	2026-04-21 21:19:14 -07:00
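
The gate described in the commit above might look roughly like this. It is a sketch under stated assumptions, not the repository's implementation: the function bodies, the shape of `reasoning_config` (a dict with `enabled` and `budget_tokens` keys), and the 1024 fallback budget are all hypothetical; only the host `api.kimi.com`, the `/coding` path cases, and the idea of omitting `thinking` come from the commit message.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def is_kimi_coding_endpoint(base_url: str) -> bool:
    """True for api.kimi.com URLs whose path is /coding or lives under it."""
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")  # treat "/coding/" like "/coding"
    return host == "api.kimi.com" and (path == "/coding" or path.startswith("/coding/"))


def build_thinking_kwargs(base_url: str,
                          reasoning_config: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the extra request kwargs for thinking, or nothing when gated off."""
    # Hypothetical config shape: {"enabled": bool, "budget_tokens": int}.
    if not reasoning_config or not reasoning_config.get("enabled", True):
        return {}
    if is_kimi_coding_endpoint(base_url):
        # Omit the parameter entirely and let Kimi drive reasoning server-side.
        return {}
    budget = reasoning_config.get("budget_tokens", 1024)  # assumed fallback
    return {"thinking": {"type": "enabled", "budget_tokens": budget}}
```

Matching on the parsed host and path rather than a substring keeps the non-/coding Kimi root route and other third-party endpoints on the normal thinking path.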
Teknium	7b79e0f4c9	chore(models): drop 3 models from nous portal recommended list (#13822) Remove nvidia/nemotron-3-super-120b-a12b:free, arcee-ai/trinity-large-preview:free, and openrouter/elephant-alpha from _PROVIDER_MODELS['nous']. The paid nemotron and arcee-thinking variants remain.	2026-04-21 21:10:20 -07:00
kshitijk4poor	57411fca24	feat: add BedrockTransport + wire all Bedrock transport paths

Fourth and final transport: completes the transport layer with all four api_modes covered. Wraps agent/bedrock_adapter.py behind the ProviderTransport ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.

Wires all transport methods to production paths in run_agent.py:

- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction

Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize loop does NOT add a bedrock_converse branch to invoke normalize_response on the already-normalized response. Bedrock's normalize_converse_response runs at the dispatch site (run_agent.py:5189), so the response already has the OpenAI-compatible .choices[0].message shape by the time the main loop sees it. Falling through to the chat_completions else branch is correct and sidesteps a redundant NormalizedResponse rebuild.

Transport coverage (complete):

| api_mode           | Transport                | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport       | ✅ | ✅ | ✅ |
| codex_responses    | ResponsesApiTransport    | ✅ | ✅ | ✅ |
| chat_completions   | ChatCompletionsTransport | ✅ | ✅ | ✅ |
| bedrock_converse   | BedrockTransport         | ✅ | ✅ | ✅ |

17 new BedrockTransport tests pass. 117 transport tests total pass. 160 bedrock/converse tests across tests/agent/ pass. Full tests/run_agent/ targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the pre-existing test_concurrent_interrupt flake on origin/main).	2026-04-21 20:58:37 -07:00
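
To illustrate the shape of the transport layer named in the commit above, here is a toy sketch of an ABC with the three methods from the coverage table and one implementation that accepts either a raw Converse-style dict or an already-normalized `SimpleNamespace`. The method signatures, the `ToyBedrockTransport` class, and the `end_turn` to `stop` mapping are assumptions for illustration; the actual ProviderTransport interface in the repository may differ.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Any, Dict, List


class ProviderTransport(ABC):
    """Hypothetical interface: one subclass per api_mode."""
    api_mode: str

    @abstractmethod
    def build_kwargs(self, model: str, messages: List[Dict[str, Any]]) -> Dict[str, Any]: ...

    @abstractmethod
    def normalize_response(self, response: Any) -> SimpleNamespace: ...

    @abstractmethod
    def validate_response(self, response: Any) -> bool: ...


class ToyBedrockTransport(ProviderTransport):
    api_mode = "bedrock_converse"

    def build_kwargs(self, model, messages):
        return {"modelId": model, "messages": messages}

    def normalize_response(self, response):
        # Already normalized upstream: pass through instead of rebuilding.
        if isinstance(response, SimpleNamespace):
            return response
        message = response["output"]["message"]
        text = "".join(block.get("text", "") for block in message["content"])
        stop = response.get("stopReason")
        finish = "stop" if stop == "end_turn" else stop  # assumed mapping
        msg = SimpleNamespace(role="assistant", content=text or None)
        return SimpleNamespace(choices=[SimpleNamespace(message=msg, finish_reason=finish)])

    def validate_response(self, response):
        if isinstance(response, SimpleNamespace):
            return bool(getattr(response, "choices", None))
        return isinstance(response, dict) and "message" in response.get("output", {})
```

The pass-through branch mirrors the adjustment the commit describes: when normalization already happened at the dispatch site, the later stage should not rebuild the response.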
Brooklyn Nicholson	572e27c93f	fix(tui): demote gateway log-noise from Activity to info tone

Restore the old-CLI contract where only complete failures tint Activity red. Everything else is still visible for debugging but no longer commandeers attention.

- gateway.stderr: always tone='info' (drops the ERRLIKE_RE regex)
- gateway.protocol_error: both pushes demoted to 'info'
- commands.catalog cold-start failure: demoted to 'info'
- approval.request: no longer duplicates the overlay into Activity

Kept as 'error': terminal `error` event, gateway.start_timeout, gateway-exited, explicit status.update kinds.	2026-04-21 20:57:40 -07:00
Brooklyn Nicholson	76ad697dcb	fix(tui): don't force-open Activity on every error Reverts the auto-expand-on-new-error effect added in `93b47d96`. The effect overrode the user's chosen detailsMode and visually interrupted every turn. Red/yellow chevron tint remains as the passive signal — click to read, just like Thinking and Tool calls.	2026-04-21 20:57:40 -07:00

5351 commits