Commit graph

2321 commits

Author SHA1 Message Date
Teknium
7d8b2eee63 fix(delegate): default inherit_mcp_toolsets=true, drop version bump
Follow-up on helix4u's PR #14211:
- Flip default to true: narrowing toolsets=['web','browser'] expresses
  'I want these extras', not 'silently strip MCP'. Parent MCP tools
  (registered at runtime) should survive narrowing by default.
- Drop _config_version bump (22->23); additive nested key under
  delegation.* is handled by _deep_merge, no migration needed.
- Update tests to reflect new default behavior.
2026-04-22 17:45:48 -07:00
helix4u
3e96c87f37 fix(delegate): make MCP toolset inheritance configurable 2026-04-22 17:45:48 -07:00
Teknium
d74eaef5f9 fix(error_classifier): retry mid-stream SSL/TLS alert errors as transport
Mid-stream SSL alerts (bad_record_mac, tls_alert_internal_error, handshake
failures) previously fell through the classifier pipeline to the 'unknown'
bucket because:

  - ssl.SSLError type names weren't in _TRANSPORT_ERROR_TYPES (the
    isinstance(OSError) catch picks up some but not all SDK-wrapped forms)
  - the message-pattern list had no SSL alert substrings

The 'unknown' bucket is still retryable, but: (a) logs tell the user
'unknown' instead of identifying the cause, (b) it bypasses the
transport-specific backoff/fallback logic, and (c) if the SSL error
happens on a large session with a generic 'connection closed' wrapper,
the existing disconnect-on-large-session heuristic would incorrectly
trigger context compression — expensive, and never fixes a transport
hiccup.

Changes:
  - Add ssl.SSLError and its subclass type names to _TRANSPORT_ERROR_TYPES
  - New _SSL_TRANSIENT_PATTERNS list (separate from _SERVER_DISCONNECT_PATTERNS
    so SSL alerts route to timeout, not context_overflow+compress)
  - New step 5 in the classifier pipeline: SSL pattern check runs BEFORE
    the disconnect check to pre-empt the large-session-compress path

Patterns cover both space-separated ('ssl alert', 'bad record mac')
and underscore-separated ('ERR_SSL_SSL/TLS_ALERT_BAD_RECORD_MAC')
forms.  This is load-bearing because OpenSSL 3.x changed the error-code
separator from underscore to slash (e.g. SSLV3_ALERT_BAD_RECORD_MAC →
SSL/TLS_ALERT_BAD_RECORD_MAC) and will likely churn again — matching on
stable alert reason substrings survives future format changes.
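
A minimal sketch of the substring approach, with hypothetical names and
an abbreviated pattern list (the shipped set is longer):

  import ssl

  # Match on lowercase alert-reason substrings so both separator
  # conventions ('bad record mac', 'bad_record_mac') hit one entry.
  _SSL_TRANSIENT_PATTERNS = (
      "bad record mac",
      "bad_record_mac",
      "ssl alert",
      "alert_internal_error",
      "handshake failure",
  )

  def looks_like_transient_ssl(error: Exception) -> bool:
      if isinstance(error, ssl.SSLError):
          return True
      message = str(error).lower()
      return any(p in message for p in _SSL_TRANSIENT_PATTERNS)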

Tests (8 new):
  - BAD_RECORD_MAC in Python ssl.c format
  - OpenSSL 3.x underscore format
  - TLSV1_ALERT_INTERNAL_ERROR
  - ssl handshake failure
  - [SSL: ...] prefix fallback
  - Real ssl.SSLError instance
  - REGRESSION GUARD: SSL on large session does NOT compress
  - REGRESSION GUARD: plain disconnect on large session STILL compresses
2026-04-22 17:44:50 -07:00
Teknium
b9463e32c6 fix(usage): read top-level Anthropic cache fields from OAI-compatible proxies
Port from cline/cline#10266.

When OpenAI-compatible proxies (OpenRouter, Vercel AI Gateway, Cline)
route Claude models, they sometimes surface the Anthropic-native cache
counters (`cache_read_input_tokens`, `cache_creation_input_tokens`) at
the top level of the `usage` object instead of nesting them inside
`prompt_tokens_details`. Our chat-completions branch of
`normalize_usage()` only read the nested `prompt_tokens_details` fields,
so those responses:

- reported `cache_write_tokens = 0` even when the model actually did a
  prompt-cache write,
- reported only some of the cache-read tokens when the proxy exposed them
  top-level only,
- overstated `input_tokens` by the missed cache-write amount, which in
  turn made cost estimation and the status-bar cache-hit percentage wrong
  for Claude traffic going through these gateways.

Now the chat-completions branch tries the OpenAI-standard
`prompt_tokens_details` first and falls back to the top-level
Anthropic-shape fields only if the nested values are absent/zero. The
Anthropic and Codex Responses branches are unchanged.
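
Illustrative sketch of the fallback order (dict-shaped usage assumed;
apart from the standard cached_tokens, the nested field names here are
assumptions):

  def read_cache_fields(usage: dict) -> tuple[int, int]:
      details = usage.get("prompt_tokens_details") or {}
      cache_read = details.get("cached_tokens") or 0
      cache_write = details.get("cache_creation_tokens") or 0
      # Top-level Anthropic-shape fields only when nested absent/zero:
      if not cache_read:
          cache_read = usage.get("cache_read_input_tokens") or 0
      if not cache_write:
          cache_write = usage.get("cache_creation_input_tokens") or 0
      return cache_read, cache_write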

Regression guards added for three shapes: top-level write + nested read,
top-level-only, and both-present (nested wins).
2026-04-22 17:40:49 -07:00
Teknium
9eb543cafe
feat(/model): merge models.dev entries for lesser-loved providers (#14221)
New and newer models from models.dev now surface automatically in
/model (both hermes model CLI and the gateway Telegram/Discord picker)
for a curated set of secondary providers — no Hermes release required
when the registry publishes a new model.

Primary user-visible fix: on OpenCode Go, typing '/model mimo-v2.5-pro'
no longer silently fuzzy-corrects to 'mimo-v2-pro'. The exact match
against the merged models.dev catalog wins.

Scope (opt-in frozenset _MODELS_DEV_PREFERRED in hermes_cli/models.py):
  opencode-go, opencode-zen, deepseek, kilocode, fireworks, mistral,
  togetherai, cohere, perplexity, groq, nvidia, huggingface, zai,
  gemini, google.

Explicitly NOT merged:
  - openrouter and nous (never): curated list is already a hand-picked
    subset / Portal is source of truth.
  - xai, xiaomi, minimax, minimax-cn, kimi-coding, kimi-coding-cn,
    alibaba, qwen-oauth (per-project decision to keep curated-only).
  - providers with dedicated live-endpoint paths (copilot, anthropic,
    ai-gateway, ollama-cloud, custom, stepfun, openai-codex) — those
    paths already handle freshness themselves.

Changes:
  - hermes_cli/models.py: add _MODELS_DEV_PREFERRED + _merge_with_models_dev
    helper. provider_model_ids() branches on the set at its curated-fallback
    return. Merge is models.dev-first, curated-only extras appended,
    case-insensitive dedup, graceful fallback when models.dev is offline
    (sketched after this list).
  - hermes_cli/model_switch.py: list_authenticated_providers() calls the
    same merge in both its code paths (PROVIDER_TO_MODELS_DEV loop +
    HERMES_OVERLAYS loop). Picker AND validation-fallback both see
    fresh entries.
  - tests/hermes_cli/test_models_dev_preferred_merge.py (new): 13 tests —
    merge-helper unit tests (empty/raise/order/dedup), opencode-go/zen
    behavior, openrouter+nous explicitly guarded from merge.
  - tests/hermes_cli/test_opencode_go_in_model_list.py: converted from
    snapshot-style assertion to a behavior-based floor check, so it
    doesn't break when models.dev publishes additional opencode-go
    entries.
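
Sketch of the merge, as a hypothetical standalone form of the helper:

  def merge_with_models_dev(registry_ids: list[str],
                            curated_ids: list[str]) -> list[str]:
      if not registry_ids:          # models.dev offline/empty
          return list(curated_ids)  # graceful curated fallback
      seen = {m.lower() for m in registry_ids}
      extras = [m for m in curated_ids if m.lower() not in seen]
      return list(registry_ids) + extras  # models.dev-first ordering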

Addresses a report from @pfanis via Telegram: newer Xiaomi variants
on OpenCode Go weren't appearing in the /model picker, and /model
was silently routing requests for new variants to older ones.
2026-04-22 17:33:42 -07:00
Teknium
402d048eb6 fix(gateway): also unlink stale PID + lock files on cleanup
Follow-up for salvaged PR #14179.

`_cleanup_invalid_pid_path` previously called `remove_pid_file()` for the
default PID path, but that helper defensively refuses to delete a PID file
whose pid field differs from `os.getpid()` (to protect --replace handoffs).
Every realistic stale-PID scenario is exactly that case: a crashed/Ctrl+C'd
gateway left behind a PID file owned by a now-dead foreign PID.

Once `get_running_pid()` has confirmed the runtime lock is inactive, the
on-disk metadata is known to belong to a dead process, so we can force-unlink
both the PID file and the sibling `gateway.lock` directly instead of going
through the defensive helper.
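
The force-unlink is deliberately simple once liveness is settled — a
sketch (path layout assumed):

  from pathlib import Path

  def force_unlink_stale(pid_path: Path) -> None:
      # Only safe AFTER get_running_pid() confirmed the runtime lock
      # is inactive: the on-disk pid is known to be a dead process.
      pid_path.unlink(missing_ok=True)
      (pid_path.parent / "gateway.lock").unlink(missing_ok=True)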

Also adds a regression test with a dead foreign PID that would have failed
against the previous cleanup logic.
2026-04-22 16:33:46 -07:00
helix4u
b52123eb15 fix(gateway): recover stale pid and planned restart state 2026-04-22 16:33:46 -07:00
Teknium
51ca575994 feat(gateway): expose plugin slash commands natively on all platforms + decision-capable command hook
Plugin slash commands now surface as first-class commands in every gateway
enumerator — Discord native slash picker, Telegram BotCommand menu, Slack
/hermes subcommand map — without a separate per-platform plugin API.

The existing 'command:<name>' gateway hook gains a decision protocol via
HookRegistry.emit_collect(): handlers that return a dict with
{'decision': 'deny'|'handled'|'rewrite'|'allow'} can intercept slash
command dispatch before core handling runs, unifying what would otherwise
have been a parallel 'pre_gateway_command' hook surface.
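
What a decision-capable handler can look like (sketch — the event shape
and the rewrite payload key are assumptions, not the shipped protocol):

  ALLOWED_USERS = {"alice"}

  async def on_command(event: dict):
      # Registered against 'command:<name>'. Returning a decision dict
      # intercepts dispatch; returning None defers to core handling.
      if event.get("user") not in ALLOWED_USERS:
          return {"decision": "deny", "reason": "not allowed"}
      if event.get("args") == "legacy":
          return {"decision": "rewrite", "args": "new"}
      return None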

Changes:

- gateway/hooks.py: add HookRegistry.emit_collect() that fires the same
  handler set as emit() but collects non-None return values. Backward
  compatible — fire-and-forget telemetry hooks still work via emit().
- hermes_cli/plugins.py: add optional 'args_hint' param to
  register_command() so plugins can opt into argument-aware native UI
  registration (Discord arg picker, future platforms).
- hermes_cli/commands.py: add _iter_plugin_command_entries() helper and
  merge plugin commands into telegram_bot_commands() and
  slack_subcommand_map(). New is_gateway_known_command() recognizes both
  built-in and plugin commands so the gateway hook fires for either.
- gateway/platforms/discord.py: extract _build_auto_slash_command helper
  from the COMMAND_REGISTRY auto-register loop and reuse it for
  plugin-registered commands. Built-in name conflicts are skipped.
- gateway/run.py: before normal slash dispatch, call emit_collect on
  command:<canonical> and honor deny/handled/rewrite/allow decisions.
  Hook now fires for plugin commands too.
- scripts/release.py: AUTHOR_MAP entry for @Magaav.
- Tests: emit_collect semantics, plugin command surfacing per platform,
  decision protocol (deny/handled/rewrite/allow + non-dict tolerance),
  Discord plugin auto-registration + conflict skipping, is_gateway_known_command.

Salvaged from #14131 (@Magaav). Original PR added a parallel
'pre_gateway_command' hook and a platform-keyed plugin command
registry; this re-implementation reuses the existing 'command:<name>'
hook and treats plugin commands as platform-agnostic so the same
capability reaches Telegram and Slack without new API surface.

Co-authored-by: Magaav <73175452+Magaav@users.noreply.github.com>
2026-04-22 16:23:21 -07:00
brooklyn!
a1d57292af
Merge pull request #14145 from NousResearch/bb/tui-polish
fix(tui): input wrap, shift-tab yolo, statusline, clean boot
2026-04-22 16:48:37 -05:00
Yukipukii1
1e8254e599 fix(agent): guard context compressor against structured message content 2026-04-22 14:46:51 -07:00
ismell0992-afk
6513138f26 fix(agent): recognize Tailscale CGNAT (100.64.0.0/10) as local for Ollama timeouts
`is_local_endpoint()` leaned on the `ipaddress` module's `is_private`
flag, which classifies
RFC-1918 ranges and link-local as private but deliberately excludes the
RFC 6598 CGNAT block (100.64.0.0/10) — the range Tailscale uses for its
mesh IPs. As a result, Ollama reached over Tailscale (e.g.
`http://100.77.243.5:11434`) was treated as remote and missed the
automatic stream-read / stale-stream timeout bumps, so cold model load
plus long prefill would trip the 300 s watchdog before the first token.

Add a module-level `_TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")`
(built once) and extend `is_local_endpoint()` to match the block both
via the parsed-`IPv4Address` path and the existing bare-string fallback
(for symmetry with the 10/172/192 checks). Also hoist the previously
function-local `import ipaddress` to module scope now that it's used by
the constant.
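
The core of the check, as a sketch (the real code folds this into
is_local_endpoint() alongside the RFC-1918 handling):

  import ipaddress

  _TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")  # built once

  def is_cgnat(host: str) -> bool:
      try:
          addr = ipaddress.IPv4Address(host)
      except ValueError:
          return False
      return addr in _TAILSCALE_CGNAT

is_cgnat("100.77.243.5") is True; is_cgnat("100.63.255.255") and
is_cgnat("100.128.0.0") are False (just outside the /10).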

Extend `TestIsLocalEndpoint` with a CGNAT positive set (lower bound,
representative host, MagicDNS anchor, upper bound) and a near-miss
negative set (just below 100.64.0.0, just above 100.127.255.255, well
outside the block, and first-octet-wrong).
2026-04-22 14:46:10 -07:00
Yukipukii1
44a16c5d9d guard terminal_tool import-time env parsing 2026-04-22 14:45:50 -07:00
Roy-oss1
e86acad8f1 feat(feishu): preserve @mention context on inbound messages
Resolve Feishu @_user_N / @_all placeholders into display names plus a
structured [Mentioned: Name (open_id=...), ...] hint so agents can both
reason about who was mentioned and call Feishu OpenAPI tools with stable
open_ids. Strip bot self-mentions only at message edges (leading
unconditionally, trailing only before whitespace/terminal punctuation)
so commands parse cleanly while mid-text references are preserved.
Covers both plain-text and rich-post payloads.

Also fixes a pre-existing hydration bug: Client.request no longer accepts
the 'method' kwarg on lark-oapi 1.5.3, so bot identity silently failed
to hydrate and self-filtering never worked. Migrate to the
BaseRequest.builder() pattern and accept the 'app_name' field the API
actually returns. Tighten identity matching precedence so open_id is
authoritative when present on both sides.
2026-04-22 14:44:07 -07:00
LeonSGP43
4ac1c959b2 fix(agent): resolve fallback provider key_env secrets 2026-04-22 14:42:48 -07:00
Aslaaen
76c454914a fix(core): ensure non-blocking executor shutdown on async timeout 2026-04-22 14:42:32 -07:00
kshitijk4poor
d6ed35d047 feat(security): add global toggle to allow private/internal URL resolution
Adds security.allow_private_urls / HERMES_ALLOW_PRIVATE_URLS toggle so
users on OpenWrt routers, TUN-mode proxies (Clash/Mihomo/Sing-box),
corporate split-tunnel VPNs, and Tailscale networks — where DNS resolves
public domains to 198.18.0.0/15 or 100.64.0.0/10 — can use web_extract,
browser, vision URL fetching, and gateway media downloads.

Single toggle in tools/url_safety.py; all 23 is_safe_url() call sites
inherit automatically. Cached for process lifetime.

Cloud metadata endpoints stay ALWAYS blocked regardless of the toggle:
169.254.169.254 (AWS/GCP/Azure/DO/Oracle), 169.254.170.2 (AWS ECS task
IAM creds), 169.254.169.253 (Azure IMDS wire server), 100.100.100.200
(Alibaba), fd00:ec2::254 (AWS IPv6), the entire 169.254.0.0/16
link-local range, and the metadata.google.internal / metadata.goog
hostnames (checked pre-DNS so they can't be bypassed on networks where
those names resolve to local IPs).
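
Simplified gate (sketch only — the shipped check also handles the
config key, the pre-DNS hostname checks, and caching; the env-value
parsing shown is an assumption):

  import ipaddress
  import os

  _ALWAYS_BLOCKED = {"169.254.169.254", "169.254.170.2",
                     "169.254.169.253", "100.100.100.200"}

  def may_resolve(ip: str) -> bool:
      addr = ipaddress.ip_address(ip)
      if ip in _ALWAYS_BLOCKED or addr.is_link_local:
          return False                      # never toggleable
      if not addr.is_global:                # private/CGNAT/benchmark
          return os.environ.get("HERMES_ALLOW_PRIVATE_URLS") == "1"
      return True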

Supersedes #3779 (narrower HERMES_ALLOW_RFC2544 for the same class of
users).

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-04-22 14:38:59 -07:00
bobashopcashier
b49a1b71a7 fix(agent): accept empty content with stop_reason=end_turn as valid anthropic response
Anthropic's API can legitimately return content=[] with stop_reason="end_turn"
when the model has nothing more to add after a turn that already delivered the
user-facing text alongside a trivial tool call (e.g. memory write). The transport
validator was treating that as an invalid response, triggering 3 retries that
each returned the same valid-but-empty response, then failing the run with
"Invalid API response after 3 retries."

The downstream normalizer already handles empty content correctly (empty loop
over response.content, content=None, finish_reason="stop"), so the only fix
needed is at the validator boundary.
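
The validator change in miniature (hypothetical response shape):

  def is_valid_anthropic_response(resp) -> bool:
      if resp.content:                  # non-empty content: fine
          return True
      # Empty content is valid only when the model explicitly ended
      # its turn; empty + tool_use (or no stop_reason) stays invalid.
      return getattr(resp, "stop_reason", None) == "end_turn"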

Tests:
- Empty content + stop_reason="end_turn" → valid (the fix)
- Empty content + stop_reason="tool_use" → still invalid (regression guard)
- Empty content without stop_reason → still invalid (existing behavior preserved)
2026-04-22 14:26:23 -07:00
Teknium
ea67e49574
fix(streaming): silent retry when stream dies mid tool-call (#14151)
When the streaming connection dropped AFTER user-visible text was
delivered but a tool call was in flight, we stubbed the turn with a
'⚠ Stream stalled mid tool-call; Ask me to retry' warning — costing
an iteration and breaking the flow.  Users report this happening
increasingly often on long SSE streams through flaky provider routes.

Fix: in the existing inner stream-retry loop, relax the
deltas_were_sent short-circuit.  If a tool call was in flight
(partial_tool_names populated) AND the error is a transient connection
error (timeout, RemoteProtocolError, SSE 'connection lost', etc.),
silently retry instead of bailing out.  Fire a brief 'Connection
dropped mid tool-call; reconnecting…' marker so the user understands
the preamble is about to be re-streamed.

Researched how Claude Code (tombstone + non-streaming fallback),
OpenCode (blind Effect.retry wrapping whole stream), and Clawdbot
(4-way gate: stopReason==error + output==0 + !hadPotentialSideEffects)
handle this.  Chose the narrow Clawdbot-style gate: retry only when
(a) a tool call was actually in flight (otherwise the existing
stub-with-recovered-text is correct for pure-text stalls) and
(b) the error is transient.  Side-effect safety is automatic — no
tool has been dispatched within this single API call yet.
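
The relaxed short-circuit as a decision table (sketch, hypothetical
names):

  def stream_drop_path(deltas_were_sent: bool,
                       partial_tool_names: list,
                       transient: bool) -> str:
      if not deltas_were_sent:
          return "retry"           # nothing user-visible lost yet
      if partial_tool_names and transient:
          return "silent-retry"    # reconnect, re-stream preamble
      return "stub"                # pure-text stall or fatal error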

UX trade-off: user sees preamble text twice on retry (OpenCode-style).
Strictly better than a lost action with a 'retry manually' message.
If retries exhaust, falls through to the existing stub-with-warning
path so the user isn't left with zero signal.

Tests: 3 new tests in TestSilentRetryMidToolCall covering
(1) silent retry recovers tool call; (2) exhausted retries fall back
to stub; (3) text-only stalls don't trigger retry.  30/30 pass.
2026-04-22 13:47:33 -07:00
Brooklyn Nicholson
b641639e42 fix(debug): distinguish empty-log from missing-log in report placeholder
Copilot on #14138 flagged that the share report says '(file not found)'
when the log exists but is empty (either because the primary is empty
and no .1 rotation exists, or in the rare race where the file is
truncated between _resolve_log_path() and stat()).

- Split _primary_log_path() out of _resolve_log_path so both can share
  the LOG_FILES/home math without duplication.
- _capture_log_snapshot now reports '(file empty)' when the primary
  path exists on disk with zero bytes, and keeps '(file not found)'
  for the truly-missing case.
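
The placeholder decision in miniature (sketch):

  from pathlib import Path

  def log_placeholder(path: Path) -> str | None:
      if not path.exists():
          return "(file not found)"
      if path.stat().st_size == 0:
          return "(file empty)"
      return None   # real content gets captured instead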

Tests: rename test_returns_none_for_empty → test_empty_primary_reports_file_empty
with the new assertion, plus a race-path test that monkeypatches
_resolve_log_path to exercise the size==0 branch directly.
2026-04-22 15:27:54 -05:00
Brooklyn Nicholson
6fb98f343a fix(tui): address copilot review on #14103
- normalizeStatusBar: trim/lowercase + 'on' → 'top' alias so user-edited
  YAML variants (Top, " bottom ", on) coerce correctly
- shift-tab yolo: no-op with sys note when no live session; success-gated
  echo and catch fallback so RPC failures don't report as 'yolo off'
- tui_gateway config.set/get statusbar: isinstance(display, dict) guards
  mirroring the compact branch so a malformed display scalar in config.yaml
  can't raise

Tests: +1 vitest for trim/case/on, +2 pytest for non-dict display survival.
2026-04-22 15:27:54 -05:00
kshitij
81a504a4a0 fix: align status bar skin tests with upstream main
Drop rebased test assumptions about theme-mode helpers that were removed on main, and keep the status bar skin integration aligned with the current skin engine model.
2026-04-22 13:20:02 -07:00
kshitij
c323217188 fix: make CLI status bar skin-aware
Route prompt_toolkit status bar colors through the skin engine so /skin updates the status bar alongside the rest of the interactive TUI.

Add regression coverage for the new status bar style override keys and CLI style composition.
2026-04-22 13:20:02 -07:00
kshitijk4poor
de849c410d refactor(debug): remove dead _read_log_tail/_read_full_log wrappers
These thin wrappers around _capture_log_snapshot had zero production
callers after the snapshot refactor — run_debug_share uses snapshots
directly and collect_debug_report captures internally.  The wrappers
also caused a performance regression: _read_log_tail read up to 512KB
and built full_text just to return tail_text.

Remove both wrappers and migrate TestReadFullLog → TestCaptureLogSnapshot
to test _capture_log_snapshot directly.  Same coverage, tests the real
API instead of dead indirection.
2026-04-22 11:59:39 -07:00
kshitijk4poor
8dc936f10e chore: add taosiyuan163 to AUTHOR_MAP, add truncation boundary tests
Add missing AUTHOR_MAP entry for taosiyuan163 whose truncation boundary
fix was adapted into _capture_log_snapshot().

Add regression tests proving: line-boundary truncation keeps the full
first line, mid-line truncation correctly drops the partial fragment.
2026-04-22 11:59:39 -07:00
Junass1
61d0a99c11 fix(debug): sweep expired pending pastes on slash debug paths 2026-04-22 11:59:39 -07:00
helix4u
fc3862bdd6 fix(debug): snapshot logs once for debug share 2026-04-22 11:59:39 -07:00
kshitijk4poor
1f216ecbb4 feat(gateway/slack): add SLACK_REACTIONS env toggle for reaction lifecycle
Adds _reactions_enabled() gating to match Discord (DISCORD_REACTIONS) and
Telegram (TELEGRAM_REACTIONS) pattern. Defaults to true to preserve existing
behavior. Gates at three levels:
- _handle_slack_message: skips _reacting_message_ids registration
- on_processing_start: early return
- on_processing_complete: early return
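
Sketch of the gate (the exact falsy-value set is an assumption):

  import os

  def _reactions_enabled() -> bool:
      # Defaults to true to preserve existing behavior.
      value = os.environ.get("SLACK_REACTIONS", "true").strip().lower()
      return value not in ("0", "false", "no", "off")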

Also adds config.yaml bridge (slack.reactions) and two new tests.
2026-04-22 08:49:24 -07:00
Roopak Nijhara
70a33708e7 fix(gateway/slack): align reaction lifecycle with Discord/Telegram pattern
Slack reactions were placed around handle_message(), which returns
immediately after spawning a background task. This caused the 👀 swap to happen before any real work began.

Fix: implement on_processing_start / on_processing_complete callbacks
(matching Discord/Telegram) so reactions bracket actual _message_handler
work driven by the base class.

Also fixes missing stop_typing() for Slack's assistant thread status
indicator, which left 'is thinking...' stuck in the UI after processing
completed.

- Add _reacting_message_ids set for DM/@mention-only gating
- Add _active_status_threads dict for stop_typing lookup
- Update test_reactions_in_message_flow for new callback pattern
- Add test_reactions_failure_outcome and test_reactions_skipped_for_non_dm_non_mention
2026-04-22 08:49:24 -07:00
Teknium
77e04a29d5
fix(error_classifier): don't classify generic 404 as model_not_found (#14013)
The 404 branch in _classify_by_status had dead code: the generic
fallback below the _MODEL_NOT_FOUND_PATTERNS check returned the
exact same classification (model_not_found + should_fallback=True),
so every 404 — regardless of message — was treated as a missing model.

This bites local-endpoint users (llama.cpp, Ollama, vLLM) whose 404s
usually mean a wrong endpoint path, proxy routing glitch, or transient
backend issue — not a missing model. Claiming 'model not found' misleads
the next turn and silently falls back to another provider when the real
problem was a URL typo the user should see.

Fix: only classify 404 as model_not_found when the message actually
matches _MODEL_NOT_FOUND_PATTERNS ("invalid model", "model not found",
etc.). Otherwise fall through as unknown (retryable) so the real error
surfaces in the retry loop.
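
The repaired branch, roughly (pattern list abbreviated to the examples
above; returns a (classification, should_fallback) pair for
illustration):

  _MODEL_NOT_FOUND_PATTERNS = ("invalid model", "model not found")

  def classify_404(message: str) -> tuple[str, bool]:
      lowered = message.lower()
      if any(p in lowered for p in _MODEL_NOT_FOUND_PATTERNS):
          return ("model_not_found", True)
      return ("unknown", False)   # retryable; real error surfaces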

Test updated to match the new behavior. 103 error_classifier tests pass.
2026-04-22 06:11:47 -07:00
Yukipukii1
40619b393f tools: normalize file tool pagination bounds 2026-04-22 06:11:41 -07:00
Teknium
3e652f75b2
fix(plugins+nous): auto-coerce memory plugins; actionable Nous 401 diagnostic (#14005)
* fix(plugins): auto-coerce user-installed memory plugins to kind=exclusive

User-installed memory provider plugins at $HERMES_HOME/plugins/<name>/
were being dispatched to the general PluginManager, which has no
register_memory_provider method on PluginContext. Every startup logged:

  Failed to load plugin 'mempalace': 'PluginContext' object has no
  attribute 'register_memory_provider'

Bundled memory providers were already skipped via skip_names={memory,
context_engine} in discover_and_load, but user-installed ones weren't.

Fix: _parse_manifest now scans the plugin's __init__.py source for
'register_memory_provider' or 'MemoryProvider' (same heuristic as
plugins/memory/__init__.py:_is_memory_provider_dir) and auto-coerces
kind to 'exclusive' when the manifest didn't declare one explicitly.
This routes the plugin to plugins/memory discovery instead of the
general loader.

The escape hatch: if a manifest explicitly declares kind: standalone,
the heuristic doesn't override it.

Reported by Uncle HODL on Discord.

* fix(nous): actionable CLI message when Nous 401 refresh fails

Mirrors the Anthropic 401 diagnostic pattern. When Nous returns 401
and the credential refresh (_try_refresh_nous_client_credentials)
also fails, the user used to see only the raw APIError. Now prints:

  🔐 Nous 401 — Portal authentication failed.
     Response: <truncated body>
     Most likely: Portal OAuth expired, account out of credits, or
                  agent key revoked.
     Troubleshooting:
       • Re-authenticate: hermes login --provider nous
       • Check credits / billing: https://portal.nousresearch.com
       • Verify stored credentials: $HERMES_HOME/auth.json
       • Switch providers temporarily: /model <model> --provider openrouter

Addresses the common 'my hermes model hangs' pattern where the user's
Portal OAuth expired and the CLI gave no hint about the next step.
2026-04-22 05:54:11 -07:00
kshitijk4poor
5fb143169b feat(dashboard): track real API call count per session
Adds schema v7 'api_call_count' column. run_agent.py increments it by 1
per LLM API call, web_server analytics SQL aggregates it, frontend uses
the real counter instead of summing sessions.
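
The counter bump is a plain per-call increment — toy schema for
illustration (not the real sessions table):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, "
               "api_call_count INTEGER NOT NULL DEFAULT 0)")
  conn.execute("INSERT INTO sessions (id) VALUES ('s1')")
  # One increment per LLM API call, not per session:
  conn.execute("UPDATE sessions SET api_call_count = api_call_count + ? "
               "WHERE id = ?", (1, "s1"))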

The 'API Calls' card on the analytics dashboard previously displayed
COUNT(*) from the sessions table — the number of conversations, not
LLM requests. Each session makes 10-90 API calls through the tool loop,
so the reported number was ~30x lower than the real count.

Salvaged from PR #10140 (@kshitijk4poor). The cache-token accuracy
portions of the original PR were deferred — per-provider analytics is
the better path there, since cache_write_tokens and actual_cost_usd
are only reliably available from a subset of providers (Anthropic
native, Codex Responses, OpenRouter with usage.include).

Tests:
- schema_version v7 assertion
- migration v2 -> v7 adds api_call_count column with default 0
- update_token_counts increments api_call_count by provided delta
- absolute=True sets api_call_count directly
- /api/analytics/usage exposes total_api_calls in totals
2026-04-22 05:51:58 -07:00
hharry11
83cb9a03ee fix(cli): ensure project .env is sanitized before loading 2026-04-22 05:51:44 -07:00
Abner
b66644f0ec feat(hindsight): richer session-scoped retain metadata
- Add configurable retain_tags / retain_source / retain_user_prefix /
  retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name,
  chat_type, thread_id) through AIAgent and MemoryManager into
  MemoryProvider.initialize kwargs so providers can scope and tag
  retained memories.
- Hindsight attaches the new identity fields as retain metadata,
  merges per-call tool tags with configured default tags, and uses
  the configurable transcript labels for auto-retained turns.

Co-authored-by: Abner <abner.the.foreman@agentmail.to>
2026-04-22 05:27:10 -07:00
Teknium
b8663813b6
feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861)
* feat(state): auto-prune old sessions + VACUUM state.db at startup

state.db accumulates every session, message, and FTS5 index entry forever.
A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages
causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM
brought it to 43MB. The prune command and VACUUM are not wired to run
automatically anywhere — sessions grew unbounded until users noticed.

Changes:
- hermes_state.py: new state_meta key/value table, vacuum() method, and
  maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in
  state_meta so it only actually executes once per min_interval_hours
  across all Hermes processes for a given HERMES_HOME. Never raises.
- hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG
  (auto_prune=True, retention_days=90, vacuum_after_prune=True,
  min_interval_hours=24). Added to _KNOWN_ROOT_KEYS.
- cli.py: call maintenance once at HermesCLI init (shared helper
  _run_state_db_auto_maintenance reads config and delegates to DB).
- gateway/run.py: call maintenance once at GatewayRunner init.
- Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section.

Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed
pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays
bloated forever. VACUUM only runs when the prune actually removed rows,
so tight DBs don't pay the I/O cost.
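
The interval gate in miniature (sketch; the get/set callables stand in
for state_meta reads/writes, and the marker key is hypothetical):

  import time

  def should_run(get_marker, set_marker, min_interval_hours: float) -> bool:
      try:
          last = float(get_marker("last_maintenance") or 0.0)
      except (TypeError, ValueError):
          last = 0.0               # corrupt marker → recover by running
      if time.time() - last < min_interval_hours * 3600:
          return False
      set_marker("last_maintenance", str(time.time()))
      return True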

Tests: 10 new tests in tests/test_hermes_state.py covering state_meta,
vacuum, idempotency, interval skipping, VACUUM-only-when-needed,
corrupt-marker recovery. All 246 existing state/config/gateway tests
still pass.

Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG
exposes the new block, load_config() returns it for fresh installs,
first call prunes+vacuums, second call within min_interval_hours skips,
and the state_meta marker persists across connection close/reopen.

* sessions.auto_prune defaults to false (opt-in)

Session history powers session_search recall across past conversations,
so silently pruning on startup could surprise users. Ship the machinery
disabled and let users opt in when they notice state.db is hurting
performance.

- DEFAULT_CONFIG.sessions.auto_prune: True → False
- Call-site fallbacks in cli.py and gateway/run.py match the new default
  (so unmigrated configs still see off)
- Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff
2026-04-22 05:21:49 -07:00
helix4u
a7d78d3bfd fix: preserve reasoning_content on Kimi replay 2026-04-22 04:31:59 -07:00
hengm3467
c6b1ef4e58 feat: add Step Plan provider support (salvage #6005)
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:

- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
  small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
  and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
  mapping, and StepFun region/model persistence

Based on #6005 by @hengm3467.

Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
2026-04-22 02:59:58 -07:00
Teknium
ff9752410a
feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)
* feat(plugins): pluggable image_gen backends + OpenAI provider

Adds an ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:

- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
  Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
  incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
  `image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
  `plugin.yaml` of their own).
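
Path-derived keys are just the directory path relative to the plugins
root (sketch):

  from pathlib import Path

  def plugin_key(plugins_root: Path, plugin_dir: Path) -> str:
      # plugins/image_gen/openai -> 'image_gen/openai';
      # top-level plugins keep their bare name.
      return plugin_dir.relative_to(plugins_root).as_posix()

So a future plugins/tts/openai/ gets key 'tts/openai' and cannot
collide with 'image_gen/openai'.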

Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.

FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.

- 41 unit tests (scanner recursion, kind parsing, gate logic,
  registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
  `_handle_image_generate` routes to OpenAI when configured

* fix(image_gen/openai): don't send response_format to gpt-image-*

The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.

* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog

gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.

Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.

* feat(image_gen/openai): expose gpt-image-2 as three quality tiers

Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:

  gpt-image-2-low     ~15s   fast iteration
  gpt-image-2-medium  ~40s   default
  gpt-image-2-high    ~2min  highest fidelity

Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.

Config:
  image_gen.openai.model: gpt-image-2-high
  # or
  image_gen.model: gpt-image-2-low
  # or env var for scripts/tests
  OPENAI_IMAGE_MODEL=gpt-image-2-medium

Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.

* feat(tools_config): plugin image_gen providers inject themselves into picker

'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.

Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
  (name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
  every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
  Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
  writes image_gen.provider and routes to the plugin's list_models()
  catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
  FAL key when any plugin provider reports is_available().

FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.

Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.

397 tests pass across plugins/, tools_config, registry, and picker.

* fix(image_gen): close final gaps for plugin-backend parity with FAL

Two small places that still hardcoded FAL:

- hermes_cli/setup.py status line: an OpenAI-only setup showed
  'Image Generation: missing FAL_KEY'. Now probes plugin providers
  and reports '(OpenAI)' when one is_available() — or falls back to
  'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.

- image_generate tool schema description: said 'using FAL.ai, default
  FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
  user-configured' — and notes the 'image' field can be a URL or an
  absolute path, which the gateway delivers either way via
  extract_local_files().
2026-04-21 21:30:10 -07:00
Teknium
410f33a728
fix(kimi): don't send Anthropic thinking to api.kimi.com/coding (#13826)
Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
its own thinking semantics: when thinking.enabled is sent, Kimi validates
the history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content. The Anthropic path never populates that
field, and convert_messages_to_anthropic strips Anthropic thinking blocks
on third-party endpoints — so after one tool-calling turn the next request
fails with:

  HTTP 400: thinking is enabled but reasoning_content is missing in
  assistant tool call message at index N

Kimi on chat_completions handles thinking via extra_body in
ChatCompletionsTransport (#13503). On the Anthropic route, drop the
parameter entirely and let Kimi drive reasoning server-side.

build_anthropic_kwargs now gates the reasoning_config -> thinking block
on not _is_kimi_coding_endpoint(base_url).
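
The gate itself is a URL-shape check — a sketch matching the tested
shapes (the shipped implementation may differ in detail):

  from urllib.parse import urlparse

  def _is_kimi_coding_endpoint(base_url: str) -> bool:
      parts = urlparse(base_url)
      segments = [s for s in parts.path.split("/") if s]
      # /coding, /coding/, /coding/v1, /coding/anthropic — but not
      # the non-/coding Kimi root route or other hosts.
      return parts.hostname == "api.kimi.com" and segments[:1] == ["coding"]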

Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic,
/coding/ (trailing slash), explicit disabled, other third-party endpoints
still getting thinking (MiniMax), native Anthropic unaffected, and the
non-/coding Kimi root route.
2026-04-21 21:19:14 -07:00
kshitijk4poor
57411fca24 feat: add BedrockTransport + wire all Bedrock transport paths
Fourth and final transport — completes the transport layer with all four
api_modes covered.  Wraps agent/bedrock_adapter.py behind the ProviderTransport
ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.

Wires all transport methods to production paths in run_agent.py:
- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction

Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize
loop does NOT add a bedrock_converse branch to invoke normalize_response on
the already-normalized response.  Bedrock's normalize_converse_response runs
at the dispatch site (run_agent.py:5189), so the response already has the
OpenAI-compatible .choices[0].message shape by the time the main loop sees
it.  Falling through to the chat_completions else branch is correct and
sidesteps a redundant NormalizedResponse rebuild.

Transport coverage — complete:
| api_mode           | Transport                | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport       |      ✓       |     ✓     |    ✓     |
| codex_responses    | ResponsesApiTransport    |      ✓       |     ✓     |    ✓     |
| chat_completions   | ChatCompletionsTransport |      ✓       |     ✓     |    ✓     |
| bedrock_converse   | BedrockTransport         |      ✓       |     ✓     |    ✓     |

17 new BedrockTransport tests pass.  117 transport tests total pass.
160 bedrock/converse tests across tests/agent/ pass.  Full tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
pre-existing test_concurrent_interrupt flake on origin/main).
2026-04-21 20:58:37 -07:00
kshitijk4poor
83d86ce344 feat: add ChatCompletionsTransport + wire all default paths
Third concrete transport — handles the default 'chat_completions' api_mode used
by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama,
DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to
production paths.

Based on PR #13447 by @kshitijk4poor, with fixes:
- Preserve tool_call.extra_content (Gemini thought_signature) via
  ToolCall.provider_data — the original shim stripped it, causing 400 errors
  on multi-turn Gemini 3 thinking requests.
- Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so
  the thinking-prefill retry check (_has_structured) still triggers.
- Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort,
  extra_body.thinking) that landed on main after the original PR was opened.
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the
  transport when sanitization already deepcopied (avoids a second deepcopy).
- Skip the back-compat SimpleNamespace shim in the main normalize loop — for
  chat_completions, response.choices[0].message is already the right shape
  with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details
  and per-tool-call .extra_content from the OpenAI SDK.

run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the
transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep,
developer role swap, provider preferences, max_tokens resolution (ephemeral >
user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi
reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning,
Nous product attribution tags, Ollama num_ctx, custom-provider think=false,
Qwen vl_high_resolution_images, request_overrides.

39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize
including extra_content regression, 3 cache stats, 3 basic). Tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
test_concurrent_interrupt flake present on origin/main).
2026-04-21 20:50:02 -07:00
emozilla
29693f9d8e feat(aux): use Portal /api/nous/recommended-models for auxiliary models
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.

hermes_cli/models.py
  - fetch_nous_recommended_models(portal_base_url, force_refresh=False)
    10-minute TTL cache, keyed per portal URL (staging vs prod don't
    collide).  Public endpoint, no auth required.  Returns {} on any
    failure so callers always get a dict.
  - get_nous_recommended_aux_model(vision, free_tier=None, ...)
    Tier-aware pick from the payload:
      - Paid tier → paidRecommended{Vision,Compaction}Model, falling back
        to freeRecommended* when the paid field is null (common during
        staged rollouts of new paid models).
      - Free tier → freeRecommended* only, never leaks paid models.
    When free_tier is None, auto-detects via the existing
    check_nous_free_tier() helper (already cached 3 min against
    /api/oauth/account).  Detection errors default to paid so we never
    silently downgrade a paying user.

agent/auxiliary_client.py — _try_nous()
  - Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
    to get_nous_recommended_aux_model(vision=vision).
  - Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
    Portal is unreachable or returns a null recommendation.
  - The Portal is now the source of truth for aux model selection; the
    xiaomi allowlist we used to carry is effectively dead.
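
The tier logic in miniature (payload keys per the description above;
the function shape is hypothetical):

  def pick_recommended(payload: dict, vision: bool, free_tier: bool):
      kind = "Vision" if vision else "Compaction"
      free = payload.get(f"freeRecommended{kind}Model")
      if free_tier:
          return free                 # never leak paid models
      paid = payload.get(f"paidRecommended{kind}Model")
      return paid or free             # staged-rollout fallback

A None return is the caller's cue to fall back to _NOUS_MODEL.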

Tests (15 new)
  - tests/hermes_cli/test_models.py::TestNousRecommendedModels
    Fetch caching, per-portal keying, network failure, force_refresh;
    paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
    auto-detect, detection-error → paid default, null/blank modelName
    handling.
  - tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
    _try_nous honors Portal recommendation for text + vision, falls
    back to google/gemini-3-flash-preview on None or exception.

Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
2026-04-21 20:35:16 -07:00
emozilla
c22f4a76de remove Nous Portal free-model allowlist
Drop _NOUS_ALLOWED_FREE_MODELS + filter_nous_free_models and its two call
sites. Whatever Nous Portal prices as free now shows up in the picker as-is
— no local allowlist gatekeeping. Free-tier partitioning (paid vs free in
the menu) still runs via partition_nous_models_by_tier.
2026-04-21 20:35:16 -07:00
kshitijk4poor
c832ebd67c feat: add ResponsesApiTransport + wire all Codex transport paths
Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the
ProviderTransport ABC. Auto-registered via _discover_transports().

Wire ALL Codex transport methods to production paths in run_agent.py:
- build_kwargs: main _build_api_kwargs codex branch (50 lines extracted)
- normalize_response: main loop + flush + summary + retry (4 sites)
- convert_tools: memory flush tool override
- convert_messages: called internally via build_kwargs
- validate_response: response validation gate
- preflight_kwargs: request sanitization (2 sites)

Remove 7 dead legacy wrappers from AIAgent (_responses_tools,
_chat_messages_to_responses_input, _normalize_codex_response,
_preflight_codex_api_kwargs, _preflight_codex_input_items,
_extract_responses_message_text, _extract_responses_reasoning_text).
Keep 3 ID manipulation methods still used by _build_assistant_message.

Update 18 test call sites across 3 test files to call adapter functions
directly instead of through deleted AIAgent wrappers.

24 new tests. 343 codex/responses/transport tests pass (0 failures).

PR 4 of the provider transport refactor.
2026-04-21 19:48:56 -07:00
Teknium
b2ba351380 fix(kimi): reconcile sk-kimi- routing with Anthropic SDK URL semantics
Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches:

- KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1).
  The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK
  appends /v1/messages internally. /coding/v1 + SDK suffix produced
  /coding/v1/v1/messages (a 404). /coding + SDK suffix now yields
  /coding/v1/messages correctly.
- kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so
  non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are
  already redirected to api.kimi.com/coding via _resolve_kimi_base_url.
- doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0)
  and rewrite /coding base URLs to /coding/v1 for the /models health
  check (Anthropic surface has no /models).
- test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var.

E2E verified:
  sk-kimi-<key>  → https://api.kimi.com/coding/v1/messages (Anthropic)
  sk-<legacy>    → https://api.moonshot.ai/v1/chat/completions (OpenAI)
  UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>
2026-04-21 19:48:39 -07:00
Teknium
84449d9afe
fix(prompt): tell CLI agents not to emit MEDIA:/path tags (#13766)
The CLI has no attachment channel — MEDIA:<path> tags are only
intercepted on messaging gateway platforms (Telegram, Discord,
Slack, WhatsApp, Signal, BlueBubbles, email, etc.). On the CLI
they render as literal text, which is confusing for users.

The CLI platform hint was the one PLATFORM_HINTS entry that said
nothing about file delivery, so models trained on the messaging
hints would default to MEDIA: tags on the CLI too. Tool schemas
(browser_tool, tts_tool, etc.) also recommend MEDIA: generically.

Extend the CLI hint to explicitly discourage MEDIA: tags and tell
the agent to reference files by plain absolute path instead.

Add a regression test asserting the CLI hint carries negative
guidance about MEDIA: while messaging hints keep positive guidance.
2026-04-21 19:36:05 -07:00
Teknium
8f167e8791
fix(tts): use per-provider input-character caps instead of global 4000 (#13743)
A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at
4000 chars, causing long inputs to be silently chopped even though the
underlying APIs allow much more:

  - OpenAI:     4096
  - xAI:        15000
  - MiniMax:    10000
  - ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware)
  - Gemini:     ~5000
  - Edge:       ~5000

The schema description also told the model 'Keep under 4000 characters',
which encouraged the agent to self-chunk long briefs into multiple TTS
calls (producing 3 separate audio files instead of one).

New behavior:
  - PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH
    encode the documented per-provider limits.
  - _resolve_max_text_length(provider, cfg) resolves:
      1. tts.<provider>.max_text_length user override
      2. ElevenLabs model_id lookup
      3. provider default
      4. 4000 fallback
  - text_to_speech_tool() and stream_tts_to_speaker() both call the
    resolver; old MAX_TEXT_LENGTH alias kept for back-compat.
  - Schema description no longer hardcodes 4000.
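
The resolver precedence as a sketch (table abbreviated; the ElevenLabs
model_id step — precedence 2 — is omitted):

  PROVIDER_MAX_TEXT_LENGTH = {"openai": 4096, "xai": 15000,
                              "minimax": 10000, "elevenlabs": 5000,
                              "gemini": 5000, "edge": 5000}

  def resolve_max_text_length(provider: str, cfg: dict) -> int:
      override = (cfg.get("tts", {}).get(provider, {})
                  .get("max_text_length"))
      if override:
          return int(override)        # 1. user override
      return PROVIDER_MAX_TEXT_LENGTH.get(provider, 4000)  # 3. / 4.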

Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253
voice-command/voice-cli tests still pass.
2026-04-21 17:49:39 -07:00
brooklyn!
90fca3c7e0
Merge pull request #13724 from NousResearch/bb/tui-resume-all-sources
fix(tui): /resume picker shows telegram/discord/etc sessions
2026-04-21 18:59:12 -05:00
Brooklyn Nicholson
bd046220b3 fix(tui): narrow /resume sources to human adapters
Follow-up on #13724: showing literally every source was too noisy.
The picker now fetches a wider window (larger limit) and then filters
to a curated allowlist of human-facing sources (tui/cli plus chat
adapters like telegram/discord/slack/whatsapp/etc). This keeps the
#13724 fix intact (telegram sessions visible in /resume) without
surfacing internal source kinds such as tool/acp.
2026-04-21 18:52:26 -05:00
Teknium
9c9d9b7ddf
feat(delegate): cross-agent file state coordination for concurrent subagents (#13718)
* feat(models): hide OpenRouter models that don't advertise tool support

Port from Kilo-Org/kilocode#9068.

hermes-agent is tool-calling-first — every provider path assumes the
model can invoke tools. Models whose OpenRouter supported_parameters
doesn't include 'tools' (e.g. image-only or completion-only models)
cannot be driven by the agent loop and fail at the first tool call.

Filter them out of fetch_openrouter_models() so they never appear in
the model picker (`hermes model`, setup wizard, /model slash command).

Permissive when the field is missing — OpenRouter-compatible gateways
(Nous Portal, private mirrors, older snapshots) don't always populate
supported_parameters. Treat missing as 'unknown → allow' rather than
silently emptying the picker on those gateways. Only hide models
whose supported_parameters is an explicit list that omits tools.
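
The filter predicate, roughly (entry = one model dict from the
OpenRouter /models payload):

  def model_supports_tools(entry) -> bool:
      if not isinstance(entry, dict):
          return True                # malformed item → keep
      params = entry.get("supported_parameters")
      if not isinstance(params, list):
          return True                # missing/malformed → allow
      return "tools" in params       # explicit list must include it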

Tests cover: tools present → kept, tools absent → dropped, field
missing → kept, malformed non-list → kept, non-dict item → kept,
empty list → dropped.

* feat(delegate): cross-agent file state coordination for concurrent subagents

Prevents mangled edits when concurrent subagents touch the same file
(same process, same filesystem — the mangle scenario from #11215).

Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1:

1. FileStateRegistry (tools/file_state.py) — process-wide singleton
   tracking per-agent read stamps and the last writer globally.
   check_stale() names the sibling subagent in the warning when a
   non-owning agent wrote after this agent's last read.

2. Per-path threading.Lock wrapped around the read-modify-write
   region in write_file_tool and patch_tool. Concurrent siblings on
   the same path serialize; different paths stay fully parallel.
   V4A multi-file patches lock in sorted path order (deadlock-free).

3. Delegate-completion reminder in tools/delegate_tool.py: after a
   subagent returns, writes_since(parent, child_start, parent_reads)
   appends '[NOTE: subagent modified files the parent previously
   read — re-read before editing: ...]' to entry.summary when the
   child touched anything the parent had already seen.
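
Layer 2's lock discipline, sketched (registry bookkeeping omitted):

  import threading

  _registry_lock = threading.Lock()
  _path_locks: dict[str, threading.Lock] = {}

  def _lock_for(path: str) -> threading.Lock:
      with _registry_lock:
          return _path_locks.setdefault(path, threading.Lock())

  def acquire_sorted(paths):
      # Sorted, deduped acquisition keeps V4A multi-file patches
      # deadlock-free when siblings grab overlapping path sets.
      locks = [_lock_for(p) for p in sorted(set(paths))]
      for lock in locks:
          lock.acquire()
      return locks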

Complements (does not replace) the existing path-overlap check in
run_agent._should_parallelize_tool_batch — batch check prevents
same-file parallel dispatch within one agent's turn (cheap prevention,
zero API cost), registry catches cross-subagent and cross-turn
staleness at write time (detection).

Behavior is warning-only, not hard-failing — matches existing project
style. Errors surface naturally: sibling writes often invalidate the
old_string in patch operations, which already errors cleanly.

Tests: tests/tools/test_file_state_registry.py — 16 tests covering
registry state transitions, per-path locking, per-path-not-global
locking, writes_since filtering, kill switch, and end-to-end
integration through the real read_file/write_file/patch handlers.
2026-04-21 16:41:26 -07:00