hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-30 06:41:51 +00:00

Author	SHA1	Message	Date
kshitijk4poor	66827f8947	chore: prune unused imports and duplicate import redefinitions Remove unused imports (F401) and duplicate/shadowed import redefinitions (F811) across the codebase using ruff's safe autofixes. No behavioral changes -- imports only. - ~1400 safe autofixes applied across 644 files (net -1072 lines) - __init__.py re-exports preserved (excluded from F401 removal so public re-export surfaces stay intact) - Re-exports that are imported or monkeypatched by tests but look unused in their defining module are kept with explicit # noqa: F401 (gateway/run.py load_dotenv; run_agent re-exports from agent.message_sanitization, agent.context_compressor, agent.retry_utils, agent.prompt_builder, agent.process_bootstrap, agent.codex_responses_adapter) - Unsafe F841 (unused-variable) fixes deliberately skipped -- those can change behavior when the RHS has side effects - ruff lints remain disabled in pyproject.toml (only PLW1514 is selected); this is a one-time cleanup, not a config change Verification: - python -m compileall: clean - pytest --collect-only: all 27161 tests collect (zero import errors) - core entry points import clean (run_agent, model_tools, cli, toolsets, hermes_state, batch_runner, gateway) - static scan: every name any test imports directly from an edited module still resolves	2026-05-28 22:26:25 -07:00
Nicolò Boschi	490b3e76b1	feat(hindsight): default recall_types to observation only Auto-recall used to surface every fact type Hindsight had on the session — `world`, `experience`, and `observation`. That triple-ships the same underlying signal in three different framings: observations are the concrete events the user said/did/asked, while world and experience facts are aggregate summaries Hindsight derives from those exact observations. Including all three burns most of `recall_max_tokens` on rephrasings, crowds out events the model actually needs to see, and produces effective duplicates in the prompt — observations themselves are deduplicated by construction so observation-only recall is denser per token and closer to conversational ground truth. Change ------ - Default `_recall_types = ["observation"]` (was `None`, which delegated to server-side "return everything"). - `initialize()` now treats a missing `recall_types` config the same way; also accepts comma-separated strings for parity with `recall_tags`. - An explicit `recall_types=[]` config falls back to the default rather than disabling the filter (would silently widen recall vs. the new default). - Added to `get_config_schema()` so it's discoverable via `hermes config`. Per-call `hindsight_recall` tool invocations are unaffected — they already only forward `types` when the caller passes the argument. Docs / migration ---------------- plugins/memory/hindsight/README.md grows a "Behavior change" callout explaining the why (no-duplicates, information-efficient) and how to restore the legacy broad recall: "recall_types": "observation,world,experience" # or a JSON list in `~/.hermes/hindsight/config.json`. Tests ----- - `test_default_values` updated for the new default. - New cases: explicit list override, CSV string accepted, empty list falls back to default (not "wider than default").	2026-05-28 13:07:20 -07:00
Teknium	5e1f793430	chore(web): remove web_crawl tool + provider crawl plumbing (#33824 ) The web_crawl_tool() function was an orphan — no model schema registered it, no skill or CLI command called it, and the agent had no way to invoke it. PR #32608 proposed wiring it up as a model-callable tool; we've decided not to expose crawl as a separate capability since web_search + web_extract cover the use cases we want models to have. Removed: - tools/web_tools.py: web_crawl_tool() (~230 LOC) - plugins/web/firecrawl/provider.py: supports_crawl() + crawl() - plugins/web/tavily/provider.py: supports_crawl() + crawl() - plugins/web/xai/provider.py: supports_crawl() override - agent/web_search_provider.py: supports_crawl() + crawl() ABC methods - agent/web_search_registry.py: get_active_crawl_provider() + the 'crawl' branch in _resolve() - agent/display.py: web_crawl tool-progress rendering - hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools - tools/website_policy.py: stale comment reference - Tests: removed TestWebCrawlTavily class, the two website-policy web_crawl tests, the searxng/ddgs/brave-free crawl-error tests, the integration test_web_crawl method, and the test_unconfigured_crawl_emits_top_level_error test. Trimmed the capability-flag parametrize list and the WebSearchProvider ABC conformance tests. - Docs: trimmed the Crawl column from capability tables in both EN and zh-Hans, updated the developer-guide ABC table. Net: 25 files, +115/-1067. Closes #33762 (the schema-text bug only existed if #32608 landed). Supersedes #32608.	2026-05-28 04:52:42 -07:00
Robin Fernandes	406901b27d	feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly.	2026-05-28 00:19:31 -07:00
Teknium	9919caff46	feat(image_gen): add Krea provider plugin (Krea 2 Medium + Large) (#33236 ) * feat(image_gen): add Krea provider plugin (Krea 2 Medium + Large) New built-in image_gen backend wrapping Krea's Krea 2 foundation image model family. Auto-discovered like the other image_gen plugins and appears in 'hermes tools' → Image Generation → Krea. Krea's API is asynchronous — submit returns a job_id, poll /jobs/{id} until terminal. The provider hides that behind the synchronous ImageGenProvider.generate() contract: submit, poll every 2s with light backoff (max 5s), 3-minute ceiling matching Krea's hosted-tool timeout. Result URL is materialised to $HERMES_HOME/cache/images/ to avoid CDN-expiry 404s downstream (same fix as xAI #26942). Models: - krea-2-medium (default — Krea's 'start here' recommendation) - krea-2-large Aspect ratios map landscape→16:9, square→1:1, portrait→9:16. Resolution: 1K (Krea's only current option). Kwarg passthrough: seed, creativity (raw/low/medium/high), styles, image_style_references (capped 10), moodboards (capped 1) — matches Krea's per-request limits. Unknown kwargs are ignored. Config knobs (config.yaml): image_gen.provider: krea image_gen.krea.model: krea-2-medium \| krea-2-large image_gen.krea.creativity: raw \| low \| medium \| high Env overrides: KREA_API_KEY (required), KREA_IMAGE_MODEL. KREA_API_KEY is registered in OPTIONAL_ENV_VARS so 'hermes setup' prompts for it. 31 new tests; image_gen suite + picker + tools_config: 211/211. * fix(image_gen/krea): address review feedback - Update KREA_API_KEY setup URL to the canonical token-creation page (https://www.krea.ai/app/api/tokens). The previous URL returned 404. - Fail fast on non-retryable HTTP statuses during poll. The previous loop retried every HTTPError for the full 180s deadline, so an auth (401), billing (402), forbidden (403), or not-found (404) response would make image_generate hang for three minutes. Only retry transient statuses (408/409/425/429/5xx); surface everything else immediately. - Add 5 tests covering fail-fast on 401/403/404 and retry on 429/503. * fix(krea): point users at the real API token dashboard URL Three call sites linked users to dashboard pages that don't exist: - hermes_cli/config.py: https://www.krea.ai/app/api/tokens - plugins/image_gen/krea/__init__.py get_setup_schema: https://www.krea.ai/api-keys - plugins/image_gen/krea/__init__.py auth_required error: https://www.krea.ai/api-keys Per Krea's own docs (https://docs.krea.ai/developers/api-keys-and-billing), the real dashboard URL is https://www.krea.ai/settings/api-tokens. All three sites now point there.	2026-05-27 11:01:47 -07:00
Ben	61dcc33893	feat(dashboard-auth): config.yaml as canonical surface for dashboard.oauth Per AGENTS.md, ~/.hermes/.env is reserved for API keys / secrets and config.yaml is the surface for non-secret configuration. The Nous Portal plugin previously read HERMES_DASHBOARD_OAUTH_CLIENT_ID and HERMES_DASHBOARD_PORTAL_URL from the environment only, which forced local-dev / on-prem operators to put non-secret per-instance configuration in .env — violating the convention. Add dashboard.oauth.{client_id,portal_url} to DEFAULT_CONFIG and have the plugin resolve each setting with env-overrides-config precedence: 1. Env var when set to a non-empty value (Fly.io platform-secret injection — what pushes per-deploy client_ids without baking them into the image). 2. config.yaml entry (canonical surface for local dev / on-prem). 3. Plugin default (no provider registered when client_id is empty; portal_url defaults to https://portal.nousresearch.com). Empty env values are explicitly treated as unset so a provisioned-but- not-populated Fly secret can't accidentally shadow a valid config.yaml entry with an empty string — operators would otherwise lose the gate. Implementation: - hermes_cli/config.py: add dashboard.oauth.{client_id,portal_url} block to DEFAULT_CONFIG with full doc comment explaining the override precedence and Fly.io rationale. - plugins/dashboard_auth/nous/__init__.py: add _load_config_oauth_section, _resolve_client_id, _resolve_portal_url helpers; replace the two direct os.environ.get() calls in register() with the resolvers. Update the skip-reason string to mention BOTH surfaces so an operator looking at the fail-closed bind error knows config.yaml is a valid alternative to the env var. - plugins/dashboard_auth/nous/plugin.yaml: update description to name both surfaces. requires_env stays pointing at the env var name — it's metadata-only (not used by the plugin loader for gating) so this is documentation/UX, not enforcement. - cli-config.yaml.example: append commented dashboard.oauth block with the same override rationale operators see in code. - website/docs/user-guide/features/web-dashboard.md: rewrite the 'Default provider: Nous Research' section to lead with config.yaml, present env vars as operator overrides (Fly.io's primary path). Updated the example fail-closed bind error to match the new skip-reason text. Test coverage — new TestConfigYamlSource class (8 tests) pinning every tier of the precedence chain: - config-yaml-only path registers correctly - both config-yaml fields (client_id + portal_url) honoured - env var overrides config for client_id (Fly.io critical path) - env var overrides config for portal_url - empty env string does NOT shadow config (CI/Fly edge case) - neither source set → skip with reason mentioning BOTH surfaces - load_config() raising falls through to env-only path (resilience) - non-dict oauth section falls through cleanly (typo resilience) Mutation-tested: flipping the precedence to config-wins-over-env trips exactly test_env_overrides_config_client_id while the other 7 stay green, confirming the suite discriminates the order, not just the sources. This closes the last item in Teknium's PR review (PR #30156).	2026-05-27 02:12:27 -07:00
Ben	a498485631	feat(dashboard-auth-nous): surface token iss/aud in verification-failure error When jwt.decode raises InvalidTokenError, decode the token a second time without signature verification (safe — we never trust the values, just display them) and append the actual iss/aud claims plus our configured expected values to the error message. Lets operators see config drift between HERMES_DASHBOARD_PORTAL_URL / HERMES_DASHBOARD_OAUTH_CLIENT_ID and what Portal is actually emitting without having to hand-decode the JWT from the browser cookie.	2026-05-27 02:12:27 -07:00
Ben	b3dc539304	feat(dashboard-auth): Nous plugin always-on; default portal URL; specific error messages The Nous OAuth provider plugin (plugins/dashboard_auth/nous) is bundled and auto-loaded — same as before — but previously refused to register unless BOTH HERMES_DASHBOARD_OAUTH_CLIENT_ID and HERMES_DASHBOARD_PORTAL_URL were set, then the gate's fail-closed branch told the operator 'install the default Nous provider'. That message is misleading: the provider IS installed; it's just unconfigured. And the contract only really needs the per-instance client_id — the portal URL is the same for everyone in production. Three changes: 1. plugins/dashboard_auth/nous/__init__.py: - HERMES_DASHBOARD_PORTAL_URL is now optional and defaults to 'https://portal.nousresearch.com'. Override only for staging (portal.rewbs.uk) or a custom deployment. Empty string also falls back to the default so an empty Fly secret can't point the dashboard at nowhere. - Plugin exposes a module-level LAST_SKIP_REASON: str that the gate reads when no providers register. Cleared on each register() call. Skip reasons are human-readable and actionable ('HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. The Nous Portal provisions this env var…'). 2. plugins/dashboard_auth/nous/plugin.yaml: - requires_env drops HERMES_DASHBOARD_PORTAL_URL; only the client_id is mandatory. Description updated to reflect this. 3. hermes_cli/web_server.py: - When the gate fail-closes for 'no providers', it now reads each bundled plugin's LAST_SKIP_REASON and embeds them in the SystemExit message. Operator sees the specific config fix needed: Bundled providers reported these issues: • nous: HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. … instead of the prior generic 'Install the default Nous provider'. Tests: - TestPluginRegister rewritten to assert the new defaults + LAST_SKIP_REASON contents (6 tests, +1 new for empty-string env). - New gate test test_start_server_surfaces_nous_skip_reason_when_unconfigured. - test_get_method_is_not_allowed widened to handle the SPA-shell 200 path explicitly — assertion now verifies no JSON ticket leaks rather than asserting a specific status code (covers all four of 401/404/405/200). Docs updated: web-dashboard.md's 'Default provider' section now shows the env-var table with required/optional columns and embeds the fail-closed error message verbatim so operators can match what they see at the prompt.	2026-05-27 02:12:27 -07:00
Ben	848baeb0a8	feat(dashboard-auth): plugins/dashboard_auth/nous — contract-compliant Nous OAuth provider Bundled, kind=backend, auto-loads. Activates ONLY when Portal-injected env vars are present: HERMES_DASHBOARD_OAUTH_CLIENT_ID — agent:{instance_id} HERMES_DASHBOARD_PORTAL_URL — Portal base URL Loopback / --insecure operators leave both unset and never see this plugin register anything. The fail-closed branch in start_server handles the 'public bind + zero providers' case independently. Implementation follows nous-account-service PR #180's published OAuth contract verbatim: - client_id is per-instance (agent:{instance_id}); the suffix is cross-checked against the token's agent_instance_id claim as defense-in-depth (contract C9). - scope is agent_dashboard:access only (contract C3). - aud is the bare client_id, no hermes-cli: prefix (contract C2). - RS256 JWT verification against /.well-known/jwks.json with 5-minute cache (contract C7). - No refresh tokens in V1: refresh_session always raises RefreshExpiredError; revoke_session is a no-op (contract C5). - oauth_contract_version claim: missing → warn + proceed; present and != 1 → refuse (contract C11, OQ-C2 tolerant treatment). - redirect_uri validated client-side as defense before bouncing to Portal; authoritative check is server-side per agent-redirect-uri.ts. 41 new tests covering construction, plugin-entry env gating, start_login shape, complete_login httpx-mocked happy path + error mapping, verify_session JWT verification (RSA keypair fixture, full claim-check matrix), refresh_session always raising, revoke_session no-op. PyJWT + cryptography are already in the venv (jose was previously suggested; switched to pyjwt[crypto] since the latter is already pulled in transitively).	2026-05-27 02:12:27 -07:00
Teknium	249534e472	plugins: add security-guidance — pattern-matched warnings on dangerous code writes (#33131 ) New opt-in plugin that scans the content passed to write_file / patch / skill_manage for 25 known-dangerous code patterns — pickle.load, yaml.load, eval(, os.system, subprocess(shell=True), child_process.exec, dangerouslySetInnerHTML, innerHTML/outerHTML/document.write/ insertAdjacentHTML, crypto.createCipher (no IV), AES ECB, TLS verification disabled, XXE-prone xml.etree/minidom parsers, <script src=//...> without SRI, torch.load without weights_only=True, GitHub Actions ${{ github.event.* }} injection — and appends a "Security guidance" warning block to the tool result via the transform_tool_result hook. Default behaviour is non-blocking: the file is written and the warning rides back to the model in the next turn so it can self-correct or document why the construct is safe. SECURITY_GUIDANCE_BLOCK=1 upgrades to refusing the write entirely; SECURITY_GUIDANCE_DISABLE=1 is the kill switch. Pattern data (patterns.py) is a verbatim Apache-2.0 fork of Anthropic's claude-plugins-official/plugins/security-guidance/hooks/ patterns.py at commit 0bde168 (2026-05-26). LICENSE and NOTICE preserve attribution. The Hermes-side plugin glue (__init__.py, plugin.yaml, README.md, tests) is original work. Plugin is opt-in like all bundled plugins: hermes plugins enable security-guidance Inspired by https://x.com/ClaudeDevs/status/1927108527247... — Anthropic shipped this as their security-guidance plugin for Claude Code on 2026-05-26 with a measured 30-40% reduction in security-related PR comments on internal rollout. What's NOT ported (deferred): * Layer 2 (LLM diff review on turn end) — would route through main model by default on Hermes, real money on reasoning models. A follow-up can wire it to a cheap aux model with explicit opt-in. * Layer 3 (agentic commit-time review) — agent can run this on demand via delegate_task today. * .hermes/security-guidance.md project-rules file — only used by layers 2/3 upstream.	2026-05-27 02:07:21 -07:00
Will Falcon	bba50977bc	fix: parse Codex image generation SSE directly	2026-05-26 20:40:29 -07:00
teknium1	d3ffbc6409	feat(stt): add stt.providers.<name> command-provider registry Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any shell-driven ASR engine — Doubao ASR, NVIDIA Parakeet, whisper.cpp builds, SenseVoice, curl pipelines — become an STT backend with zero Python. Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved untouched via the built-in local_command path) and the register_transcription_provider() Python plugin hook also shipped in this PR. Resolution order (mirrors TTS exactly): 1. Built-in (local, local_command, groq, openai, mistral, xai) → native handler. Always wins. 2. stt.providers.<name>: type: command → command-provider runner. 3. Plugin-registered TranscriptionProvider → plugin dispatch. 4. No match → 'No STT provider available'. Files ----- - tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained; added _resolve_command_stt_provider_config, _transcribe_command_stt, and local helpers for template rendering, shell-quote context, and process-tree termination. Helpers are documented as mirrors of their tts_tool.py counterparts (kept local to avoid cross-tool private import). Wire-in is one insertion point in transcribe_audio() after the xai elif and before the plugin dispatcher. Plugin dispatcher additionally defensively short-circuits when a same-name command config exists (command-wins-over-plugin invariant). - tests/tools/test_transcription_command_providers.py: 50 new tests covering resolution (builtin precedence, type/command gating, case-insensitive lookup, legacy stt.<name> back-compat), helpers (timeout fallback, format validation, iter, has-any), template rendering (shell-quote contexts, doubled-brace preservation), end-to-end via _transcribe_command_stt (output_path read, stdout fallback, timeout, nonzero exit envelope, model override, language precedence), and dispatcher integration via the real transcribe_audio() including command-wins-over-plugin and builtin-shadow-rejection. - tests/plugins/transcription/check_parity_vs_main.py: extended from 10 to 13 scenarios. New cases: command-provider-installed, command-vs-plugin-same-name (verifies command wins precedence), explicit-openai-with-command-shadow (verifies built-in wins). Adds command_provider dispatch_kind detection via transcript prefix (CMD: vs PLUGIN:) so command-provider scenarios can be distinguished from plugin scenarios even when sharing a provider name. - website/docs/user-guide/features/tts.md: new 'STT custom command providers' section symmetric to the TTS section — example config, placeholder grammar table (input_path / output_path / output_dir / format / language / model), transcript-read-back semantics (file first, then stdout fallback), optional keys table, behavior notes, security note. Updated 'Python plugin providers (STT)' to include the new 'When to pick which (STT)' decision table and updated resolution-order section (now 4 layers instead of 3). Verification ------------ 189/189 STT targeted tests + 50/50 new command-provider tests pass. Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/ 8623/8623 — zero regressions across 14,199 tests. Parity harness: 13 scenarios, 9 OK + 4 expected diffs (no_provider_error → plugin, plugin_unavailable, command_provider × 2). E2E live-verified in an isolated HERMES_HOME with a real .wav file: command: → dispatched to stt.providers.my-fake-cli plugin: → dispatched to registered TranscriptionProvider command-wins-over-plugin: → command provider beats same-name plugin builtin-wins-over-command: → built-in OpenAI handler fires; stt.providers.openai: type: command does NOT hijack it.	2026-05-25 01:41:19 -07:00
kshitijk4poor	2cd952e110	feat(stt): add register_transcription_provider() plugin hook Add an opt-in Python plugin surface for speech-to-text backends, mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio, Gemini-STT, custom proprietary engines) can be implemented as plugins without modifying tools/transcription_tools.py. Built-ins always win -------------------- The 6 built-in STT providers (local/faster-whisper, local_command, groq, openai, mistral, xai) keep their native handlers. Plugins attempting to register under a built-in name are rejected at registration time with a warning and re-checked defensively at dispatch. Resolution order ---------------- 1. stt.provider matches a built-in → built-in dispatch (unchanged) 2. stt.provider matches a registered plugin → a. if plugin.is_available() returns False → unavailability envelope identifying the plugin (not the generic "No STT provider" message — the user explicitly opted into this plugin) b. otherwise plugin.transcribe() with model + language forwarded from stt.<provider>.{model,language} config 3. No match → legacy "No STT provider available" error (unchanged) Per-provider config namespace ----------------------------- Plugins read their config from stt.<provider> in config.yaml, mirroring how built-ins read stt.openai.model / stt.mistral.model. The dispatcher forwards `model` and `language` from this section. Caller's explicit `model=` argument overrides the config-set model. Files ----- - agent/transcription_provider.py: TranscriptionProvider ABC - agent/transcription_registry.py: register/get/list providers, built-in shadow guard, _reset_for_tests - hermes_cli/plugins.py: register_transcription_provider() on PluginContext - tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset, _dispatch_to_plugin_provider() with availability gate, wire-in after xai branch and before "No STT provider" error - tests/agent/test_transcription_registry.py: 27 tests - tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests - tests/tools/test_transcription_plugin_dispatch.py: 28 tests (covering built-in short-circuit, plugin dispatch, exception envelope, non-dict guard, availability gate, language forwarding) - tests/plugins/transcription/check_parity_vs_main.py: 10-scenario subprocess-pinned parity harness vs origin/main - website/docs/user-guide/features/{tts,plugins}.md: docs Behavior parity --------------- 10 scenarios, 8 OK + 2 expected DIFFs: no_provider_error → plugin (plugin-installed scenario) no_provider_error → plugin_unavailable (plugin-installed-unavailable scenario; PR returns cleaner envelope) Zero behavior change for users not opting into a plugin. Issue follow-up to #30398.	2026-05-25 01:41:19 -07:00
Teknium	031f9c9edc	fix(image_gen): cache xAI ephemeral URL responses to disk (#26942 ) (#31759 ) xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs that 404 within minutes — long before downstream consumers (Telegram send_photo, browser preview, multi-tier delivery fallback) get a chance to fetch them. The xAI image_gen provider was passing those URLs through unchanged on the elif url: branch; b64 responses were already cached locally via save_b64_image. Result: every image_generate call on a Telegram-routed xai-oauth profile delivered no image, falling through to text-only. Adds agent.image_gen_provider.save_url_image() — a sibling helper to save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/. Content-type-aware extension inference with URL-suffix fallback; oversize cap (25MB default) with partial-write cleanup; empty-body refusal. Mirrors the audio_cache pattern used by text_to_speech. Wires save_url_image into both the xAI and OpenAI providers' URL branches. When the download fails (network blip, 404 in-flight) we log a warning and fall back to the bare URL rather than turning the tool call into a hard error — the gateway's existing URL-send fallback then gets a chance to surface the original error legibly. Test plan: - tests/agent/test_save_url_image.py — 8 direct tests against a real in-process HTTP server: bytes round-trip, content-type → extension, URL-suffix fallback, default-to-png, 404 propagation, empty-body refusal, oversize cap + cleanup, filename uniqueness. - tests/plugins/image_gen/test_xai_provider.py — flip test_successful_url_response (was asserting the bug), add test_url_response_falls_back_to_bare_url_when_download_fails. - tests/plugins/image_gen/test_openai_provider.py — symmetric pair. 160/160 in the broader image_gen test surface.	2026-05-24 18:10:47 -07:00
kshitijk4poor	00ec0b617c	feat(tts): add register_tts_provider() plugin hook (closes #30398 ) Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, alongside the existing config-driven `tts.providers.<name>: type: command` registry from PR #17843. This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express: - Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.) - Streaming synthesis (chunked Opus → voice-bubble delivery) - Voice metadata API for the `hermes tools` picker - OAuth-refreshing auth flows None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for new engines that aren't built-in. ## Resolution order The dispatcher's resolution order is the load-bearing invariant: 1. `tts.provider` is a built-in name → built-in dispatch. Always wins. 2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843). 3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new). 4. No match → falls through to Edge TTS default (legacy behavior). Built-ins-always-win is enforced at THREE layers: - Registry: `register_provider()` rejects shadowing names with a warning. - Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry. - Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively. Command-providers-win-over-plugins is enforced at TWO layers: - The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first. - `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant. ## New files - `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape. - `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape. - `plugins/tts/...` directory ready for community plugins (none shipped). ## Modified files - `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`. - `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched. - `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows. ## Tests - `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints). - `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper. - `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`. - `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`. - `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario. ## Verification - 95/95 new tests pass. - 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged. - Parity harness against `origin/main`: 8 OK + 1 expected DIFF. - E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned. - Ruff clean on all touched files. ## Docs - `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference. - `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming). Closes #30398	2026-05-24 18:04:54 -07:00
teknium1	70aaa774be	fix(opencode-go): emit Kimi reasoning_effort, match KimiProfile shape The Kimi K2 branch added in the prior commit only emitted extra_body.thinking and dropped reasoning_effort entirely. KimiProfile (api.moonshot.ai/v1) sends both fields, and OpenCode Go proxies to the same Moonshot backend. Mirror that shape on the Go path so /reasoning effort actually reaches Kimi. - low/medium/high pass through verbatim - xhigh/max clamp to high (Moonshot's max supported value) - minimal / unknown effort → omit reasoning_effort, keep thinking on - disabled / no config → unchanged - DeepSeek branch unchanged	2026-05-23 02:20:28 -07:00
Harish Kukreja	3589960e03	fix(provider): expose OpenCode Go reasoning controls	2026-05-23 02:20:28 -07:00
0xDevNinja	3ac2125140	refactor(image_gen): port FAL backend to plugins/image_gen/fal Mirrors the architecture established by the web (#25182), browser (#25214), and video_gen (#25126) plugin migrations: * `tools/fal_common.py` — stateless atoms shared by both FAL-backed plugins (image_gen + video_gen). Holds the lazy `fal_client` import helper, `_ManagedFalSyncClient`, `_normalize_fal_queue_url_format`, `_extract_http_status`. Stateful pieces (`fal_client` module global, `_managed_fal_client` cache, `_submit_fal_request`, `_resolve_managed_fal_gateway`, `_get_managed_fal_client`) intentionally stay on `tools.image_generation_tool` so the existing `monkeypatch.setattr(image_tool, ...)` patch sites keep working unchanged. `plugins/video_gen/fal/__init__.py` — drops its inline `_load_fal_client` duplicate; consumes `tools.fal_common.import_fal_client`. * `plugins/image_gen/fal/{plugin.yaml,__init__.py}` — new plugin. `FalImageGenProvider` is a thin registration adapter that resolves the legacy module via `import tools.image_generation_tool as _it` and calls `_it.image_generate_tool` + `_it._resolve_fal_model` at call time. The 18-model catalog, `_build_fal_payload`, managed- gateway selection, and Clarity Upscaler chaining all remain in `tools.image_generation_tool` as the single source of truth — the plugin is a registration adapter, not a parallel implementation. * `tools/image_generation_tool.py::_dispatch_to_plugin_provider` — drops the `configured == "fal"` skip. Setting `image_gen.provider: fal` now routes through the registry like any other provider; the plugin re-enters this module's pipeline so behavior is identical. Unset `image_gen.provider` still falls through to the in-tree pipeline (preserves no-config-with-FAL_KEY UX from #15696). * `hermes_cli/tools_config.py` — drops the hardcoded "FAL.ai" row from `TOOL_CATEGORIES["image_gen"]["providers"]` (now injected by `_plugin_image_gen_providers` like every other backend) and the `getattr(provider, "name") == "fal"` skip that protected against duplication with the hardcoded row. The "Nous Subscription" row stays as a setup-flow entry — same shape browser kept "Nous Subscription (Browser Use cloud)" after #25214. * `tests/plugins/image_gen/test_fal_provider.py` — 14 cases covering the ABC surface, call-time indirection (verifying `monkeypatch.setattr(image_tool, "image_generate_tool", ...)` takes effect through the plugin), response-shape stamping, exception handling, and registry wiring. * `tests/plugins/image_gen/check_parity_vs_main.py` — subprocess harness mirroring `tests/plugins/browser/check_parity_vs_main.py`. Pins one path to origin/main, one to the worktree; runs six scenarios (unset, explicit-fal-no-creds, explicit-fal-with-creds, explicit-fal-with-model, typo provider, managed-gateway-only) and diffs the reduced shape `{dispatch_kind, provider_name, model}` per scenario. The only acceptable diff is "legacy_fal → plugin (fal)" for explicit-FAL paths — every other delta is flagged as a regression. * `tests/hermes_cli/test_image_gen_picker.py::test_fal_surfaced_alongside_other_plugins` — flips the previous `test_fal_skipped_to_avoid_duplicate` to match the new shape (FAL is a plugin now, no dedup needed). Verified: 195/195 tests across `tests/{tools/test_image_generation*,tools/test_managed_media_gateways,plugins/image_gen,plugins/video_gen,hermes_cli/test_image_gen_picker}.py` pass on this branch with no test patches modified outside the picker test that asserted the old skip behaviour. Fixes #26241	2026-05-22 04:10:45 -07:00
ethernet	48be2e0e4d	test: use subprocesses for each test file (#29016 ) * ci(tests): install ripgrep from prebuilt tarball instead of apt apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu runners (the apt-get update against archive.ubuntu.com is the slow part; ripgrep itself is small). Switching to the upstream musl binary tarball cuts the step to a few seconds. - Pinned to ripgrep 15.1.0 with sha256 verification (same hash as published in the releases sha256 sidecar file). - Drops the `rg` binary into /usr/local/bin so it is on PATH for every subsequent step without GITHUB_PATH manipulation. - Applied to both the test and e2e jobs in tests.yml. * fix(cli): compile syntax check to tempdir, not source __pycache__ `_validate_critical_files_syntax` runs `py_compile.compile()` on each critical bootstrap file after a successful `git pull`. The default `py_compile` writes the resulting `.pyc` next to the source under `__pycache__/`, which causes two real problems: 1. Parallel test workers walking the same source tree (e.g. running the suite under per-file process isolation) can race against each other on the `__pycache__` write — manifests as flaky 'directory not empty' errors during teardown. 2. In production, the post-pull syntax check leaves a `.pyc` behind that the next interpreter run might pick up — fine when the interpreter version matches, sketchy if it doesn't. Fix: write the compiled output to a `tempfile.TemporaryDirectory()` that's discarded on function exit. We only care about the compile-or-not signal, not the artifact. * test(runner): per-file process isolation, drop manual state reset + xdist Replace fragile manual _reset_module_state test fixtures with robust per-file subprocess isolation. Each test file runs in a fresh `python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist, no custom pytest plugin, no shared worker state. Key changes: * scripts/run_tests_parallel.py — new runner: discovers test files, runs N in parallel via ThreadPoolExecutor, captures stdout per file, treats exit code 5 (no tests collected) as pass, kills all children on exit. Change from cpu_count to cpu_count2. The runner is I/O-bound (waiting on subprocess.communicate() from pytest children) The parent process does almost no CPU work, so 2x oversubscription keeps more pipes full. When a file fails, immediately show the last 30 lines of pytest output (stack traces + FAILED summary) plus a ready-to-copy repro command: python -m pytest tests/agent/test_auxiliary_client.py scripts/run_tests.sh — delegates to run_tests_parallel.py * .github/workflows/tests.yml — test step: python scripts/run_tests_parallel.py * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts * tests/conftest.py — remove ~200 lines of manual state-reset fixtures * AGENTS.md — update Testing section for per-file design * test(runner): speed gateway test antipattern scan up * fix(test): web search provider plugin test missing xai * fix(tests): make 14 test files pass under per-file subprocess isolation Tests that relied on cross-file state pollution from xdist workers fail when run in isolation (per-file subprocess model). Root causes and fixes: Tool registry not populated: - test_video_generation_tool_surface_matrix: add discover_builtin_tools() - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures registering all 8 bundled web providers, reset after each test - test_website_policy: same provider registration pattern - test_web_tools_tavily: same pattern across 3 dispatch test classes - Also add is_safe_url/check_website_access mocks where SSRF check blocks example.com (DNS resolution fails in isolated envs) Stale check_fn cache: - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache() in both kanban guidance tests (prior test cached False for kanban_show) - test_discord_tool: cache invalidation in setup/teardown - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries Module-level state pollution: - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache - test_skill_commands: set_session_vars() instead of patch.dict(os.environ) (ContextVar takes precedence over os.environ) - test_dm_topics: overwrite sys.modules + separate telegram.constants mock + force-reimport of gateway.platforms.telegram - test_terminal_tool_requirements: removed duplicate class declaration, autouse _clear_caches fixture * change(tests): run_tests.sh explicitly includes env vars instead of manually dropping some vars, now we just only include some * fix(tests): 5 more isolation/NixOS fixes - test_approval_plugin_hooks: isolate HERMES_HOME so real user's command_allowlist doesn't short-circuit the approval path - test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum (feature not merged on this branch) - test_write_deny: test systemd prefix against tmp_path instead of /etc/systemd which resolves to /nix/store on NixOS - test_pty_bridge: use shutil.which('cat') instead of /bin/cat (doesn't exist on NixOS) - profiles.py: rmtree onexc handler chmod's parent dirs too, fixing profile deletion when copytree preserved read-only modes from nix store * fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client * fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor * fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test * fix: address PR #29016 review feedback - Remove tracked .pytest-cache/ artifact and add to .gitignore - Fix stale 'xdist worker' comment in conftest.py - Deduplicate web provider registration into tests/tools/conftest.py shared helper (register_all_web_providers), replacing 8 copy-pasted blocks across 6 test files - Update PR description: remove stale recovered-test-files claim, fix worker count to match code (cpu_count2) fix: eliminate race in stale-cache achievements test The background scan thread could complete and overwrite _SNAPSHOT_CACHE before evaluate_all() returned the stale data — only 10 fake sessions made the scan finish instantly. Added scan_delay param to _FakeSessionDB and set it to 2s in the stale-cache test so the background thread can't win the race.	2026-05-21 16:40:04 +05:30
Teknium	362ef912ea	fix(kanban-dashboard): restore implementations dropped during salvages (#28481 ) Four kanban dashboard test failures, all from PR salvages that picked up the test additions but dropped the corresponding implementations. - BOARD_COLUMNS: add 'review' (status added by PR `f55d94a1e` but the board API never grew the column → test_board_empty failed because VALID_STATUSES - {archived} mismatched the rendered columns). - update_task: enrich the 'ready' 409 detail with the blocking parent list (id, title, status) and add _parents_blocking_ready helper. Implementation lost in the #26744 salvage (commit `e215558ba`) which pinned the test but not the server-side code. - dist/index.js: add parseApiErrorMessage helper, wire it through the drag/drop banner, add patchErr state to the TaskDrawer and surface it inline by the action row. Lost in the same #26744 salvage. - test_diagnostics_endpoint_severity_filter: update to at-or-above semantics (PR `a94ddd807` changed the filter from exact-match so the warning filter now correctly includes error+critical too).	2026-05-18 21:54:56 -07:00
Jpalmer95	dfcf48b476	feat(kanban): drag-to-delete trash zone + bulk delete for task cards Salvages #28125 by @Jpalmer95. Adds: - Drag-to-delete trash zone in the kanban dashboard - Bulk delete endpoint with cascading delete_task cleanup - Frontend updates (drag visual + drop handler) - Confirmation prompt before delete Resolved end-of-file test conflict by appending both halves.	2026-05-18 21:40:13 -07:00
roycepersonalassistant	e3823657d6	feat(kanban): add scheduled status for delayed follow-ups Salvages #24533 by @roycepersonalassistant. Adds a first-class 'scheduled' Kanban status for time-delay follow-ups that aren't waiting on human input. - hermes kanban schedule <task_id> [reason] CLI command - Dashboard/API transitions to/from Scheduled - unblock_task() now releases both 'blocked' AND 'scheduled' tasks (re-checking parent dependencies before moving to ready/todo) - i18n + docs updates Resolved conflicts: kept HEAD's failure-counter reset on unblock alongside the PR's scheduled state, kept HEAD's 'running' direct-set rejection, combined both bulk-status branches. Dropped the dist/ bundle changes (months-stale; would need rebuild from source).	2026-05-18 21:39:03 -07:00
bensargotest-sys	81584940fe	docs: align kanban readiness docs and smoke tests Salvages #28199 by @bensargotest-sys. Aligns Kanban docs with current tool registration: dispatcher-spawned task workers get task tools, profiles that explicitly enable the kanban toolset get orchestrator routing tools (kanban_list, kanban_unblock). Corrects failure-limit text to current default of 2. Hardens the e2e subprocess script to resolve repo root and use the spawnable default assignee. Updates the diagnostics severity fixture to assert error below the critical threshold.	2026-05-18 21:07:03 -07:00
xxxigm	e215558ba7	test(kanban-dashboard): pin enriched 409 detail and inline error wiring (#26744 ) - Existing ``test_patch_drag_drop_move_todo_to_ready`` now asserts the enriched 409 detail names the blocking parent (id, quoted title, and current status), so the dashboard always has something actionable to render. - New bundle-assertion test ``test_dashboard_surfaces_ready_blocked_error_inline`` pins the frontend wiring: the ``parseApiErrorMessage`` helper exists, the drag/drop banner runs through it, and the drawer maintains a visible ``patchErr`` state that's cleared between PATCHes and tasks.	2026-05-18 21:02:49 -07:00
Interstellar-code	02efad704f	feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect) Adds three read-only endpoints to the kanban dashboard plugin so the SwitchUI workspace (and any other dashboard consumer) can track workers across tasks without N+1 round-trips through /tasks/{task_id}. - GET /workers/active Single SQL JOIN of task_runs + tasks where ended_at IS NULL, worker_pid IS NOT NULL, status='running'. Returns {workers: [...], count, checked_at}. - GET /runs/{run_id} Direct lookup of any task_run row by id. Reuses existing kanban_db.get_run() helper and _run_dict() serialiser. 404 when not found. Mirrors GET /tasks/{task_id} 404 shape. - GET /runs/{run_id}/inspect Live PID stats via psutil.Process.as_dict() — cpu_percent, memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status, create_time, cmdline. Short-circuits with alive:false when run has ended, has no worker_pid, the pid is gone, or psutil is unavailable. AccessDenied surfaces as alive:true with error rather than a 500. 11 new tests in tests/plugins/test_kanban_worker_runs.py cover the empty-board case, running-task case, ended-run filtering, missing-pid filtering, 404 paths, already-ended inspect, no-pid inspect, dead-pid inspect, and live-pid inspect (psutil mocked). All pass. Companion termination endpoint (POST /runs/{run_id}/terminate) is intentionally out of scope here — opening a separate issue first since the RBAC and dispatcher-mediated soft-cancel design needs maintainer input before code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 21:01:47 -07:00
kronexoi	e8ce7b83fa	fix(kanban): reject direct running transitions in dashboard bulk updates Salvages #24050 by @kronexoi. The single-task PATCH already rejects direct status='running' since it bypasses the dispatcher/claim invariant, but the bulk-update endpoint still accepted it. Aligns bulk with single by emitting an error result row for any 'running' entry.	2026-05-18 20:38:32 -07:00
roycepersonalassistant	6c4f11c64a	fix: show scheduled kanban tasks in dashboard	2026-05-18 20:25:45 -07:00
moortekweb-art	4f6101cc74	Fix Kanban dashboard initial board selection	2026-05-18 20:18:21 -07:00
Drexuxux	917e51858d	fix(kanban): demote ready children when a parent is reopened	2026-05-18 20:17:28 -07:00
Zyrixtrex	326c15d955	fix(kanban): preserve notifier_profile for dashboard home subscriptions	2026-05-18 20:14:45 -07:00
wuli666	028bbc5425	test(kanban-dashboard): cover _task_dict task_age fallback The fix in `061a1830` added an outer try/except in plugin_api._task_dict so that a future failure mode in kanban_db.task_age (anything _safe_int doesn't already absorb) cannot 500 the GET /board response. The _safe_int / task_age corruption paths got regression coverage in tests/hermes_cli/test_kanban_db.py, but the OUTER fallback contract remained untested -- meaning a refactor that drops the try/except would not be caught by CI. Pin that contract from both consumers of _task_dict: - GET /board returns 200 with the literal fallback age dict for the affected card (other cards continue to render via the same path) - GET /tasks/:id (drawer view) returns 200 with the same fallback, so a single corrupt task can't block its own drawer Both tests force task_age to raise RuntimeError rather than ValueError on '%s', because ValueError is absorbed by _safe_int and never reaches the outer try/except -- testing that path would only re-cover what test_kanban_db.py already pins. Manually verified the regression discipline: git checkout `061a1830`^ -- plugins/kanban/dashboard/plugin_api.py pytest -k task_age_exception # both FAIL with 500 git checkout HEAD -- plugins/kanban/dashboard/plugin_api.py pytest -k task_age_exception # both PASS	2026-05-18 20:12:52 -07:00
LeonSGP43	c91ad90bff	test(kanban): cover default board dashboard pin	2026-05-18 20:11:43 -07:00
kshitijk4poor	c74ff2c8ef	fix(browser): self-review pass — dead-import, log levels, future-proofing Addresses findings from two self-review passes pre-merge. First pass (3-agent parallel review): 1. plugins/browser/browser_use/provider.py: drop the ``_ = managed_nous_tools_enabled`` dead-import-hider in _get_config_or_none(). The import was actively misleading — the helper IS used in _get_config() (separate method, separate import), not here. The "keep static analysis happy" comment was wrong about what the helper does in this scope. 2. agent/browser_provider.py: drop ``pragma: no cover`` from is_configured() / provider_name() backward-compat aliases. They ARE covered by ``TestLegacyAbcAliases`` — the pragma would have masked future regressions. 3. tools/browser_tool.py: refactor _is_legacy_provider_registry_overridden() to compare against a module-frozen _DEFAULT_PROVIDER_REGISTRY snapshot instead of hardcoded set of 3 keys. Future maintainers adding a 4th built-in provider now just extend _PROVIDER_REGISTRY; the override detection adapts automatically. Previously the hardcoded ``set(...) != {"browserbase", "browser-use", "firecrawl"}`` would flip True forever on any 4-key registry, silently routing every install onto the legacy fixture path. 4. tools/browser_tool.py: when explicit ``browser.cloud_provider`` is set but the registry has no matching plugin (typo, uninstalled plugin, discovery failure), emit a WARNING with actionable text instead of silently falling through to auto-detect. Legacy code surfaced a typed credentials error via direct class instantiation; this log restores the signal in the post-migration path. 5. agent/browser_registry.py: trim the triple-redundant _LEGACY_PREFERENCE documentation. Module docstring + 13-line block-comment + 5-line inline comment was repeating the same point. Kept the docstring and trimmed the block-comment to 5 lines. 6. agent/browser_registry.py: upgrade is_available()-raised logging from DEBUG to WARNING with exc_info=True. A provider's availability check throwing is unusual enough that users debugging "no cloud provider" need the traceback in logs. 7. tests/plugins/browser/check_parity_vs_main.py: drop dead top-level imports (os, shutil, tempfile — only referenced inside the SUBPROCESS_SCRIPT string literal that runs in a child process). Second pass (architecture + claim-verification review): 8. tools/browser_tool.py: rewrite the inline comment in _get_cloud_provider auto-detect branch. Prior text claimed it "routes through the plugin registry's legacy preference walk so third-party plugins still get a chance to be selected when they're explicitly configured" — false on both counts. The branch uses module-level legacy class aliases (BrowserUseProvider / BrowserbaseProvider) directly; third-party plugins are intentionally reachable only via explicit ``browser.cloud_provider``. Corrected comment now matches behaviour and cross-references _LEGACY_PREFERENCE for the firecrawl gate rationale. 9. tools/browser_tool.py + tests/tools/test_managed_browserbase_and_modal.py: drop the unused ``get_active_browser_provider as _registry_get_active_browser_provider`` alias from the ``from agent.browser_registry import ...`` block. It was never referenced; matching test-stub line in the agent.browser_registry SimpleNamespace also dropped. ``get_provider`` is still imported (used by the explicit-config dispatch path at line 535). 10. plugins/browser/firecrawl/provider.py: align emergency_cleanup() with the early-guard pattern used in browserbase + browser_use plugins. Previously firecrawl tried the DELETE and relied on ``_headers()`` raising ValueError to trip a "missing credentials" warning; same final outcome but a different control flow that read like a bug to a maintainer skimming the three modules. Now: if is_available() is False, log+return early — identical shape to the other two providers. Verification: 54/54 unit tests + 13/13 parity scenarios still pass.	2026-05-17 04:04:15 -07:00
kshitijk4poor	1bb6f03724	fix(browser): ensure plugin discovery before registry lookup; parity harness Two changes that go together: 1. tools/browser_tool.py — add _ensure_browser_plugins_loaded() and call it from _get_cloud_provider() before consulting the registry. Normally model_tools triggers discover_plugins() as an import side-effect, but _get_cloud_provider() can be reached from contexts that haven't gone through model_tools (standalone scripts, certain unit-test paths, the new parity-sweep harness). Without the defensive call, the registry is empty and _registry_get_browser_provider() returns None — silently downgrading users to local mode when they explicitly configured a cloud provider with no credentials yet. The behavior-parity sweep below caught this as 4 scenario regressions (explicit-X-no-creds for all 3 providers, and explicit-firecrawl-with-creds). 2. tests/plugins/browser/check_parity_vs_main.py — subprocess harness that pins one Python invocation to origin/main and one to this PR's worktree via sys.path.insert(), runs _get_cloud_provider() across a 13-scenario config matrix, and diffs the reduced shape tuple (is_local, provider_name, is_available). Provider_name pulls from provider.provider_name() which is the legacy CloudBrowserProvider API and remains as a backward-compat alias on the new BrowserProvider ABC, so the comparison is apples-to-apples regardless of class identity. Final result: PARITY OK across 13 scenarios. The four observable config/credential matrices that exercise the dispatcher all match origin/main bit-for-bit: - no-config + no-env → local - explicit local + any env → local - explicit BB / BU / FC + no creds → provider returned with is_available()==False (so dispatcher surfaces typed credentials error; matches main exactly) - explicit BB / BU / FC + creds → provider returned with is_available()==True - no-config + BU creds → Browser Use - no-config + BB creds → Browserbase - no-config + both → Browser Use (legacy walk first hit) - no-config + FC only → local (firecrawl NOT in legacy walk) - no-config + FC + BB → Browserbase (legacy walk skips firecrawl) Per the dev skill's "behavior-parity for refactor PRs" rule — without this subprocess sweep, 31/31 unit tests pass while the production code path is silently broken for users who type `browser.cloud_provider: browserbase` and run a single browser command without prior model_tools import. Caught + fixed before push.	2026-05-17 04:04:15 -07:00
kshitijk4poor	fec0a0da98	test(plugins/browser): coverage for the 3-plugin migration Mirrors tests/plugins/web/test_web_search_provider_plugins.py from PR #25182. 31 tests across 5 classes: TestBundledPluginsRegister (8 tests) - Three plugins register (browserbase, browser-use, firecrawl) - Each plugin's name + display_name accessible - get_setup_schema() returns picker-shaped dict with post_setup hook - All three lifecycle methods (create_session, close_session, emergency_cleanup) overridden on every plugin TestIsAvailable (4 tests) - browserbase needs BOTH BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID - browserbase: api_key alone or project_id alone insufficient - browser-use satisfied by BROWSER_USE_API_KEY - firecrawl satisfied by FIRECRAWL_API_KEY TestRegistryResolution (8 tests) — most valuable, locks down pre-migration semantics: - _resolve(None) with no creds returns None (local mode) - _resolve('local') short-circuits to None - _resolve('browserbase') returns provider even when unavailable (so dispatcher surfaces typed credentials error) - _resolve('firecrawl') same: explicit-config wins - _resolve('unknown') falls through to auto-detect - Legacy walk picks browser-use over browserbase - browserbase-only configuration: browserbase wins - Regression: firecrawl is NEVER auto-selected even when single-eligible (preserves pre-migration gate; FIRECRAWL_API_KEY shared with web firecrawl must not silently route to paid cloud browser) TestLegacyAbcAliases (6 tests) - is_configured() delegates to is_available() for all three plugins - provider_name() returns display_name for all three plugins TestPickerIntegration (3 tests) - _plugin_browser_providers() exposes all three plugins as rows - Each row carries post_setup='agent_browser' - browser_plugin_name marker matches browser_provider All tests use real imports — no mocking of provider classes — so the suite catches drift in the ABC, registry, picker injection, and plugin glue layer simultaneously. 31/31 passing.	2026-05-17 04:04:15 -07:00
kshitij	5fba236644	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355 ) Six days after #23937 (608 fixes) the codebase had accumulated 241 new PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix, same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the two are semantically equivalent for hashable scalar membership tests. All 241 instances fixed via `ruff check --select PLR6201 --fix --unsafe-fixes`, zero remaining. Every changed value is a hashable scalar (str/int/None/enum/signal); no risk of unhashable runtime errors. No behavior change. Test plan: - 119 files changed, +244/-244 (net zero) — exactly one-line edits - `ruff check` clean afterward - Compile checks pass on the largest touched files (cli.py, run_agent.py, gateway/run.py, gateway/platforms/discord.py, model_tools.py) - Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/ tests/tools/: 18187 passed, 59 pre-existing failures (verified against origin/main with the same shape — identical failure count, identical category — all xdist test-order flakes unrelated to this change) Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).	2026-05-17 02:29:41 -07:00
teknium1	773a0faca0	fix(deepseek): set default_aux_model on profile so aux warning stops firing Closes #26924 (and supersedes #26926) in spirit. DeepSeek was missing `default_aux_model` on its `ProviderProfile`, so `_get_aux_model_for_provider("deepseek")` returned an empty string and the compression / vision / session-search paths emitted "No auxiliary LLM provider configured -- context compression will drop middle turns without a summary." on every DeepSeek session, even when the user had perfectly working DeepSeek credentials. Fix lands at the profile layer rather than the legacy `_API_KEY_PROVIDER_AUX_MODELS_FALLBACK` dict the original PR targeted. Every modern provider (gemini, zai, minimax, anthropic, kimi-coding, stepfun, ollama-cloud, gmi, novita, kilocode, ai-gateway, opencode-zen) sets `default_aux_model` on its `ProviderProfile`; the fallback dict only exists for providers that predate the profiles system. Tests added under `tests/plugins/model_providers/test_deepseek_profile.py`: - `test_profile_advertises_deepseek_chat` -- pins the profile attribute - `test_consumer_api_returns_deepseek_chat` -- pins the consumer API behavior - `test_consumer_api_returns_non_empty` -- regression guard for the symptom in the issue Original diagnosis and aux-model choice from @kriscolab in PR #26926; moved one layer up. Co-authored-by: kriscolab <71590782+kriscolab@users.noreply.github.com>	2026-05-16 22:54:22 -07:00
teknium1	cd9470f416	fix(deepseek): wire thinking-mode via DeepSeekProfile, not legacy fallback The cherry-picked PR #15251 from @tw2818 correctly identified the DeepSeek 400 root cause but placed the fix in the legacy fallback path of `build_kwargs`, which DeepSeek never reaches — DeepSeek has a registered ProviderProfile and goes through `_build_kwargs_from_profile` instead. The legacy-path block was therefore dead code. This commit pivots the fix to where it actually fires: - New `DeepSeekProfile` in `plugins/model-providers/deepseek/__init__.py` overrides `build_api_kwargs_extras` to emit DeepSeek's expected wire format (mirrors `KimiProfile`): {"reasoning_effort": "<low\|medium\|high\|max>", "extra_body": {"thinking": {"type": "enabled" \| "disabled"}}} - Model gating: only `deepseek-v4-*` and `deepseek-reasoner` emit thinking control. `deepseek-chat` (V3) is untouched — current behavior. - Effort mapping: low/medium/high passthrough, xhigh/max → max, unset → omitted (DeepSeek server applies its own default). - Revert the legacy-path additions from PR #15251 — they were dead code, and the `_copy_reasoning_content_for_api` strip block specifically would have nullified the existing reasoning_content padding machinery (`_needs_deepseek_tool_reasoning` → space-pad on replay) that the active provider already relies on for replay correctness. - Unit tests pin the wire-shape contract and the model gating rules (26 tests, all passing). Existing transport + provider profile suites (321 tests) continue to pass. - AUTHOR_MAP: map twebefy@gmail.com → tw2818 for release notes credit. Closes #15700, #17212, #17825. Co-authored-by: tw2818 <twebefy@gmail.com>	2026-05-15 17:03:26 -07:00
Jaaneek	b62c997973	feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider Adds a new authentication provider that lets SuperGrok subscribers sign in to Hermes with their xAI account via the standard OAuth 2.0 PKCE loopback flow, instead of pasting a raw API key from console.x.ai. Highlights ---------- * OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery, state/nonce, and a strict CORS-origin allowlist on the callback. * Authorize URL carries `plan=generic` (required for non-allowlisted loopback clients) and `referrer=hermes-agent` for best-effort attribution in xAI's OAuth server logs. * Token storage in `auth.json` with file-locked atomic writes; JWT `exp`-based expiry detection with skew; refresh-token rotation synced both ways between the singleton store and the credential pool so multi-process / multi-profile setups don't tear each other's refresh tokens. * Reactive 401 retry: on a 401 from the xAI Responses API, the agent refreshes the token, swaps it back into `self.api_key`, and retries the call once. Guarded against silent account swaps when the active key was sourced from a different (manual) pool entry. * Auxiliary tasks (curator, vision, embeddings, etc.) route through a dedicated xAI Responses-mode auxiliary client instead of falling back to OpenRouter billing. * Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen plugin) resolve credentials through a unified runtime → singleton → env-var fallback chain so xai-oauth users get them for free. * `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are wired through the standard auth-commands surface; remove cleans up the singleton loopback_pkce entry so it doesn't silently reinstate. * `hermes model` provider picker shows "xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls back to pool credentials when the singleton is missing. Hardening --------- * Discovery and refresh responses validate the returned `token_endpoint` host against the same `.x.ai` allowlist as the authorization endpoint, blocking MITM persistence of a hostile endpoint. Discovery / refresh / token-exchange `response.json()` calls are wrapped to raise typed `AuthError` on malformed bodies (captive portals, proxy error pages) instead of leaking JSONDecodeError tracebacks. * `prompt_cache_key` is routed through `extra_body` on the codex transport (sending it as a top-level kwarg trips xAI's SDK with a TypeError). * Credential-pool sync-back preserves `active_provider` so refreshing an OAuth entry doesn't silently flip the active provider out from under the running agent. Testing ------- * New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests) covers JWT expiry, OAuth URL params (plan + referrer), CORS origins, redirect URI validation, singleton↔pool sync, concurrency races, refresh error paths, runtime resolution, and malformed-JSON guards. * Extended `test_credential_pool.py`, `test_codex_transport.py`, and `test_run_agent_codex_responses.py` cover the pool sync-back, `extra_body` routing, and 401 reactive refresh paths. * 165 tests passing on this branch via `scripts/run_tests.sh`.	2026-05-15 12:11:32 -07:00
kshitij	db84a78e61	fix(langfuse): complete observability fix — trace I/O, tool outputs, placeholder credentials (closes #22342 , #22763 ) (#26320 ) * fix(langfuse): reject placeholder credentials with one-shot warning When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template value like 'placeholder', 'test-key', or 'your-langfuse-key', the Langfuse SDK silently accepts the credentials at construction time and drops every trace at flush time. No warning, no error — just an empty Langfuse dashboard the operator only notices hours later. Add prefix-based validation in _get_langfuse() against the documented 'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side. Anything else fires a single warning naming the offending env var(s) with a log-safe value preview (full string for short placeholders so the operator knows which template they left in place; truncated for long values so a real secret pasted into the wrong field never hits the log), then short-circuits via the existing _INIT_FAILED cache so the warning fires once per process, not once per hook invocation. The check sits after the 'Langfuse is None' SDK-installed guard so hosts without the optional langfuse SDK don't see misleading 'set real keys' hints when the actionable fix is 'pip install langfuse'. Missing credentials remains the documented opt-out path and stays silent — no log noise for unconfigured installs. Fixes #22763 Fixes #23823 * fix(langfuse): use actual API request messages for generation input on_pre_llm_request previously used the messages kwarg alone, which could be None when Hermes passes the payload via request_messages, conversation_history, or user_message instead. Add _coerce_request_messages to pick the first available list across all variants, falling back to a synthetic user message. Generations now show the real outbound payload rather than an empty input. * fix(langfuse): record tool call outputs in traces Tool observations showed input (arguments) but output was always undefined. Root cause: when tool_call_id is empty, pre_tool_call stored observations under a unique time-based key that post_tool_call could never reconstruct, so every tool span was closed without output by the _finish_trace sweep. Fix pre/post matching by routing empty-tool_call_id tools through a per-name FIFO queue (pending_tools_by_name) instead of the time-based key. Tools with a tool_call_id continue to use the id-keyed dict. Also: - Preserve OpenAI-style nested function shape in serialized tool calls so Langfuse renders name/arguments correctly - Keep name + tool_call_id on role:tool messages for proper pairing - Backfill tool results onto the matching turn_tool_calls entry so the generation's tool-call record carries the result alongside arguments - Coerce request messages from whichever field the runtime provides (request_messages, messages, conversation_history, user_message) * fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test Self-review of the combined #22345 + #23831 salvage surfaced three issues worth fixing in the same PR rather than as follow-ups: 1. Drop is_first_turn from the pre_api_request hook. The boolean expression `not bool(conversation_history)` was wrong: conversation_history is reassigned to None mid-run after compression (5 sites in run_agent.py), so the value flips False -> True mid-conversation on every post-compression API call. The langfuse plugin never consumed it, so the kwarg was both misleading AND dead. 2. Replace copy.deepcopy(request_messages) with shallow list() copy. The pre_api_request hook contract discards return values (invoke_hook never writes back to api_kwargs), and the langfuse plugin's _serialize_messages already builds its own snapshot dicts via _safe_value. A deepcopy on every API call would walk every tool result and base64 image — significant overhead for no real isolation benefit. Shallow copy of the outer list protects against later mutations of api_messages without paying for the inner-dict walk. 3. Rename test_empty_tool_call_id_concurrent_fifo_order -> test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8 threads behind a barrier to actually exercise _STATE_LOCK on the pending_tools_by_name queue. The original test was sequential and only validated Python list semantics; this one validates the lock discipline. 4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED — that function does not exist. Tests reload the module via sys.modules.pop + importlib.import_module instead. Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass. --------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>	2026-05-15 05:04:02 -07:00
binhnt92	63991bbd97	fix(memory): skip OpenViking upload symlinks	2026-05-14 07:48:03 -07:00
kshitijk4poor	21e3a863bb	feat(web): firecrawl plugin natively supports crawl; delete legacy inline path The web-provider migration originally left firecrawl crawl as the only provider-specific code remaining inline in tools/web_tools.py (~250 lines of Firecrawl-specific crawl orchestration that didn't fit the plugin's existing surface). This commit closes that gap. What this adds -------------- 1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)`` - Accepts the same kwargs as the dispatcher passes to any crawl provider (``instructions``, ``depth``, ``limit``); Firecrawl's /crawl endpoint ignores ``instructions`` and ``depth`` so we log and drop with a clear info message. - Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the gateway event loop isn't blocked on a multi-page crawl. - Preserves the response-shape normalization across pydantic / typed-object / dict variants that the legacy inline code did. - Preserves per-page website-policy re-check (catches blocked redirects after the SDK returns). - Returns the same {"results": [...]} shape so the dispatcher's shared LLM-summarization post-processing path works unchanged. - Sets supports_crawl() to True so the dispatcher routes through the plugin instead of the legacy fallthrough. 2. tools/web_tools.py: delete the entire legacy firecrawl crawl block that used to run after "No registered provider supports crawl" — ~270 lines including: - check_firecrawl_api_key gate + typed error - inline SSRF + website-policy seed-URL gate (dispatcher already does this) - Firecrawl client setup with crawl_params - 100+ lines of pydantic/dict/typed-object normalization - Per-page LLM-processing loop (kept in the dispatcher's shared post-processing path; that's where it always belonged) - trimming + base64 image cleanup (still done in the dispatcher's shared path) Replaced with a single typed-error branch when no crawl-capable provider is available: "web_crawl has no available backend. Set FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set TAVILY_API_KEY for Tavily." Test updates ------------ - tests/tools/test_website_policy.py: - test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL gate still runs on web_tools.check_website_access (no change to that patch), but the firecrawl client lockdown moved to the plugin module — patch firecrawl_provider._get_firecrawl_client instead of web_tools._get_firecrawl_client. The dispatcher short-circuits before the plugin runs, so the test still passes. - test_web_crawl_blocks_redirected_final_url: patch the per-page policy gate at plugins.web.firecrawl.provider.check_website_access (where it now runs) AND on web_tools (where the seed-URL gate still runs). Patch firecrawl_provider._get_firecrawl_client for the FakeCrawlClient injection. Both checks flow through the same fake_check function. - tests/plugins/web/test_web_search_provider_plugins.py: - Update parametrized capability-flag spec: firecrawl supports_crawl is now True. - Add test_firecrawl_crawl_returns_error_dict_when_unconfigured — verifies inspect.iscoroutinefunction(p.crawl) is True and that the async crawl returns a per-page error dict (not a raise) when FIRECRAWL_API_KEY is missing. Verified -------- - 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl crawl test from this commit = 218 with the test deduplication). - Compile-clean (py_compile passes on both files). - Provider capabilities matrix confirmed end-to-end: name search extract crawl async-extract? async-crawl? firecrawl True True True True True tavily True True True False False Both crawl-capable providers exercise the dispatcher's inspect.iscoroutinefunction async-or-sync detection. Net diff -------- - tools/web_tools.py: -254 lines (legacy inline crawl gone) - plugins/web/firecrawl/provider.py: +185 lines (crawl method) - test_website_policy.py: +14/-9 lines (patch locations) - test_web_search_provider_plugins.py: +22/-1 lines (capability flag + new firecrawl crawl test) - Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763 before this commit, 2227 before the migration started).	2026-05-13 22:31:28 -07:00
kshitijk4poor	e8cee87e85	test(plugins): tests/plugins/web/ — coverage for the 7-plugin migration Adds 44 focused tests under tests/plugins/web/ covering the surface that the PR #25182 web-provider migration introduced. Complements the existing tests/tools/ coverage which is dispatcher-centric; this file is plugin-centric and tests each plugin + the registry directly. Test classes (44 tests, ~1.1s on 4 workers) ------------------------------------------- TestBundledPluginsRegister (16 tests) - All seven plugins present in the registry after _ensure_plugins_discovered() - Per-plugin parametrized capability-flag assertions (brave-free / ddgs / searxng: search-only; exa / parallel / firecrawl: search + extract; tavily: search + extract + crawl) - Every plugin exposes name + display_name properties - Every plugin returns a picker-compatible get_setup_schema() dict TestIsAvailable (7 tests) - Each premium plugin reports is_available()==False when its env var is absent and True once set (brave-free / searxng / tavily / exa / parallel) - firecrawl recognizes either FIRECRAWL_API_KEY or FIRECRAWL_API_URL as a "configured" signal - ddgs is the always-on fallback and must not raise from is_available() TestRegistryResolution (4 tests) - Option B semantics validated end-to-end: 1. Explicit configured provider wins even when is_available()==False (dispatcher surfaces typed credential errors, no silent switch) 2. Unknown/typo name falls back to first available legacy-preference provider 3. Asking for extract via a search-only backend falls back to an extract-capable available provider (capability-incompatible branch in _resolve()) 4. No config + no credentials → None (or ddgs if installed) TestAsyncExtractDispatch (4 tests) - parallel + firecrawl extract() are coroutine functions (async path in dispatcher uses await) - exa + tavily extract() are sync (dispatcher wraps in asyncio.to_thread) TestErrorResponseShapes (7 tests) - Plugins return typed error dicts (success=False + "error" key) when credentials are missing, never raise - async extract() returns list of per-URL error dicts - tavily crawl() returns {"results": [{"error": ...}]} on missing credentials Design notes ------------ - All tests use real imports of plugin modules — no mocking of provider classes themselves — so they catch drift in the ABC, registry, and glue layer simultaneously. Per the hermes-agent-dev skill's E2E testing guidance. - The autouse _isolate_env fixture clears every web-provider env var before each test so is_available() reflects the test's setup. - Resolution tests use the lower-level _resolve() directly rather than rebuilding the HERMES_HOME config dance — same observable behavior, no sys.modules.pop side-effects that would break the ABC isinstance check inside ctx.register_web_search_provider().	2026-05-13 22:31:28 -07:00
Teknium	9d42c2c286	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126 ) * feat(video_gen): unified video_generate tool with pluggable provider backends One core video_generate tool, every backend a plugin. Mirrors the image_gen + memory_provider + context_engine architecture: ABC, registry, plugin-context registration hook, and per-plugin model catalogs surfaced through hermes tools. Surface (one schema, every backend): - operation: generate / edit / extend - modalities: text-to-video (prompt only), image-to-video (prompt + image_url), video edit (prompt + video_url), video extend (video_url) - reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model override - Providers ignore unknown kwargs and declare what they support via VideoGenProvider.capabilities() — backend-specific quirks stay in the backend, the agent learns one tool Backends shipped: - plugins/video_gen/xai/ — Grok-Imagine, full generate/edit/extend + image-to-video + reference images (salvaged from PR #10600 by @Jaaneek, reshaped into the plugin interface) - plugins/video_gen/fal/ — Veo 3.1 (t2v + i2v), Kling O3 i2v, Pixverse v6 i2v with model-aware payload building that drops keys a model doesn't declare Wiring: - agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation, success_response / error_response, save_b64_video / save_bytes_video, $HERMES_HOME/cache/videos/ - agent/video_gen_registry.py — thread-safe register/get/list + get_active_provider() reading video_gen.provider from config.yaml - hermes_cli/plugins.py — PluginContext.register_video_gen_provider() - hermes_cli/tools_config.py — Video Generation category in hermes tools, plugin-only providers list, model picker per plugin, config write to video_gen.{provider,model} - toolsets.py — new video_gen toolset - tests: 31 new tests covering ABC, registry, tool dispatch, both plugins - docs: developer-guide/video-gen-provider-plugin.md (parallel to the image-gen guide), sidebar + toolsets-reference + plugin guides updated Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse), #10458 (provider categories), #10786 (xAI media+search bundle), #2984 (FAL duplicate), #19086 (Google Veo standalone — easy port to plugin interface). Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): dynamic schema reflects active backend's capabilities Address the 'capability variance' question — instead of one tool with a static schema that lies about what every backend supports, the video_generate tool now rebuilds its description at get_definitions() time based on the configured video_gen.provider and video_gen.model. The agent sees backend-specific guidance up-front: - 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is REQUIRED; text-only prompts will be rejected' - 'fal-ai/veo3.1' (t2v): no image_url restriction shown - xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7 reference_image_urls' - Backends without edit/extend: 'not supported on this backend — surface that they need to switch backends via hermes tools' This is the same pattern PR #22694 used for delegate_task self-capping — documented in the dynamic-tool-schemas skill. Cache invalidation is free: get_tool_definitions() already memoizes on config.yaml mtime, so a mid-session backend swap rebuilds the schema automatically. Tested: - Empirical FAL OpenAPI schema check confirms image-to-video models require image_url (FAL returns HTTP 422 otherwise) — client-side rejection in FALVideoGenProvider.generate() now prevents the wasted round-trip - Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches - 6 new tests cover the builder (no config / image-only / full-surface / text-only / unknown provider / registry wiring), all passing - 37/37 in the slice, 134/134 in the broader regression set * test(video_gen/xai): full surface integration tests + cleaner schema Verified end-to-end that the xAI plugin handles every documented mode from PR #10600's surface: text-to-video, image-to-video, reference-images-to-video, video edit, video extend (with and without prompt). All five modes route to the correct xAI endpoint (/videos/generations, /videos/edits, /videos/extensions) with the right payload shape (image / reference_images / video keys), and all five client-side rejections fire before the network: edit-without-prompt, extend-without-video_url, image+refs conflict, >7 references, and duration/aspect_ratio clamping. 15 new integration tests grouped into four classes (endpoint routing, modalities, validation, clamping). httpx is stubbed via a small fake AsyncClient that records POSTs so the tests assert the actual payload the plugin would send to xAI — not just the success/error envelope. Also cleaned up a description redundancy: when a model's operations match the backend's overall set, we no longer print the duplicate 'operations supported by this model' line. xAI's description now reads: Active backend: xAI . model: grok-imagine-video - operations supported by this backend: edit, extend, generate - modalities supported by this backend: image, reference_images, text - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16 - resolution choices: 480p, 720p - duration range: 1-15s - reference_image_urls: up to 7 images Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing Two design changes per Teknium: 1) Drop edit/extend from the tool surface entirely. Only text-to-video and image-to-video remain. The agent sees a clean tool with two modalities; backend-specific quirks like xAI's edit/extend endpoints stay out of the unified schema. 2) FAL: pick a model FAMILY once, the plugin routes between the family's text-to-video and image-to-video endpoints based on whether image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND 'fal-ai/veo3.1/image-to-video' as separate options — they pick 'veo3.1', and the plugin handles the rest. Catalog rewritten as families: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video / fal-ai/kling-video/o3/standard/image-to-video xAI uses a single endpoint (/videos/generations) for both modes, routed by the presence of the 'image' field in the payload — no edit/extend exposure. Schema changes: - VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params: prompt (required), image_url, reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model. - VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS, DEFAULT_OPERATION. capabilities() drops 'operations' key. - success_response: add 'modality' field ('text' \| 'image') so the agent and logs can see which endpoint was actually hit. Dynamic schema builder simplified — no operations bullet, no 'switch backends if you need edit/extend' guidance. When the active backend supports both modalities (the common case), description reads: Active backend: FAL . model: pixverse-v6 - supports both text-to-video (omit image_url) and image-to-video (pass image_url) - routes automatically - aspect_ratio choices: 16:9, 9:16, 1:1 - resolution choices: 360p, 540p, 720p, 1080p - duration range: 1-15s - audio: pass audio=true to enable native audio (pricing tier) - negative_prompt: supported Tests: 51 in the video_gen slice, 216 across the broader image+video sweep, all passing. New FAL routing tests prove pixverse-v6 + no image hits text-to-video endpoint, pixverse-v6 + image_url hits image-to-video endpoint, same for veo3.1 and kling-o3-standard. Docs updated: developer-guide page rewrites the 'model families' pattern as a first-class section so external plugin authors know the convention. toolsets-reference and toolsets.py descriptions match the new surface. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers Catalog now covers everything Teknium specced from FAL: Cheap tier: ltx-2.3 fal-ai/ltx-2.3-22b/text-to-video / image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / image-to-video Premium tier: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video seedance-2.0 bytedance/seedance-2.0/text-to-video / image-to-video kling-v3-4k fal-ai/kling-video/v3/4k/text-to-video / image-to-video happy-horse fal-ai/happy-horse/text-to-video / image-to-video DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane defaults, both modalities) — better first-run UX for users who haven't explicitly picked a model. New family-entry knob: image_param_key. Kling v3 4K's image-to-video endpoint expects start_image_url instead of image_url; declaring image_param_key='start_image_url' on the family lets _build_payload remap correctly. Other families default to plain image_url. Per-family capability flags reflect each model's docs: - LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution enum exposed by FAL — let endpoint apply defaults) - Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported, negative prompts NOT supported per docs - Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative - Veo 3.1: unchanged, 16:9/9:16, 4/6/8s Tests: +5 covering the new families (full catalog, Kling 4K start_image_url remap, Seedance routing, LTX payload minimality, Happy Horse minimality). 56/56 in the slice green. Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes already has a direct xAI plugin that talks to xAI's own API; routing the same model through FAL's wrapper would duplicate the surface without adding capabilities. Users on FAL who want Grok-Imagine should use the xAI plugin directly; flag if you want both routes available. * test(video_gen): tool-surface routing matrix — every model x modality End-to-end matrix test driven through _handle_video_generate() — the actual function the agent's video_generate tool call lands in. Writes config.yaml, invokes the registered handler with a raw args dict, then asserts the outbound HTTP/SDK call hit the right endpoint with the right payload shape. Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new families as they're added (add a family to FAL_FAMILIES and you get both modalities tested for free). Coverage: - All 6 FAL families x {text-only, text+image} = 12 cases - xAI x {text-only, text+image} = 2 cases - tool-level model= arg overrides config = 2 cases For each case, verifies: - result['success'] is True - result['modality'] matches input shape ('text' if no image_url, 'image' otherwise) - outbound endpoint URL matches the family's text_endpoint or image_endpoint - text-only payloads carry no image-shaped keys - text+image payloads carry the family's image key (image_url for most, start_image_url for kling-v3-4k, wrapped 'image' object for xAI) All 16 cases passing. Confirms the tool surface routes every (provider, model, modality) combination correctly with zero leakage. * feat(video_gen): keep video_gen out of first-run setup, surface in status Two changes: 1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in the first-run toolset checklist. Video gen is niche, paid, and slow — most users don't want it nagging them during initial setup. Anyone who wants it opts in via 'hermes tools' -> Video Generation, which already routes to the provider+model picker. 2. The 'hermes setup' status panel learns about video_gen — but only shows the row when a plugin reports available. Users without FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of those keys see 'Video Generation (FAL) ✓' as confirmation it's wired. Verified live: - Fresh install (no creds): zero video_gen mentions in wizard. - With FAL_KEY: status row appears with active backend name. - 160/160 in the setup + tools_config + video_gen test slice. Rationale: image_gen is on by default because it's a featured creative tool used in casual chat (telegrams, etc). Video gen is heavier — long wait, paid per-second pricing. Default-off matches user intent better. --------- Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-05-13 16:39:41 -07:00
eloklam	b60462a205	test(kanban): remove stale t.summary assertion from search test Task.summary was never a real field; latest_summary already covers it. Matches the haystack cleanup in commit f3015e6ab.	2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam	0ea234e093	feat(kanban): dashboard batch QOL upgrade - Shift-click range selection, column select-all, select-all-visible - Multi-card drag/drop via selectedIds + /tasks/bulk - Expanded bulk actions: todo/ready/blocked/unblock/complete/archive, priority setter, reassign with reclaim_first checkbox - Partial failure card highlight (failedIds + hermes-kanban-card--failed) - Search expanded to body, result, latest_summary, summary - Clear filters button + reset all filters on board switch - Accessibility: larger checkbox hit target, tabIndex/role/aria-label, Enter/Space/Esc keyboard handlers - Fix temporal-dead-zone bug: move clearSelected before moveSelected	2026-05-10 21:44:37 -07:00
LeonSGP43	673418dfa1	fix(kanban): reject toolset names in task skills	2026-05-10 08:41:28 -07:00
Teknium	a91e5a8759	feat(kanban-dashboard): native <details> collapse + skip empty metadata Two follow-up improvements to Tranquil-Flow's metadata-panel restyle. Both stay within the parent PR's "tone down the panel" scope. 1. Native <details>/<summary> collapse for verbose metadata. The parent PR consciously deferred this ("adding native expand/collapse would be the next step but requires UX agreement"). The default they asked for is straightforward: collapsed when the rendered JSON exceeds 300 chars (the threshold where the max-height: 8.5rem cap actually starts mattering), expanded otherwise. <details>/<summary> is the right primitive — zero JS, browser-handled state, accessible by default (keyboard-navigable, screen-reader announces the disclosure state), and survives any react-state churn for free. The OS-default disclosure marker is suppressed (list-style: none + ::-webkit-details-marker hidden) and replaced with a CSS ::before chevron that rotates 90deg on the [open] attribute, so the look is consistent across Firefox/WebKit/Blink without the double-marker that would otherwise appear on the platforms that still render the default triangle. 2. Skip rendering when metadata is an empty object. `r.metadata && ...` truthy-checks, but `{}` is truthy in JS — so a completed task with no actual metadata would render a "Metadata" labeled disclosure block containing literal `{}`. Adds an Object.keys(r.metadata).length > 0 guard so empty payloads render nothing instead of an empty disclosure stub. Tests: three new static-asset assertions covering the <details> shape, the empty-object skip, and the suppress-default-marker + animated-chevron CSS — all in `tests/plugins/test_kanban_dashboard_plugin.py`.	2026-05-10 08:30:42 -07:00
Tranquil-Flow	0e0ddaac8f	fix(kanban-dashboard): tone down completed-run metadata panel (#19548 ) Hand-rebased onto current main from PR #19980; the original branch was stale against main (~6 unrelated dashboard fixes had landed since), so applying the PR's dist files directly would have silently reverted them. The run-history panel in the task drawer rendered each completed run's `metadata` field as a `<code class="hermes-kanban-run-meta">` containing `JSON.stringify(r.metadata)` — a single unindented monoline. With `white-space: pre-wrap` and a monospace font, a writer task's metadata (changed_files paths, source URLs, generated-artifact details) wrapped into a tall block of code-ish text that filled the parent run row. The container's faint `--color-foreground 3%` background then made the whole thing read like a crash dump even though the run completed normally. Restyle and label, no interactivity changes: - Wrap the meta payload in a `.hermes-kanban-run-meta-block` sub-block with an explicit `Metadata` label (small, uppercase, muted) so the panel reads as auxiliary detail at a glance. - Pretty-print the JSON (`indent=2`) so the structure is scannable instead of a wall of monoline text. - Cap `.hermes-kanban-run-meta` at `max-height: 8.5rem; overflow: auto` so a verbose blob scrolls inside its own pane rather than swamping the run row. - Sub-block uses a thin `border-left` rule and `background: transparent` — distinct from the destructive-tinted treatment used by crashed / timed_out / blocked / spawn_failed runs higher in the same file. Tests: two new static-asset assertions in `tests/plugins/test_kanban_dashboard_plugin.py` lock in the rendered shape (the plugin ships built-only, no src/).	2026-05-10 08:30:42 -07:00
Teknium	08c5b35a73	test(kanban-dashboard): pin assignee-casing static-asset regressions + AUTHOR_MAP Follow-up to the previous commit's casing fix. The original PR shipped the dist edits without test coverage. The contributor's reasoning (UI-only attributes in a pre-built JS bundle, nothing meaningful to unit-test) is fair, but a static-asset assertion catches the most likely regression vector — a future rebuild of the dist bundle that loses the attributes — at near-zero cost. Adds two regression tests in tests/plugins/test_kanban_dashboard_plugin.py: - test_dashboard_assignee_inputs_preserve_casing — reads dist/index.js and asserts autoCapitalize="none", autoCorrect="off", spellCheck=false, and textTransform="none" each appear at least twice (one per assignee input — inline triage/lane create + task-edit panel). - test_dashboard_lane_head_preserves_assignee_casing — reads dist/style.css and asserts the .hermes-kanban-lane-head rule body does NOT contain text-transform: uppercase. Locates the rule by marker so unrelated CSS churn nearby doesn't flake the test. Both follow the same shape as the existing test_dashboard_requests_default_board_explicitly static-asset guard from PR #22940's salvage. Also adds the AUTHOR_MAP entry for princepal9120's GitHub-noreply email so release notes credit the right account.	2026-05-10 07:35:01 -07:00

1 2 3

114 commits