The checkbox label echoed its state ("Auto (default)" / "Manual") instead
of describing the action, so a checked box reading "Auto" parsed as a
status indicator rather than a control. The accompanying sub-description
was also static and started with "When on, ...", which read awkwardly
when the box was unchecked.
Replace the dynamic label with a static action label
("Auto-decompose triage tasks") and flip the sub-description between the
two modes so it stays accurate either way. The top-of-page Orchestration
pill is unchanged — that one is intentionally a status badge / toggle.
Fixes#28178
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the kanban auto-decomposer fans a triage task into child tasks,
recompute_ready() immediately promotes parent-free children to 'ready'
so the dispatcher picks them up. Some users want a manual workflow
where children stay in 'todo' for review before dispatch.
Add 'kanban.auto_promote_children' config key (default: true):
- false: children stay in 'todo' after decomposition
- true: existing behavior (auto-promote to 'ready')
Changes:
- kanban_db.py: decompose_triage_task() gains auto_promote param
- kanban_decompose.py: reads auto_promote_children from config
- kanban dashboard API: exposes the new setting in GET/PUT /orchestration
Closes#28016
Switch .hermes-kanban-columns from auto-fit CSS grid to a flex row with
overflow-x: auto and a hidden scrollbar (scrollbar-width / ::-webkit-
scrollbar), and pin .hermes-kanban-column to flex: 0 0 280px so columns
sit side-by-side at a fixed width instead of wrapping into a 2xN grid.
Page vertical scroll is unaffected: each column already caps at
max-height: calc(100vh - 220px), so the container never grows tall
enough to introduce its own vertical scrollbar.
The dashboard SDK's <Select> is a shadcn-style popup that fires
onValueChange(value), not native onChange({target:{value}}). The file
even has a selectChangeHandler() helper at L213 documenting this:
"Older plugin code calls onChange({target:{value}}) which silently
never fires."
#24547 already fixed the bulk-reassign, workspace-kind, and new-task
parent selects. This patch covers the two OrchestrationPanel selects
introduced later in #27572 that regressed onto the same broken pattern:
- OrchestrationPanel orchestrator_profile picker
- OrchestrationPanel default_assignee picker
Users opened the popup, picked an option, and the popup closed without
firing a PUT to /orchestration — so the orchestrator profile and
default assignee dropdowns appeared totally inert.
Uses the same selectChangeHandler helper as the other working Selects
in the file for consistency.
Reported by Exaario.
SDK Select fires onValueChange(value) not onChange({target:{value}}), so
all three bare onChange handlers silently received undefined from e.target.
Replace raw onChange with selectChangeHandler() — the existing helper that
wires both onValueChange and a guarded onChange — so selections register
regardless of which event the SDK Select dispatches.
Closes#24520
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a 'triage_aux_unavailable' diagnostic for tasks stuck in triage when
neither the active aux helper slot nor the main-model auto fallback is usable.
Auto-decompose aware:
- kanban.auto_decompose=True (default): primary is auxiliary.kanban_decomposer,
triage_specifier is the fanout=false fallback.
- kanban.auto_decompose=False: primary is auxiliary.triage_specifier (manual
'hermes kanban specify' path).
Default aux slots use 'provider: auto' which falls back to the main model, so
this rule only fires when both the explicit slot config AND the main-model
auto fallback are absent. Quiet by default; informative when there is a real
config gap.
Also adds kd.config_from_runtime_config() that carries kanban + auxiliary +
model keys through to diagnostics, and updates CLI/dashboard call sites to
use it. config_from_kanban_config() is preserved for back-compat.
Reworks the original PR #25640 idea (@qWaitCrypto) to align with the new
auto-decompose dispatcher path landed in #27572. The original PR pointed only
at auxiliary.triage_specifier, which is now the fallback rather than the
primary helper.
Co-authored-by: qWaitCrypto <axmaiqiu@gmail.com>
* feat(kanban): orchestrator-driven auto-decomposition on triage
Closes the core gap in the kanban system: dropping a one-liner into Triage
now decomposes it into a graph of child tasks routed to specialist
profiles by description, matching teknium's original vision ("main
orchestrator splits/creates actual tasks, doles them out to each agent").
The build
---------
- hermes_cli/profiles.py: new `description` + `description_auto` fields
on ProfileInfo, persisted in <profile_dir>/profile.yaml. Helpers
read_profile_meta / write_profile_meta. `create_profile` accepts
optional description.
- hermes_cli/profile_describer.py: new module — auto-generate a 1-2
sentence description from a profile's skills + model + name via the
auxiliary LLM (`auxiliary.profile_describer`).
- hermes_cli/main.py: new `hermes profile create --description ...`
flag; new `hermes profile describe [name] [--text ... | --auto |
--all --auto]` subcommand.
- hermes_cli/kanban_db.py: new `decompose_triage_task` atomic helper —
creates N child tasks, links the root as a child of every leaf
(root waits for the whole graph), flips root `triage -> todo` with
orchestrator assignee, records an audit comment + `decomposed` event
in a single write_txn.
- hermes_cli/kanban_decompose.py: new module — calls the auxiliary LLM
(`auxiliary.kanban_decomposer`) with the profile roster + descriptions
to produce a JSON task graph, then invokes the DB helper. Rewrites
unknown assignees to the configured `kanban.default_assignee` (or
the active default profile) so a task NEVER lands with assignee=None.
Falls back to specify-style single-task promotion when the LLM
returns `fanout: false`.
- hermes_cli/kanban.py: new `hermes kanban decompose [task_id | --all]`
CLI verb.
- hermes_cli/config.py: new DEFAULT_CONFIG keys —
kanban.orchestrator_profile, kanban.default_assignee,
kanban.auto_decompose (default True), kanban.auto_decompose_per_tick
(default 3), auxiliary.kanban_decomposer, auxiliary.profile_describer.
- gateway/run.py: kanban dispatcher watcher now runs auto-decompose
before each `_tick_once`, capped by `auto_decompose_per_tick` so a
bulk-load of triage tasks doesn't burst-spend the aux LLM.
- plugins/kanban/dashboard/plugin_api.py: new endpoints —
GET /profiles (list roster + descriptions),
PATCH /profiles/<name> (set description, user-authored),
POST /profiles/<name>/describe-auto (LLM-generate),
POST /tasks/<id>/decompose (run decomposer),
GET/PUT /orchestration (orchestrator/default-assignee/auto-decompose
pickers, with resolved fallbacks echoed back).
- plugins/kanban/dashboard/dist/index.js: new OrchestrationPanel
collapsible — dropdowns for orchestrator profile and default
assignee, auto-decompose toggle, per-profile description editor with
Save and Auto-generate buttons. New ⚗ Decompose button next to
✨ Specify on triage-column task drawers.
Behavior
--------
- A task in Triage gets fanned out into a small DAG of child tasks.
Children with no internal parents flip to `ready` immediately
(parallel dispatch). Children with sibling parents wait. The root
stays alive as a parent of every child — when the whole graph
finishes, it promotes to `ready` and the orchestrator profile wakes
back up to judge completion (the "adds more tasks until done" part
of the original vision).
- `kanban.orchestrator_profile` unset -> falls back to the default
profile (whichever `hermes` launches with no -p flag).
- `kanban.default_assignee` unset -> same fallback. Tasks NEVER end
up unassigned.
- `kanban.auto_decompose=true` (default) runs the decomposer
automatically on dispatcher ticks; manual `hermes kanban decompose`
is always available.
Tests
-----
- tests/hermes_cli/test_kanban_decompose_db.py — 7 tests for the
atomic DB helper (status transitions, dep graph, audit trail,
validation errors).
- tests/hermes_cli/test_kanban_decompose.py — 6 tests for the
decomposer module (fanout, no-fanout fallback, unknown-assignee
rewrite, malformed-JSON resilience, no-aux-client path).
- tests/hermes_cli/test_profile_describer.py — 10 tests for
profile.yaml r/w + the LLM auto-describer (yaml corrupt tolerance,
user-vs-auto description protection, --overwrite, fallback parsing).
E2E
---
- CLI end-to-end: created profiles with descriptions, dropped a triage
task, mocked the aux LLM with a 3-task graph -> verified all three
children were created with the right assignees, the dependency
edges matched the LLM's graph, root flipped to todo gated by every
child, audit comment + `decomposed` event recorded.
- Dashboard end-to-end: started the dashboard against an isolated
HERMES_HOME, verified all four new endpoints via curl (profile
listing, PATCH for description, PUT for orchestration settings,
POST for decompose). Opened the UI in the browser, confirmed the
OrchestrationPanel renders with all three pickers + the per-profile
description editor, typed a description, clicked Save, verified
~/.hermes/profile.yaml was written. Clicked Decompose on the triage
card and confirmed the inline error message surfaced as designed
("no auxiliary client configured").
* feat(kanban): surface decompose mode (Auto/Manual) as a one-click pill
The auto/manual toggle already existed as kanban.auto_decompose (default
true), but it was buried inside the collapsed Orchestration settings
panel — users couldn't tell at a glance which mode they were in. This
hoists it to a pill at the top of the kanban page so the state is always
visible and one click flips it.
UX
- New "⚗ Decompose: AUTO|MANUAL" pill in the kanban header. Emerald
styling when Auto is on (the default), muted/gray when Manual.
- Pill is visible both in the collapsed AND expanded Orchestration
settings views so context is preserved when the user opens the panel.
- Tooltip explains both states + what clicking does.
- Renamed the in-panel "Auto-decompose on triage / Enabled" checkbox
to "Decompose mode / Auto (default) | Manual" for language parity
with the pill.
Behavior preserved
- Default remains Auto (kanban.auto_decompose=true).
- Manual mode restores pre-PR behavior: triage tasks stay in triage
until the user clicks ⚗ Decompose on each card (or runs
`hermes kanban decompose <id>`).
Implementation
- plugins/kanban/dashboard/dist/index.js: load /orchestration on mount
(not just on expand) so the collapsed pill reflects real state.
Render mode pill in both collapsed and expanded headers. Reuses the
existing PUT /api/plugins/kanban/orchestration endpoint — no new
backend, no new tests required.
E2E verified
- Pill renders as "⚗ Decompose: AUTO" on page load (default).
- One click flips to "⚗ Decompose: MANUAL" with muted styling.
- config.yaml on disk shows auto_decompose: false after the flip.
- Second click round-trips back to Auto; config.yaml flips to true.
* feat(kanban): rename mode pill to "Orchestration: Auto/Manual"
Per Teknium feedback — "Decompose" was too implementation-specific.
"Orchestration" is the user-facing concept (the whole pitch is the
orchestrator profile routing work), and the pill is the front door to it.
- Pill text: "Orchestration: Auto" / "Orchestration: Manual" (title case,
no ⚗ prefix, no SHOUTY-CAPS for the mode value)
- In-panel checkbox label: "Orchestration mode" (was "Decompose mode")
- Tooltips updated to match
- No behavior change
* docs(kanban): document decompose, profile descriptions, orchestration mode
Brings the docs site up to parity with the PR. English build verified
locally (npx docusaurus build --locale en) — clean, no new broken links
or anchors. Pre-existing broken-link warnings (rl-training, llms.txt,
step-by-step-checklist, fallback-model) untouched.
- website/docs/reference/cli-commands.md
+ `hermes kanban decompose` action row in the action table, with
pointer to the Auto vs Manual orchestration section.
- website/docs/reference/profile-commands.md
+ `--description "<text>"` flag on `hermes profile create`.
+ Full `hermes profile describe` section: read, --text, --auto,
--overwrite, --all flags with examples.
- website/docs/user-guide/features/kanban.md (the big one)
+ Triage column intro rewritten around the Auto-decompose default
behavior, with pointer to the new Auto vs Manual section.
+ Status action row updated to mention both ⚗ Decompose and
✨ Specify on triage cards.
+ New "Auto vs Manual orchestration" section explaining the two
modes, how to flip them (pill, config), how routing-by-description
works, the no-None-assignee guarantee, plus a config knob table
(auto_decompose, auto_decompose_per_tick, orchestrator_profile,
default_assignee) and the two new auxiliary slots
(kanban_decomposer, profile_describer).
+ REST surface table gains 6 new endpoint rows: /tasks/:id/decompose,
/profiles (GET), /profiles/:name (PATCH), /profiles/:name/describe-auto,
/orchestration (GET + PUT).
- website/docs/user-guide/features/kanban-tutorial.md
+ Triage column blurb updated for Auto by default + Manual via the
pill, with cross-link to the Auto vs Manual orchestration section.
- website/docs/user-guide/profiles.md
+ Blank-profile flow now mentions --description and points to the
kanban routing model for context.
- website/docs/user-guide/configuration.md
+ `kanban_decomposer` and `profile_describer` added to the
`hermes model -> Configure auxiliary models` menu listing.
PR #25580 was authored before #2746 landed on main, so its plugin
versions of browser_use/browserbase/firecrawl ship without the
requests.RequestException → RuntimeError wrapping that 13c72fb4 added
to the legacy tools/browser_providers/ files for #2746. Cherry-picking
the PR + git rm'ing the legacy files (the migration's intent) would
silently revert that network-error fix.
Port the same try/except pattern into the three plugin create_session()
methods. Browser Use managed-mode keeps its raw-exception propagation
(idempotency-key retry semantics).
Co-authored-by: nidhi-singh02 <nidhi2894@gmail.com>
Addresses findings from two self-review passes pre-merge.
First pass (3-agent parallel review):
1. plugins/browser/browser_use/provider.py: drop the
``_ = managed_nous_tools_enabled`` dead-import-hider in
_get_config_or_none(). The import was actively misleading — the
helper IS used in _get_config() (separate method, separate import),
not here. The "keep static analysis happy" comment was wrong about
what the helper does in this scope.
2. agent/browser_provider.py: drop ``pragma: no cover`` from
is_configured() / provider_name() backward-compat aliases. They ARE
covered by ``TestLegacyAbcAliases`` — the pragma would have masked
future regressions.
3. tools/browser_tool.py: refactor _is_legacy_provider_registry_overridden()
to compare against a module-frozen _DEFAULT_PROVIDER_REGISTRY snapshot
instead of hardcoded set of 3 keys. Future maintainers adding a 4th
built-in provider now just extend _PROVIDER_REGISTRY; the override
detection adapts automatically. Previously the hardcoded
``set(...) != {"browserbase", "browser-use", "firecrawl"}`` would flip
True forever on any 4-key registry, silently routing every install
onto the legacy fixture path.
4. tools/browser_tool.py: when explicit ``browser.cloud_provider`` is set
but the registry has no matching plugin (typo, uninstalled plugin,
discovery failure), emit a WARNING with actionable text instead of
silently falling through to auto-detect. Legacy code surfaced a typed
credentials error via direct class instantiation; this log restores
the signal in the post-migration path.
5. agent/browser_registry.py: trim the triple-redundant _LEGACY_PREFERENCE
documentation. Module docstring + 13-line block-comment + 5-line
inline comment was repeating the same point. Kept the docstring and
trimmed the block-comment to 5 lines.
6. agent/browser_registry.py: upgrade is_available()-raised logging from
DEBUG to WARNING with exc_info=True. A provider's availability check
throwing is unusual enough that users debugging "no cloud provider"
need the traceback in logs.
7. tests/plugins/browser/check_parity_vs_main.py: drop dead top-level
imports (os, shutil, tempfile — only referenced inside the
SUBPROCESS_SCRIPT string literal that runs in a child process).
Second pass (architecture + claim-verification review):
8. tools/browser_tool.py: rewrite the inline comment in _get_cloud_provider
auto-detect branch. Prior text claimed it "routes through the plugin
registry's legacy preference walk so third-party plugins still get a
chance to be selected when they're explicitly configured" — false on
both counts. The branch uses module-level legacy class aliases
(BrowserUseProvider / BrowserbaseProvider) directly; third-party
plugins are intentionally reachable only via explicit
``browser.cloud_provider``. Corrected comment now matches behaviour
and cross-references _LEGACY_PREFERENCE for the firecrawl gate
rationale.
9. tools/browser_tool.py + tests/tools/test_managed_browserbase_and_modal.py:
drop the unused ``get_active_browser_provider as
_registry_get_active_browser_provider`` alias from the
``from agent.browser_registry import ...`` block. It was never
referenced; matching test-stub line in the agent.browser_registry
SimpleNamespace also dropped. ``get_provider`` is still imported (used
by the explicit-config dispatch path at line 535).
10. plugins/browser/firecrawl/provider.py: align emergency_cleanup()
with the early-guard pattern used in browserbase + browser_use
plugins. Previously firecrawl tried the DELETE and relied on
``_headers()`` raising ValueError to trip a "missing credentials"
warning; same final outcome but a different control flow that read
like a bug to a maintainer skimming the three modules. Now: if
is_available() is False, log+return early — identical shape to the
other two providers.
Verification: 54/54 unit tests + 13/13 parity scenarios still pass.
Migrates the remaining two cloud browser providers to plugins:
plugins/browser/browser_use/ — dual auth (direct BROWSER_USE_API_KEY
or managed Nous gateway), idempotency-
key handling for retried managed-mode
creates, x-external-call-id capture.
plugins/browser/firecrawl/ — direct FIRECRAWL_API_KEY only;
distinct from plugins/web/firecrawl/
(same key, different endpoint).
Also drops the 'single-eligible shortcut' rule from
agent.browser_registry._resolve(). Was a copy-paste from
web_search_registry that would have introduced a real behavior change:
a user with only FIRECRAWL_API_KEY set (for web-extract) would silently
get routed to a paid Firecrawl cloud browser on a fresh install — not
matching origin/main, which only auto-detected between Browser Use and
Browserbase. Third-party browser plugins are subject to the same gate:
they require explicit `browser.cloud_provider` to take effect.
Verified end-to-end via plugin discovery:
- 3 plugins register (browser-use, browserbase, firecrawl)
- _resolve(None) with no creds: None (local mode)
- _resolve(None) with only FIRECRAWL_API_KEY: None (matches main)
- _resolve('firecrawl'): firecrawl (explicit wins)
- _resolve(None) with BU+firecrawl: browser-use (legacy walk first hit)
- _resolve(None) with all three: browser-use (legacy walk order)
Migrates tools/browser_providers/browserbase.py → plugins/browser/browserbase/.
Direct credentials only (BROWSERBASE_API_KEY + BROWSERBASE_PROJECT_ID); same
session-creation, 402-handling, and feature-flag logic as the legacy
implementation. Renames is_configured() → is_available() to match the new
BrowserProvider ABC.
The legacy module tools/browser_providers/browserbase.py is NOT yet deleted
and tools/browser_tool.py still references the in-tree class. The dispatcher
cutover happens in a later commit so the plugin migration and the dispatcher
switch land as separate reviewable units.
Verified via plugin-discovery E2E:
- browserbase registers as 'browserbase'
- is_available() correctly tracks BROWSERBASE_API_KEY + BROWSERBASE_PROJECT_ID
- _resolve('browserbase') returns the provider even when unavailable
(so dispatcher surfaces a typed credentials error)
- _resolve(None) returns the provider when it's the single eligible one
Six days after #23937 (608 fixes) the codebase had accumulated 241 new
PLR6201 violations. Same mechanical `x in (...)` → `x in {...}` fix,
same zero-risk profile: set lookup is O(1) vs O(n) for tuple and the
two are semantically equivalent for hashable scalar membership tests.
All 241 instances fixed via `ruff check --select PLR6201 --fix
--unsafe-fixes`, zero remaining. Every changed value is a hashable
scalar (str/int/None/enum/signal); no risk of unhashable runtime
errors. No behavior change.
Test plan:
- 119 files changed, +244/-244 (net zero) — exactly one-line edits
- `ruff check` clean afterward
- Compile checks pass on the largest touched files (cli.py, run_agent.py,
gateway/run.py, gateway/platforms/discord.py, model_tools.py)
- Subset broad test run on tests/gateway/ tests/hermes_cli/ tests/agent/
tests/tools/: 18187 passed, 59 pre-existing failures (verified against
origin/main with the same shape — identical failure count, identical
category — all xdist test-order flakes unrelated to this change)
Follows the same template as PR #23937 ([tracker: #23972](https://github.com/NousResearch/hermes-agent/issues/23972)).
_LineClient's five aiohttp.ClientSession() calls omit trust_env=True,
silently bypassing HTTP_PROXY / HTTPS_PROXY / ALL_PROXY. Result: every
LINE API call (reply, push, loading, fetch_content, get_bot_user_id)
ignores the system proxy.
Fix: add trust_env=True to all five session constructions. Symmetric
with the wecom and weixin adapters which already set this flag. No
behavior change for users not behind a proxy.
aiohttp.ClientSession defaults to trust_env=False, which silently ignores
HTTP_PROXY, HTTPS_PROXY, and ALL_PROXY environment variables. Users behind
a corporate or network proxy cannot reach external APIs on any of these
platforms — all outbound requests fail with connection errors.
Symmetric with wecom.py (line 276), weixin.py (lines 1055/1268/1274), and
matrix.py (no-proxy path) which already set this flag. Complements the
open LINE fix (#26635) with the remaining gateway and plugin adapters.
Changed:
- gateway/platforms/sms.py: persistent Twilio session (connect) + fallback
session (send) — both hit https://api.twilio.com
- gateway/platforms/slack.py: ephemeral response_url POST session —
hits https://hooks.slack.com/... callback URLs
- plugins/platforms/teams/adapter.py: standalone send session —
hits login.microsoftonline.com (token) + Bot Framework service URL
- plugins/platforms/google_chat/adapter.py: standalone send session —
hits https://chat.googleapis.com/v1/...
WhatsApp sessions are excluded: they connect to http://127.0.0.1:{port}
(local bridge) and must not be routed through a system proxy.
Closes#26924 (and supersedes #26926) in spirit.
DeepSeek was missing `default_aux_model` on its `ProviderProfile`, so
`_get_aux_model_for_provider("deepseek")` returned an empty string and
the compression / vision / session-search paths emitted
"No auxiliary LLM provider configured -- context compression will
drop middle turns without a summary."
on every DeepSeek session, even when the user had perfectly working
DeepSeek credentials.
Fix lands at the profile layer rather than the legacy
`_API_KEY_PROVIDER_AUX_MODELS_FALLBACK` dict the original PR targeted.
Every modern provider (gemini, zai, minimax, anthropic, kimi-coding,
stepfun, ollama-cloud, gmi, novita, kilocode, ai-gateway, opencode-zen)
sets `default_aux_model` on its `ProviderProfile`; the fallback dict
only exists for providers that predate the profiles system.
Tests added under `tests/plugins/model_providers/test_deepseek_profile.py`:
- `test_profile_advertises_deepseek_chat` -- pins the profile attribute
- `test_consumer_api_returns_deepseek_chat` -- pins the consumer API behavior
- `test_consumer_api_returns_non_empty` -- regression guard for the
symptom in the issue
Original diagnosis and aux-model choice from @kriscolab in PR #26926;
moved one layer up.
Co-authored-by: kriscolab <71590782+kriscolab@users.noreply.github.com>
The langfuse plugin is hooks-only (no toolsets), so it never appears in
`hermes tools` — that menu iterates `_get_effective_configurable_toolsets()`
(= `CONFIGURABLE_TOOLSETS` + plugin-registered toolsets), and "langfuse"
is in neither. The `TOOL_CATEGORIES["langfuse"]` setup wizard (with its
`post_setup: "langfuse"` hook that pip-installs the SDK and writes
`plugins.enabled`) was reachable only when a toolset key "langfuse" got
enabled, which can't happen — so it's been dead code, and the docs that
promised "Setup (interactive): hermes tools → Langfuse Observability"
were silently broken.
Right home for that wizard is `hermes plugins` (e.g. auto-running a
plugin's post-setup hook on enable), which is a generic plugin-setup
mechanism worth designing properly rather than shoehorning langfuse
back into `hermes tools`. Until that exists, point users at the
working manual flow.
Code:
- Delete `TOOL_CATEGORIES["langfuse"]` (24 lines) — unreachable.
- Delete the `post_setup_key == "langfuse"` branch in `_run_post_setup`
(29 lines) — only caller was the deleted TOOL_CATEGORIES entry.
Docs / comments (point at the manual flow + interactive `hermes plugins`):
- `plugins/observability/langfuse/README.md`: collapse the two-option
setup section to the single working flow.
- `plugins/observability/langfuse/plugin.yaml`: update `description`.
- `plugins/observability/langfuse/__init__.py`: update module docstring.
- `hermes_cli/config.py`: update inline comment above the LANGFUSE_*
env-var allow-list.
- `website/docs/user-guide/features/built-in-plugins.md`: collapse
"Setup (interactive)" + "Setup (manual)" into one accurate block.
- `website/docs/reference/environment-variables.md`: update the
cross-reference in the Langfuse env-vars section.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ready column help and fallbacks now describe dependency-ready work; show a
badge on unassigned ready cards and fix the stale unassigned tooltip. Align
localized Ready help strings with the new semantics.
Co-authored-by: Cursor <cursoragent@cursor.com>
The cherry-picked PR #15251 from @tw2818 correctly identified the
DeepSeek 400 root cause but placed the fix in the legacy fallback path
of `build_kwargs`, which DeepSeek never reaches — DeepSeek has a
registered ProviderProfile and goes through `_build_kwargs_from_profile`
instead. The legacy-path block was therefore dead code.
This commit pivots the fix to where it actually fires:
- New `DeepSeekProfile` in `plugins/model-providers/deepseek/__init__.py`
overrides `build_api_kwargs_extras` to emit DeepSeek's expected wire
format (mirrors `KimiProfile`):
{"reasoning_effort": "<low|medium|high|max>",
"extra_body": {"thinking": {"type": "enabled" | "disabled"}}}
- Model gating: only `deepseek-v4-*` and `deepseek-reasoner` emit
thinking control. `deepseek-chat` (V3) is untouched — current behavior.
- Effort mapping: low/medium/high passthrough, xhigh/max → max, unset →
omitted (DeepSeek server applies its own default).
- Revert the legacy-path additions from PR #15251 — they were dead code,
and the `_copy_reasoning_content_for_api` strip block specifically
would have nullified the existing reasoning_content padding machinery
(`_needs_deepseek_tool_reasoning` → space-pad on replay) that the
active provider already relies on for replay correctness.
- Unit tests pin the wire-shape contract and the model gating rules
(26 tests, all passing). Existing transport + provider profile suites
(321 tests) continue to pass.
- AUTHOR_MAP: map twebefy@gmail.com → tw2818 for release notes credit.
Closes#15700, #17212, #17825.
Co-authored-by: tw2818 <twebefy@gmail.com>
Wraps every sync->async coroutine-scheduling site in the codebase with a
new agent.async_utils.safe_schedule_threadsafe() helper that closes the
coroutine on scheduling failure (closed loop, shutdown race, etc.)
instead of leaking it as 'coroutine was never awaited' RuntimeWarnings
plus reference leaks.
22 production call sites migrated across the codebase:
- acp_adapter/events.py, acp_adapter/permissions.py
- agent/lsp/manager.py
- cron/scheduler.py (media + text delivery paths)
- gateway/platforms/feishu.py (5 sites, via existing _submit_on_loop helper
which now delegates to safe_schedule_threadsafe)
- gateway/run.py (10 sites: telegram rename, agent:step hook, status
callback, interim+bg-review, clarify send, exec-approval button+text,
temp-bubble cleanup, channel-directory refresh)
- plugins/memory/hindsight, plugins/platforms/google_chat
- tools/browser_supervisor.py (3), browser_cdp_tool.py,
computer_use/cua_backend.py, slash_confirm.py
- tools/environments/modal.py (_AsyncWorker)
- tools/mcp_tool.py (2 + 8 _run_on_mcp_loop callers converted to
factory-style so the coroutine is never constructed on a dead loop)
- tui_gateway/ws.py
Tests: new tests/agent/test_async_utils.py covers helper behavior under
live loop, dead loop, None loop, and scheduling exceptions. Regression
tests added at three PR-original sites (acp events, acp permissions,
mcp loop runner) mirroring contributor's intent.
Live-tested end-to-end:
- Helper stress test: 1500 schedules across live/dead/race scenarios,
zero leaked coroutines
- Race exercised: 5000 schedules with loop killed mid-flight, 100 ok /
4900 None returns, zero leaks
- hermes chat -q with terminal tool call (exercises step_callback bridge)
- MCP probe against failing subprocess servers + factory path
- Real gateway daemon boot + SIGINT shutdown across multiple platform
adapter inits
- WSTransport 100 live + 50 dead-loop writes
- Cron delivery path live + dead loop
Salvages PR #2657 — adopts contributor's intent over a much wider site
list and a single centralized helper instead of inline try/except at
each site. 3 of the original PR's 6 sites no longer exist on main
(environments/patches.py deleted, DingTalk refactored to native async);
the equivalent fix lives in tools/environments/modal.py instead.
Co-authored-by: JithendraNara <jithendranaidunara@gmail.com>
Adds a new authentication provider that lets SuperGrok subscribers sign
in to Hermes with their xAI account via the standard OAuth 2.0 PKCE
loopback flow, instead of pasting a raw API key from console.x.ai.
Highlights
----------
* OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery,
state/nonce, and a strict CORS-origin allowlist on the callback.
* Authorize URL carries `plan=generic` (required for non-allowlisted
loopback clients) and `referrer=hermes-agent` for best-effort
attribution in xAI's OAuth server logs.
* Token storage in `auth.json` with file-locked atomic writes; JWT
`exp`-based expiry detection with skew; refresh-token rotation
synced both ways between the singleton store and the credential
pool so multi-process / multi-profile setups don't tear each other's
refresh tokens.
* Reactive 401 retry: on a 401 from the xAI Responses API, the agent
refreshes the token, swaps it back into `self.api_key`, and retries
the call once. Guarded against silent account swaps when the active
key was sourced from a different (manual) pool entry.
* Auxiliary tasks (curator, vision, embeddings, etc.) route through a
dedicated xAI Responses-mode auxiliary client instead of falling back
to OpenRouter billing.
* Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen
plugin) resolve credentials through a unified runtime → singleton →
env-var fallback chain so xai-oauth users get them for free.
* `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are
wired through the standard auth-commands surface; remove cleans up
the singleton loopback_pkce entry so it doesn't silently reinstate.
* `hermes model` provider picker shows
"xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls
back to pool credentials when the singleton is missing.
Hardening
---------
* Discovery and refresh responses validate the returned
`token_endpoint` host against the same `*.x.ai` allowlist as the
authorization endpoint, blocking MITM persistence of a hostile
endpoint.
* Discovery / refresh / token-exchange `response.json()` calls are
wrapped to raise typed `AuthError` on malformed bodies (captive
portals, proxy error pages) instead of leaking JSONDecodeError
tracebacks.
* `prompt_cache_key` is routed through `extra_body` on the codex
transport (sending it as a top-level kwarg trips xAI's SDK with a
TypeError).
* Credential-pool sync-back preserves `active_provider` so refreshing
an OAuth entry doesn't silently flip the active provider out from
under the running agent.
Testing
-------
* New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests)
covers JWT expiry, OAuth URL params (plan + referrer), CORS origins,
redirect URI validation, singleton↔pool sync, concurrency races,
refresh error paths, runtime resolution, and malformed-JSON guards.
* Extended `test_credential_pool.py`, `test_codex_transport.py`, and
`test_run_agent_codex_responses.py` cover the pool sync-back,
`extra_body` routing, and 401 reactive refresh paths.
* 165 tests passing on this branch via `scripts/run_tests.sh`.
* fix(langfuse): reject placeholder credentials with one-shot warning
When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY
at a template value like 'placeholder', 'test-key', or 'your-langfuse-key',
the Langfuse SDK silently accepts the credentials at construction time and
drops every trace at flush time. No warning, no error — just an empty
Langfuse dashboard the operator only notices hours later.
Add prefix-based validation in _get_langfuse() against the documented
'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side.
Anything else fires a single warning naming the offending env var(s)
with a log-safe value preview (full string for short placeholders so the
operator knows which template they left in place; truncated for long
values so a real secret pasted into the wrong field never hits the log),
then short-circuits via the existing _INIT_FAILED cache so the warning
fires once per process, not once per hook invocation.
The check sits after the 'Langfuse is None' SDK-installed guard so hosts
without the optional langfuse SDK don't see misleading 'set real keys'
hints when the actionable fix is 'pip install langfuse'. Missing
credentials remains the documented opt-out path and stays silent — no
log noise for unconfigured installs.
Fixes#22763Fixes#23823
* fix(langfuse): use actual API request messages for generation input
on_pre_llm_request previously used the messages kwarg alone, which
could be None when Hermes passes the payload via request_messages,
conversation_history, or user_message instead. Add _coerce_request_messages
to pick the first available list across all variants, falling back to a
synthetic user message. Generations now show the real outbound payload
rather than an empty input.
* fix(langfuse): record tool call outputs in traces
Tool observations showed input (arguments) but output was always
undefined. Root cause: when tool_call_id is empty, pre_tool_call stored
observations under a unique time-based key that post_tool_call could
never reconstruct, so every tool span was closed without output by the
_finish_trace sweep.
Fix pre/post matching by routing empty-tool_call_id tools through a
per-name FIFO queue (pending_tools_by_name) instead of the time-based
key. Tools with a tool_call_id continue to use the id-keyed dict.
Also:
- Preserve OpenAI-style nested function shape in serialized tool calls
so Langfuse renders name/arguments correctly
- Keep name + tool_call_id on role:tool messages for proper pairing
- Backfill tool results onto the matching turn_tool_calls entry so the
generation's tool-call record carries the result alongside arguments
- Coerce request messages from whichever field the runtime provides
(request_messages, messages, conversation_history, user_message)
* fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test
Self-review of the combined #22345 + #23831 salvage surfaced three issues
worth fixing in the same PR rather than as follow-ups:
1. Drop is_first_turn from the pre_api_request hook. The boolean expression
`not bool(conversation_history)` was wrong: conversation_history is
reassigned to None mid-run after compression (5 sites in run_agent.py),
so the value flips False -> True mid-conversation on every post-compression
API call. The langfuse plugin never consumed it, so the kwarg was both
misleading AND dead.
2. Replace copy.deepcopy(request_messages) with shallow list() copy. The
pre_api_request hook contract discards return values (invoke_hook never
writes back to api_kwargs), and the langfuse plugin's _serialize_messages
already builds its own snapshot dicts via _safe_value. A deepcopy on every
API call would walk every tool result and base64 image — significant
overhead for no real isolation benefit. Shallow copy of the outer list
protects against later mutations of api_messages without paying for the
inner-dict walk.
3. Rename test_empty_tool_call_id_concurrent_fifo_order ->
test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a
real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8
threads behind a barrier to actually exercise _STATE_LOCK on the
pending_tools_by_name queue. The original test was sequential and only
validated Python list semantics; this one validates the lock discipline.
4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED —
that function does not exist. Tests reload the module via sys.modules.pop
+ importlib.import_module instead.
Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass.
---------
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
SimpleX Chat (https://simplex.chat) is a private, decentralised messenger
with no persistent user IDs — every contact is identified by an opaque
internal ID generated at connection time. This adds it as a Hermes
gateway platform via the plugin system.
The adapter connects to a local simplex-chat daemon via WebSocket,
listens for inbound messages, and sends replies. Originally proposed in
PR #2558 as a core-modifying integration; reshaped here as a self-
contained plugin under plugins/platforms/simplex/ with no edits to any
core file. Discovery is filesystem-based (scanned by gateway.config),
and the platform identity is resolved on demand via Platform("simplex").
Plugin contract:
- check_requirements() requires SIMPLEX_WS_URL AND the websockets package
- validate_config() / is_connected() accept env or config.yaml input
- _env_enablement() seeds PlatformConfig.extra (ws_url + home_channel)
- _standalone_send() supports out-of-process cron delivery
- interactive_setup() provides a stdin wizard for hermes gateway setup
- register() wires the adapter into the registry with required_env,
install_hint, cron_deliver_env_var, allowed_users_env, and a
platform_hint for the LLM.
Lazy dependency: the websockets Python package is imported inside the
functions that need it. The plugin is importable and discoverable even
when websockets is missing — check_requirements() simply returns False
until `pip install websockets` is run. No new pyproject extras are
introduced.
Environment variables:
SIMPLEX_WS_URL WebSocket URL of the daemon (required)
SIMPLEX_ALLOWED_USERS Comma-separated allowed contact IDs
SIMPLEX_ALLOW_ALL_USERS Set true to allow all contacts
SIMPLEX_HOME_CHANNEL Default contact for cron delivery
SIMPLEX_HOME_CHANNEL_NAME Human label for the home channel
Closes#2557.
Add NovitaAI as a first-class provider with dedicated model selection
flow, live pricing, and authoritative context length resolution.
- Register provider in PROVIDER_REGISTRY, HERMES_OVERLAYS, and all
alias/label maps (ID: novita, aliases: novita-ai, novitaai)
- Add dedicated _model_flow_novita() with 3-tier model list fallback:
Novita API → models.dev → static curated list
- Fetch live pricing from /v1/models with correct unit conversion
(input_token_price_per_m is 0.0001 USD per Mtok)
- Add Novita-specific context length resolution (step 4b) in
get_model_context_length(), prioritized over models.dev/OpenRouter
- Register api.novita.ai in _URL_TO_PROVIDER to prevent early return
from the custom-endpoint code path
- Add models.dev mapping (novita → novita-ai)
- Add default auxiliary model (deepseek/deepseek-v3-0324)
- Add NOVITA_API_KEY to test isolation (conftest.py)
- Update docs: providers page, env vars reference, CLI reference,
.env.example, README, and landing page
Surfaced by local E2E behavior-parity testing of PR vs origin/main: the
plugin-migrated dispatchers were quietly changing the error envelope
shape returned to function-calling models on unconfigured systems.
Two findings, both from per-result error wrapping bleeding into the
pre-flight configuration error path:
1. **search**: ``firecrawl.search()`` caught the
``ValueError("Web tools are not configured...")`` from
``_get_firecrawl_client()`` and returned it as
``{"success": False, "error": ...}``, losing the legacy
``{"error": "Error searching web: ..."}`` envelope that
``tool_error()`` emits on main. Models that special-case the
``error`` key still detect the failure, but the prefix is part of
the legacy contract some users rely on.
2. **crawl**: ``firecrawl.crawl()`` caught the same pre-flight
``ValueError`` and wrapped it as a per-page error inside
``results[0]``. Main short-circuits on ``check_firecrawl_api_key()``
BEFORE dispatching, so its unconfigured response is
``{"success": False, "error": "web_crawl requires Firecrawl..."}``
at the top level. The PR's per-page burying hid the failure inside
``results[]`` where models that check ``result.get("error")`` would
miss it.
Fix:
- ``plugins/web/firecrawl/provider.py``: pull
``_get_firecrawl_client()`` outside the broad ``try`` in
``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate
to the dispatcher's top-level exception handler. In-flight SDK
errors still get wrapped as ``{"success": False, ...}``.
- ``tools/web_tools.py``: mirror main's upstream availability gate in
``web_crawl_tool``. When the resolved crawl provider is
``is_available()==False``, short-circuit BEFORE dispatching with the
same top-level error shape main emits.
- ``tests/tools/test_web_providers.py``: 2 regression tests
(``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so
future plugin work can't undo this.
Verified via local subprocess-based parity test (14/14 scenarios match
origin/main shape exactly) and full 210/210 web test suite green.
Self-review of the plugin migration surfaced one warning and a handful of
doc/dead-code cleanups. None affect production behaviour through the main
dispatcher (which always calls `tools.web_tools._get_backend()` first and
preserves the full 7-provider walk), but direct callers of
`agent.web_search_registry.get_active_*_provider()` previously diverged
from the legacy order and could return `None` for users with credentials
but no explicit `web.backend` config key.
Changes
-------
1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple
`("brave-free", "firecrawl", "searxng", "ddgs")` while the PR
description and the legacy `_get_backend()` candidate order both
call for the 7-tuple
`(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`.
Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys
and no config, `get_active_search_provider()` now returns tavily
(was None); with EXA+PARALLEL it returns parallel (was None); with
BRAVE+FIRECRAWL it returns firecrawl (was brave-free).
2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3
docstring, and inline comment all listed the old 4-tuple and claimed
"brave-free first because it was the shipped default". The legacy
default is `"firecrawl"`. Rewritten to match the new ordering and
reference `tools.web_tools._get_backend()` as the source of truth.
3. `agent/web_search_registry.py` — `get_active_crawl_provider`
docstring said "only Tavily implements it among built-in providers".
Firecrawl also advertises `supports_crawl=True` after the previous
commit. Updated to "Tavily and Firecrawl".
4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is
the only built-in backend that natively crawls". Updated.
5. `agent/web_search_provider.py` — ABC docstring mentioned only
`search` / `extract` capabilities. Added `crawl` for accuracy.
6. `plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level
cache globals (`_firecrawl_client`, `_parallel_client`,
`_async_parallel_client`, `_exa_client`) were declared but never read
(all reads/writes go through `_wt.*` per the `extracting-inline-
helpers-to-plugins` recipe). Removed the dead declarations; the
reset-for-tests helpers in firecrawl + parallel now clear the
canonical `_wt._<name>` slots, matching the pattern exa already used.
Tests
-----
218/218 web-targeted tests still pass (no test changes needed). 4910/4910
in `tests/tools/` still green.
The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.
What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
- Accepts the same kwargs as the dispatcher passes to any crawl
provider (``instructions``, ``depth``, ``limit``); Firecrawl's
/crawl endpoint ignores ``instructions`` and ``depth`` so we log
and drop with a clear info message.
- Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
gateway event loop isn't blocked on a multi-page crawl.
- Preserves the response-shape normalization across pydantic /
typed-object / dict variants that the legacy inline code did.
- Preserves per-page website-policy re-check (catches blocked
redirects after the SDK returns).
- Returns the same {"results": [...]} shape so the dispatcher's
shared LLM-summarization post-processing path works unchanged.
- Sets supports_crawl() to True so the dispatcher routes through
the plugin instead of the legacy fallthrough.
2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
that used to run after "No registered provider supports crawl" —
~270 lines including:
- check_firecrawl_api_key gate + typed error
- inline SSRF + website-policy seed-URL gate (dispatcher already
does this)
- Firecrawl client setup with crawl_params
- 100+ lines of pydantic/dict/typed-object normalization
- Per-page LLM-processing loop (kept in the dispatcher's shared
post-processing path; that's where it always belonged)
- trimming + base64 image cleanup (still done in the dispatcher's
shared path)
Replaced with a single typed-error branch when no crawl-capable
provider is available: "web_crawl has no available backend. Set
FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
TAVILY_API_KEY for Tavily."
Test updates
------------
- tests/tools/test_website_policy.py:
- test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
gate still runs on web_tools.check_website_access (no change to
that patch), but the firecrawl client lockdown moved to the
plugin module — patch firecrawl_provider._get_firecrawl_client
instead of web_tools._get_firecrawl_client. The dispatcher
short-circuits before the plugin runs, so the test still passes.
- test_web_crawl_blocks_redirected_final_url: patch the per-page
policy gate at plugins.web.firecrawl.provider.check_website_access
(where it now runs) AND on web_tools (where the seed-URL gate
still runs). Patch firecrawl_provider._get_firecrawl_client for
the FakeCrawlClient injection. Both checks flow through the same
fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
- Update parametrized capability-flag spec: firecrawl supports_crawl
is now True.
- Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
verifies inspect.iscoroutinefunction(p.crawl) is True and that
the async crawl returns a per-page error dict (not a raise) when
FIRECRAWL_API_KEY is missing.
Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
name search extract crawl async-extract? async-crawl?
firecrawl True True True True True
tavily True True True False False
Both crawl-capable providers exercise the dispatcher's
inspect.iscoroutinefunction async-or-sync detection.
Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
+ new firecrawl crawl test)
- Total: -32 net LoC; tools/web_tools.py is now 1509 lines (was 1763
before this commit, 2227 before the migration started).
Removes the seven hardcoded TOOL_CATEGORIES["web"] provider rows that
duplicated the plugin-registered providers, and deletes the
_WEB_PLUGIN_SKIPLIST that existed to prevent duplicate picker rows
during the migration. The Web Search & Extract category now derives its
provider rows entirely from agent.web_search_registry via
_plugin_web_search_providers(), matching how Spotify, Google Meet, and
the image_gen plugins are surfaced.
Removed (deduplicated against plugin schemas):
- Firecrawl Cloud → plugins.web.firecrawl
- Exa → plugins.web.exa
- Parallel → plugins.web.parallel
- Tavily → plugins.web.tavily
- SearXNG → plugins.web.searxng
- Brave Search (Free Tier) → plugins.web.brave_free
- DuckDuckGo (ddgs) → plugins.web.ddgs (post_setup hook preserved)
Retained in TOOL_CATEGORIES["web"]:
- Nous Subscription — requires requires_nous_auth +
managed_nous_feature + override_env_vars
to drive the managed-gateway UX. Not a
provider — a different *setup flow* for the
firecrawl backend.
- Firecrawl Self-Hosted — points firecrawl at a private Docker URL
via FIRECRAWL_API_URL only. Same reason:
UX setup-flow row, not a provider.
These two rows describe alternative auth/billing paths for the
firecrawl backend; they intentionally share web_backend="firecrawl"
with the plugin row but light up different env-var prompts.
Plugin schema extensions
------------------------
- ddgs plugin's get_setup_schema() now emits `post_setup: "ddgs"` so
selection still triggers the pip-install hook in _run_post_setup().
- _plugin_web_search_providers() passes `post_setup` through verbatim
when present in the schema (other future plugins like camofox / a
hypothetical playwright-web plugin can opt in the same way).
- Picker rows now carry both `web_backend` (legacy field consumed by
setup + selection helpers) and `web_search_plugin_name`
(informational marker), so behavior is identical between hardcoded
and plugin-registered rows.
Net diff
--------
- hermes_cli/tools_config.py: -141/+50 lines (~91 lines net)
- plugins/web/ddgs/provider.py: +7/-4 (post_setup field + badge polish)
Verified
--------
- Compile-clean for both files
- Picker shows: 2 hardcoded rows (Nous Subscription, Firecrawl
Self-Hosted) + 7 plugin rows (alphabetically: Brave Search,
DuckDuckGo, Exa, Firecrawl, Parallel, SearXNG, Tavily). DuckDuckGo
row carries post_setup="ddgs" for first-time install.
- 173 web-specific tests still pass.
Removes ~580 lines of dead code from tools/web_tools.py that were
superseded by the plugin migration but kept around in the cutover commit
to keep the diff focused. Replaces them with thin re-export shims so
existing tests and external callers that reach for the legacy
``tools.web_tools.<name>`` paths continue to work transparently.
Deleted from tools/web_tools.py
--------------------------------
- Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy,
_FIRECRAWL_CLS_CACHE, the Firecrawl singleton)
- Firecrawl client section (_get_direct_firecrawl_config,
_get_firecrawl_gateway_url, _is_tool_gateway_ready,
_has_direct_firecrawl_config, _raise_web_backend_configuration_error,
_firecrawl_backend_help_suffix, _get_firecrawl_client)
- Parallel client section (_get_parallel_client,
_get_async_parallel_client, _parallel_client, _async_parallel_client)
- Tavily client section (_TAVILY_BASE_URL, _tavily_request,
_normalize_tavily_search_results, _normalize_tavily_documents)
- Generic SDK normalizers (_to_plain_object, _normalize_result_list,
_extract_web_search_results, _extract_scrape_payload)
- Exa client section (_get_exa_client, _exa_client, _exa_search,
_exa_extract)
- Parallel helpers (_parallel_search, _parallel_extract)
- Duplicate inline check_firecrawl_api_key
Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines).
Re-exports added at top of tools/web_tools.py
---------------------------------------------
- From plugins.web.firecrawl.provider:
Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls,
_get_direct_firecrawl_config, _get_firecrawl_gateway_url,
_is_tool_gateway_ready, _has_direct_firecrawl_config,
_firecrawl_backend_help_suffix, _raise_web_backend_configuration_error,
_get_firecrawl_client, _to_plain_object, _normalize_result_list,
_extract_web_search_results, _extract_scrape_payload,
check_firecrawl_api_key
- From plugins.web.tavily.provider:
_tavily_request, _normalize_tavily_search_results,
_normalize_tavily_documents
- From plugins.web.parallel.provider:
_get_parallel_client, _get_async_parallel_client
- From plugins.web.exa.provider:
_get_exa_client
Plus retained module-level imports for backward-compat with tests:
- httpx (tests patch tools.web_tools.httpx for tavily request mocking)
- build_vendor_gateway_url, _read_nous_access_token,
resolve_managed_tool_gateway, managed_nous_tools_enabled,
prefers_gateway (tests patch tools.web_tools.<name>)
Plugin indirection pattern (key technique)
------------------------------------------
For functions inside the firecrawl/parallel/exa plugins to honor
unit-test patches that target ``tools.web_tools.<name>``, the plugin
implementations now do ``import tools.web_tools as _wt`` at call time
and read helper names through that module (``_wt._read_nous_access_token``,
``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the
existing test patches transparently reach the plugin code without any
test changes.
The cached client globals (_firecrawl_client, _firecrawl_client_config,
_parallel_client, _async_parallel_client, _exa_client) also now live on
tools.web_tools so existing test setup_method handlers that reset
``tools.web_tools._<vendor>_client = None`` between cases keep working.
The plugins read/write the cache via getattr/setattr on the web_tools
module.
Verified
--------
- 173/173 targeted web tests pass:
test_web_providers.py, test_web_providers_brave_free.py,
test_web_providers_ddgs.py, test_web_providers_searxng.py,
test_web_tools_config.py, test_web_tools_tavily.py,
test_website_policy.py, test_config_null_guard.py
- Compile-clean (py_compile.compile passes)
- All inline implementations now exist in exactly one place
(plugins.web.<vendor>.provider)
Follow-up clean-up
------------------
- Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows
(next commit)
- Delete tools/web_providers/ directory entirely
- Add tests/plugins/web/ coverage
- Full tests/tools/ + tests/gateway/ regression sweep before promoting PR
Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:
1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool — it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).
Fix: only emit the search-only error when the provider supports
NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the
provider supports extract but not crawl (firecrawl), fall through to
the legacy firecrawl-via-extract path below.
2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect — the inner import re-bound
the name on every call.
Fix: hoist the import to module level. Cheap (website_policy itself
has no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.
Test updates (tests/tools/test_website_policy.py — 4 tests):
- test_web_extract_short_circuits_blocked_url
- test_web_extract_blocks_redirected_final_url
Both: patch the gate at plugins.web.firecrawl.provider (where it
runs after migration) and force the firecrawl plugin to be the
active extract provider via FIRECRAWL_API_KEY.
- test_web_crawl_short_circuits_blocked_url
- test_web_crawl_blocks_redirected_final_url
Both: unchanged — the dispatcher-level gate at tools.web_tools.py
line 1651 still uses the imported `check_website_access` name and
the firecrawl-fallthrough path is exercised as before.
Verified: 22/22 tests/tools/test_website_policy.py pass.
Migrates Firecrawl from inline code in tools/web_tools.py to a bundled
plugin at plugins/web/firecrawl/. By line count this is the largest of
the seven provider migrations: the firecrawl path captured most of the
file's vendor-specific complexity.
What moved into the plugin (all previously in tools/web_tools.py):
Lazy Firecrawl SDK proxy
- _load_firecrawl_cls() — caches the imported SDK class
- _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK
imports until first construction or isinstance check.
Client construction (dual auth)
- _get_direct_firecrawl_config() — direct FIRECRAWL_API_KEY/URL path
- _get_firecrawl_gateway_url() — managed Nous tool-gateway URL
- _is_tool_gateway_ready() — gateway URL + Nous token check
- _has_direct_firecrawl_config() — direct config present?
- _get_firecrawl_client() — combined client construction
honoring web.use_gateway
- check_firecrawl_api_key() — top-level "is firecrawl usable"
- _firecrawl_backend_help_suffix() — managed-gateway help string
- _raise_web_backend_configuration_error() — typed misconfig error
Response shape normalization (vendor-specific)
- _to_plain_object(), _normalize_result_list() — SDK→dict helpers
- _extract_web_search_results() — handles SDK/direct/gateway shapes
- _extract_scrape_payload() — nested-data unwrap for scrape
Per-URL extract loop
- 60s asyncio.wait_for timeout per URL
- Pre-scrape website-policy gate
- Post-scrape redirect-aware SSRF re-check
- Format-aware content selection (markdown / html / auto)
- Per-URL errors returned as {"error": str} entries, no raises
Extract is declared `async def` — each URL is scraped in
asyncio.to_thread(...). This is the second async-extract plugin after
parallel.
The plugin re-exports `Firecrawl` (the lazy proxy) and
`check_firecrawl_api_key()` so existing tests doing
`patch("tools.web_tools.Firecrawl")` or
`monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep
working — tools/web_tools.py re-exports both names in the next
dispatcher-cutover commit.
Note: web_crawl_tool still has its own Firecrawl crawl path inline
(separate from extract); the Firecrawl SDK supports /crawl but we don't
expose supports_crawl=True on this plugin yet. Tavily handles crawl
today. Adding Firecrawl crawl is a clean follow-up.
Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST.
E2E verified:
- All 7 providers register: brave-free, ddgs, exa, firecrawl,
parallel, searxng, tavily
- inspect.iscoroutinefunction(firecrawl.extract) -> True
- Firecrawl proxy is a callable lazy proxy at module level
- check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence
Migrates Tavily from inline _tavily_request() / _normalize_tavily_*
helpers in tools/web_tools.py to a bundled plugin at plugins/web/tavily/.
First plugin in the codebase to advertise supports_crawl=True. Tavily is
unique among built-in backends in offering a native /crawl endpoint that
walks linked pages from a seed URL with optional natural-language
instructions and depth ("basic" or "advanced").
Capabilities:
- supports_search() -> True (Tavily /search)
- supports_extract() -> True (Tavily /extract)
- supports_crawl() -> True (Tavily /crawl)
All sync (httpx.post under the hood).
The crawl method accepts forward-compat kwargs (instructions, depth,
limit) and is gated against unsafe URLs/policy by the dispatcher in
web_crawl_tool — exactly as before.
Behavior preserved:
- TAVILY_API_KEY required (ValueError → typed error response)
- TAVILY_BASE_URL env override honored
- /crawl requires both body auth AND Bearer header — preserved
- failed_results[] and failed_urls[] response keys mapped to per-URL
items with error fields rather than raising
- max_results capped at 20 server-side
Adds "tavily" to _WEB_PLUGIN_SKIPLIST.
The legacy inline _tavily_request / _normalize_tavily_search_results /
_normalize_tavily_documents / _TAVILY_BASE_URL in tools/web_tools.py are
NOT deleted yet — search/extract dispatch and the entire web_crawl_tool
function still reference them. They go away when those dispatchers are
cut over to the registry.
E2E verified:
- Tavily registers with all 3 capabilities
- Provider list now: brave-free, ddgs, exa, parallel, searxng, tavily
Migrates Parallel.ai from inline `_parallel_search()` / `_parallel_extract()`
in tools/web_tools.py to a bundled plugin at plugins/web/parallel/.
First plugin in the codebase to expose an async :meth:`extract`:
- search() is sync — Parallel.beta.search
- extract() is **async def** — AsyncParallel.beta.extract
The ABC's docstring on supports_extract() already permits sync-or-async;
this commit is the first to exercise the async path. The web_extract_tool
dispatcher (next commit) detects coroutines via
inspect.iscoroutinefunction and awaits accordingly.
Behavior preserved:
- PARALLEL_API_KEY required (raises ValueError if missing → surfaced
as {"success": False, "error": "..."} instead)
- PARALLEL_SEARCH_MODE env var honored (agentic|fast|one-shot, default
agentic), validated via _resolve_search_mode()
- Limit capped at 20 server-side via min(limit, 20)
- Per-URL failure mode preserved: response.errors[] each become a
result dict with an "error" field rather than raising
- Module-level _parallel_client / _async_parallel_client caches kept
(mirrors legacy singleton pattern)
Adds "parallel" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so
the picker doesn't double-list.
The legacy inline _parallel_search, _parallel_extract, _get_parallel_client,
_get_async_parallel_client in tools/web_tools.py are NOT deleted yet — the
dispatcher still calls them. They go away when the dispatcher cuts over.
E2E verified:
- inspect.iscoroutinefunction(p.search) -> False
- inspect.iscoroutinefunction(p.extract) -> True
- extract() returns a coroutine (not a list)
- 5 providers register correctly (brave-free, ddgs, exa, parallel, searxng)
Migrates Exa from the inline `_exa_search()` / `_exa_extract()` helpers in
tools/web_tools.py to a bundled plugin at plugins/web/exa/.
This is the first plugin in this PR to advertise supports_extract=True,
exercising the multi-capability ABC path that the initial three migrations
(brave_free, ddgs, searxng — all search-only) did not cover.
Both Exa methods are sync — the SDK is sync-only. The web_extract_tool
dispatcher in tools/web_tools.py will continue to call them inline until
Task "dispatch-extract-all" cuts it over to the registry.
Behaviour preserved bit-for-bit aside from the ABC method-name change:
- is_configured() -> is_available()
- provider_name() -> name (property)
- "exa" stays as the registered name
- Module-level `_exa_client` cache + lazy `from exa_py import Exa`
preserved at the new location.
- Errors (ValueError for missing API key, ImportError for missing SDK,
generic Exception) caught and surfaced as {"success": False, "error": ...}
instead of raising.
Adds "exa" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so the
hardcoded TOOL_CATEGORIES["web"] row and the plugin-injected row don't
duplicate during the spike. The skip-list goes away in the cleanup phase
along with the hardcoded row.
The legacy inline `_exa_search` / `_exa_extract` / `_get_exa_client` /
`_exa_client` in tools/web_tools.py are NOT deleted yet — the dispatcher
still references them. They go away in the next dispatcher-cutover commit.
E2E verified:
- Plugin discovers + registers
- .supports_search/.supports_extract/.supports_crawl = (True, True, False)
- .get_setup_schema() returns the picker row shape
- resolve(): explicit exa + EXA_API_KEY -> exa; without key -> exa (registered
but unavailable, dispatcher surfaces "EXA_API_KEY not set" error)
Deletes tools/web_providers/{brave_free,ddgs,searxng}.py — the three
providers that moved to plugins/web/ in prior commits. tools/web_tools.py
no longer imports them (registry dispatch as of d8735963f), so removing
them is purely a cleanup pass.
Also migrates the existing tests to the new import paths:
tests/tools/test_web_providers_brave_free.py
tests/tools/test_web_providers_ddgs.py
tests/tools/test_web_providers_searxng.py
Mechanical rewrites:
- `from tools.web_providers.X import YSearchProvider`
-> `from plugins.web.X.provider import YWebSearchProvider`
- `.is_configured()` -> `.is_available()` (legacy method -> new method)
- `.provider_name()` -> `.name` (legacy method -> new property)
- `from tools.web_providers.base import WebSearchProvider`
-> `from agent.web_search_provider import WebSearchProvider`
(the subclass-check asserts membership in the new plugin-facing ABC)
- `sys.modules.delitem("tools.web_providers.ddgs")` updated to point at
`plugins.web.ddgs.provider` (cache-busting for lazy ddgs imports)
The TestXBackendWiring / TestXSearchOnlyErrors classes (covering
_is_backend_available, _get_backend, check_web_api_key, and the
"search-only" error paths in web_extract/web_crawl) are untouched —
those still test web_tools.py's backend-selection logic, which continues
to recognize the names "brave-free" / "ddgs" / "searxng" even after the
modules behind them moved to plugins.
tools/web_providers/base.py is intentionally NOT deleted by this commit
— it's the parent ABC of the legacy modules and shares its name with
agent/web_search_provider.py::WebSearchProvider. Removing it surfaces the
naming collision (see PR description Finding 0); the real migration PR
deletes it in the same commit that drops the _WEB_PLUGIN_SKIPLIST
guards in hermes_cli/tools_config.py.
Test results:
bash scripts/run_tests.sh tests/tools/test_web_providers_*.py
-> 65 passed in 3.41s (all rewritten unit tests + unchanged integration tests)
bash scripts/run_tests.sh tests/tools/test_web_*.py
-> 141 passed in 4.70s (full web test set, post-deletion)
Adds plugins/web/searxng/. SearXNG aggregates results from upstream engines
via its JSON API (/search?format=json) — search-only, no extract capability
(supports_extract() returns False).
E2E verified — registry now has ['brave-free', 'ddgs', 'searxng'].
Adds plugins/web/ddgs/ following the same plugins/image_gen/ pattern as
brave_free. DuckDuckGo search via the community ddgs package; no API key,
package is an optional dep gated by is_available().
E2E verified — registry now has ['brave-free', 'ddgs'].
Adds plugins/web/brave_free/ as the first plugin built against the new
WebSearchProvider ABC. Mirrors the plugins/image_gen/openai/ layout exactly:
plugins/web/brave_free/
plugin.yaml kind: backend, provides_web_providers: [brave-free]
__init__.py register(ctx) -> ctx.register_web_search_provider(...)
provider.py BraveFreeWebSearchProvider(WebSearchProvider)
Behavior preserved: same name ("brave-free" with hyphen), same env var
(BRAVE_SEARCH_API_KEY), same HTTP request shape, same response normalization.
The legacy tools/web_providers/brave_free.py is left in place — the
dispatcher in tools/web_tools.py still references it. Task 7 cuts over the
dispatcher to the new registry; Task 10 deletes the legacy file.
E2E verified:
HERMES_PLUGINS_DEBUG=1 python -c "
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
from agent.web_search_registry import list_providers
print([p.name for p in list_providers()])
"
# -> ['brave-free']
* feat(video_gen): unified video_generate tool with pluggable provider backends
One core video_generate tool, every backend a plugin. Mirrors the
image_gen + memory_provider + context_engine architecture: ABC, registry,
plugin-context registration hook, and per-plugin model catalogs surfaced
through hermes tools.
Surface (one schema, every backend):
- operation: generate / edit / extend
- modalities: text-to-video (prompt only), image-to-video (prompt +
image_url), video edit (prompt + video_url), video extend (video_url)
- reference_image_urls, duration, aspect_ratio, resolution,
negative_prompt, audio, seed, model override
- Providers ignore unknown kwargs and declare what they support via
VideoGenProvider.capabilities() — backend-specific quirks stay in the
backend, the agent learns one tool
Backends shipped:
- plugins/video_gen/xai/ — Grok-Imagine, full generate/edit/extend +
image-to-video + reference images (salvaged from PR #10600 by
@Jaaneek, reshaped into the plugin interface)
- plugins/video_gen/fal/ — Veo 3.1 (t2v + i2v), Kling O3 i2v,
Pixverse v6 i2v with model-aware payload building that drops keys a
model doesn't declare
Wiring:
- agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation,
success_response / error_response, save_b64_video / save_bytes_video,
$HERMES_HOME/cache/videos/
- agent/video_gen_registry.py — thread-safe register/get/list +
get_active_provider() reading video_gen.provider from config.yaml
- hermes_cli/plugins.py — PluginContext.register_video_gen_provider()
- hermes_cli/tools_config.py — Video Generation category in
hermes tools, plugin-only providers list, model picker per plugin,
config write to video_gen.{provider,model}
- toolsets.py — new video_gen toolset
- tests: 31 new tests covering ABC, registry, tool dispatch, both plugins
- docs: developer-guide/video-gen-provider-plugin.md (parallel to the
image-gen guide), sidebar + toolsets-reference + plugin guides updated
Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse),
#10458 (provider categories), #10786 (xAI media+search bundle), #2984
(FAL duplicate), #19086 (Google Veo standalone — easy port to plugin
interface).
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
* feat(video_gen): dynamic schema reflects active backend's capabilities
Address the 'capability variance' question — instead of one tool with a
static schema that lies about what every backend supports, the
video_generate tool now rebuilds its description at get_definitions()
time based on the configured video_gen.provider and video_gen.model.
The agent sees backend-specific guidance up-front:
- 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is
REQUIRED; text-only prompts will be rejected'
- 'fal-ai/veo3.1' (t2v): no image_url restriction shown
- xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7
reference_image_urls'
- Backends without edit/extend: 'not supported on this backend — surface
that they need to switch backends via hermes tools'
This is the same pattern PR #22694 used for delegate_task self-capping —
documented in the dynamic-tool-schemas skill. Cache invalidation is
free: get_tool_definitions() already memoizes on config.yaml mtime, so a
mid-session backend swap rebuilds the schema automatically.
Tested:
- Empirical FAL OpenAPI schema check confirms image-to-video models
require image_url (FAL returns HTTP 422 otherwise) — client-side
rejection in FALVideoGenProvider.generate() now prevents the wasted
round-trip
- Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean
missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches
- 6 new tests cover the builder (no config / image-only / full-surface /
text-only / unknown provider / registry wiring), all passing
- 37/37 in the slice, 134/134 in the broader regression set
* test(video_gen/xai): full surface integration tests + cleaner schema
Verified end-to-end that the xAI plugin handles every documented mode
from PR #10600's surface: text-to-video, image-to-video,
reference-images-to-video, video edit, video extend (with and without
prompt). All five modes route to the correct xAI endpoint
(/videos/generations, /videos/edits, /videos/extensions) with the right
payload shape (image / reference_images / video keys), and all five
client-side rejections fire before the network: edit-without-prompt,
extend-without-video_url, image+refs conflict, >7 references, and
duration/aspect_ratio clamping.
15 new integration tests grouped into four classes (endpoint routing,
modalities, validation, clamping). httpx is stubbed via a small fake
AsyncClient that records POSTs so the tests assert the actual payload
the plugin would send to xAI — not just the success/error envelope.
Also cleaned up a description redundancy: when a model's operations
match the backend's overall set, we no longer print the duplicate
'operations supported by this model' line. xAI's description now reads:
Active backend: xAI . model: grok-imagine-video
- operations supported by this backend: edit, extend, generate
- modalities supported by this backend: image, reference_images, text
- aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16
- resolution choices: 480p, 720p
- duration range: 1-15s
- reference_image_urls: up to 7 images
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
* feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing
Two design changes per Teknium:
1) Drop edit/extend from the tool surface entirely. Only text-to-video
and image-to-video remain. The agent sees a clean tool with two
modalities; backend-specific quirks like xAI's edit/extend endpoints
stay out of the unified schema.
2) FAL: pick a model FAMILY once, the plugin routes between the
family's text-to-video and image-to-video endpoints based on whether
image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND
'fal-ai/veo3.1/image-to-video' as separate options — they pick
'veo3.1', and the plugin handles the rest.
Catalog rewritten as families:
veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video
pixverse-v6 fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video
kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video / fal-ai/kling-video/o3/standard/image-to-video
xAI uses a single endpoint (/videos/generations) for both modes,
routed by the presence of the 'image' field in the payload — no
edit/extend exposure.
Schema changes:
- VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params:
prompt (required), image_url, reference_image_urls, duration,
aspect_ratio, resolution, negative_prompt, audio, seed, model.
- VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS,
DEFAULT_OPERATION. capabilities() drops 'operations' key.
- success_response: add 'modality' field ('text' | 'image') so the
agent and logs can see which endpoint was actually hit.
Dynamic schema builder simplified — no operations bullet, no
'switch backends if you need edit/extend' guidance. When the active
backend supports both modalities (the common case), description reads:
Active backend: FAL . model: pixverse-v6
- supports both text-to-video (omit image_url) and image-to-video
(pass image_url) - routes automatically
- aspect_ratio choices: 16:9, 9:16, 1:1
- resolution choices: 360p, 540p, 720p, 1080p
- duration range: 1-15s
- audio: pass audio=true to enable native audio (pricing tier)
- negative_prompt: supported
Tests: 51 in the video_gen slice, 216 across the broader image+video
sweep, all passing. New FAL routing tests prove pixverse-v6 + no image
hits text-to-video endpoint, pixverse-v6 + image_url hits
image-to-video endpoint, same for veo3.1 and kling-o3-standard.
Docs updated: developer-guide page rewrites the 'model families' pattern
as a first-class section so external plugin authors know the convention.
toolsets-reference and toolsets.py descriptions match the new surface.
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
* feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers
Catalog now covers everything Teknium specced from FAL:
Cheap tier:
ltx-2.3 fal-ai/ltx-2.3-22b/text-to-video / image-to-video
pixverse-v6 fal-ai/pixverse/v6/text-to-video / image-to-video
Premium tier:
veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video
seedance-2.0 bytedance/seedance-2.0/text-to-video / image-to-video
kling-v3-4k fal-ai/kling-video/v3/4k/text-to-video / image-to-video
happy-horse fal-ai/happy-horse/text-to-video / image-to-video
DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane
defaults, both modalities) — better first-run UX for users who haven't
explicitly picked a model.
New family-entry knob: image_param_key. Kling v3 4K's image-to-video
endpoint expects start_image_url instead of image_url; declaring
image_param_key='start_image_url' on the family lets _build_payload
remap correctly. Other families default to plain image_url.
Per-family capability flags reflect each model's docs:
- LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution
enum exposed by FAL — let endpoint apply defaults)
- Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported,
negative prompts NOT supported per docs
- Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative
- Veo 3.1: unchanged, 16:9/9:16, 4/6/8s
Tests: +5 covering the new families (full catalog, Kling 4K
start_image_url remap, Seedance routing, LTX payload minimality, Happy
Horse minimality). 56/56 in the slice green.
Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes
already has a direct xAI plugin that talks to xAI's own API; routing
the same model through FAL's wrapper would duplicate the surface
without adding capabilities. Users on FAL who want Grok-Imagine should
use the xAI plugin directly; flag if you want both routes available.
* test(video_gen): tool-surface routing matrix — every model x modality
End-to-end matrix test driven through _handle_video_generate() — the
actual function the agent's video_generate tool call lands in. Writes
config.yaml, invokes the registered handler with a raw args dict, then
asserts the outbound HTTP/SDK call hit the right endpoint with the right
payload shape.
Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new
families as they're added (add a family to FAL_FAMILIES and you get
both modalities tested for free).
Coverage:
- All 6 FAL families x {text-only, text+image} = 12 cases
- xAI x {text-only, text+image} = 2 cases
- tool-level model= arg overrides config = 2 cases
For each case, verifies:
- result['success'] is True
- result['modality'] matches input shape ('text' if no image_url, 'image' otherwise)
- outbound endpoint URL matches the family's text_endpoint or image_endpoint
- text-only payloads carry no image-shaped keys
- text+image payloads carry the family's image key (image_url for most,
start_image_url for kling-v3-4k, wrapped 'image' object for xAI)
All 16 cases passing. Confirms the tool surface routes every
(provider, model, modality) combination correctly with zero leakage.
* feat(video_gen): keep video_gen out of first-run setup, surface in status
Two changes:
1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in
the first-run toolset checklist. Video gen is niche, paid, and slow —
most users don't want it nagging them during initial setup. Anyone
who wants it opts in via 'hermes tools' -> Video Generation, which
already routes to the provider+model picker.
2. The 'hermes setup' status panel learns about video_gen — but only
shows the row when a plugin reports available. Users without
FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of
those keys see 'Video Generation (FAL) ✓' as confirmation it's wired.
Verified live:
- Fresh install (no creds): zero video_gen mentions in wizard.
- With FAL_KEY: status row appears with active backend name.
- 160/160 in the setup + tools_config + video_gen test slice.
Rationale: image_gen is on by default because it's a featured creative
tool used in casual chat (telegrams, etc). Video gen is heavier — long
wait, paid per-second pricing. Default-off matches user intent better.
---------
Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>