Commit graph

190 commits

Author SHA1 Message Date
Keira Voss
2ba9b29f37 docs(plugins): correct pre_gateway_dispatch doc text and add hooks.md section
Follow-up to aeff6dfe:

- Fix semantic error in VALID_HOOKS inline comment ("after core auth" ->
  "before auth"). Hook intentionally runs BEFORE auth so plugins can
  handle unauthorized senders without triggering the pairing flow.
- Fix wrong class name in the same comment (HermesGateway ->
  GatewayRunner, matching gateway/run.py).
- Add a full ### pre_gateway_dispatch section in
  website/docs/user-guide/features/hooks.md (matches the pattern of
  every other plugin hook: signature, params table, fires-where,
  return-value table, use cases, two worked examples) plus a row in
  the quick-reference table.
- Add the anchor link on the plugins.md table row so it matches the
  other hook entries.

No code behavior change.
2026-04-24 03:02:03 -07:00
Keira Voss
1ef1e4c669 feat(plugins): add pre_gateway_dispatch hook
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:

    {"action": "skip",    "reason": "..."}  -> drop (no reply)
    {"action": "rewrite", "text":   "..."}  -> replace event.text
    {"action": "allow"}  /  None             -> normal dispatch

Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.

Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.

Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
  (skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`

Made-with: Cursor
2026-04-24 03:02:03 -07:00
Teknium
5a1c599412
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)

Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.

Supersedes #12550.

No code changes in this commit.

* feat(browser): add persistent CDP supervisor for dialog + frame detection

Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.

Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.

Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.

Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.

Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.

E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.

No agent-facing tool wiring in this commit (comes next).

* feat(browser): add browser_dialog tool wired to CDP supervisor

Agent-facing response-only tool. Schema:
  action: 'accept' | 'dismiss' (required)
  prompt_text: response for prompt() dialogs (optional)
  dialog_id: disambiguate when multiple dialogs queued (optional)

Handler:
  SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)

check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.

Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.

* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot

Supervisor lifecycle:
  * _get_session_info lazy-starts the supervisor after a session row is
    materialized — covers every backend code path (Browserbase, cdp_url
    override, /browser connect, future providers) with one hook.
  * cleanup_browser(task_id) stops the supervisor for that task first
    (before the backend tears down CDP).
  * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
  * /browser connect eagerly starts the supervisor for task 'default'
    so the first snapshot already shows pending_dialogs.
  * /browser disconnect stops the supervisor.

CDP URL resolution for the supervisor:
  1. BROWSER_CDP_URL / browser.cdp_url override.
  2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).

browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.

Config defaults:
  * browser.dialog_policy: 'must_respond' (new)
  * browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.

Deadlock fix in supervisor event dispatch:
  * _on_dialog_opening and _on_target_attached used to await CDP calls
    while the reader was still processing an event — but only the reader
    can set the response Future, so the call timed out.
  * Both now fire asyncio.create_task(...) so the reader stays pumping.
  * auto_dismiss/auto_accept now actually close the dialog immediately.

Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
  * supervisor start/snapshot
  * main-frame alert detection + dismiss
  * iframe.contentWindow alert
  * prompt() with prompt_text reply
  * respond with no pending dialog -> clean error
  * auto_dismiss clears on event
  * registry idempotency
  * registry stop -> snapshot reports inactive
  * browser_dialog tool no-supervisor error
  * browser_dialog invalid action
  * browser_dialog end-to-end via tool handler

xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.

* docs(browser): document browser_dialog tool + CDP supervisor

- user-guide/features/browser.md: new browser_dialog section with
  workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
  bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
  toolset row with note on pending_dialogs / frame_tree snapshot fields

Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).

* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility

Found via Browserbase E2E test that revealed two production-critical issues:

1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
   CDP proxy tears down our long-lived WebSocket whenever a short-lived
   client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
   Fixed with a reconnecting _run loop that re-attaches with exponential
   backoff on drops. _page_session_id and _child_sessions are reset on each
   reconnect; pending_dialogs and frames are preserved across reconnects.

2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
   Playwright-based CDP proxy dismisses alert/confirm/prompt before our
   Page.handleJavaScriptDialog call can respond. So pending_dialogs is
   empty by the time the agent reads a snapshot on Browserbase.

   Added a recent_dialogs ring buffer (capacity 20) that retains a
   DialogRecord for every dialog that opened, with a closed_by tag:
     * 'agent'       — agent called browser_dialog
     * 'auto_policy' — local auto_dismiss/auto_accept fired
     * 'watchdog'    — must_respond timeout auto-dismissed (300s default)
     * 'remote'      — browser/backend closed it on us (Browserbase)

   Agents on Browserbase now see the dialog history with closed_by='remote'
   so they at least know a dialog fired, even though they couldn't respond.

3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
   'message' field (CDP spec has only 'result' and 'userInput') but our
   _on_dialog_closed was matching on message. Fixed to match by session_id
   + oldest-first, with a safety assumption that only one dialog is in
   flight per session (the JS thread is blocked while a dialog is up).

Docs + tests updated:
  * browser.md: new availability matrix showing the three backends and
    which mode (pending / recent / response) each supports
  * developer-guide/browser-supervisor.md: three-field snapshot schema
    with closed_by semantics
  * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
    passing against real Chrome)

E2E verified both backends:
  * Local Chrome via /browser connect: detect + respond full workflow
    (smoke_supervisor.py all 7 scenarios pass)
  * Browserbase: detect via recent_dialogs with closed_by='remote'
    (smoke_supervisor_browserbase_v2.py passes)

Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.

* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)

Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.

The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.

Flow when a page calls alert('hi'):
  1. window.alert override intercepts, builds XHR GET to
     http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
  2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
  3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
     it as a pending dialog with bridge_request_id set
  4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
  5. Supervisor calls Fetch.fulfillRequest with JSON body:
     {accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
  6. The injected script parses the body, returns the appropriate value
     from the override (undefined for alert, bool for confirm, string|null
     for prompt)

This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.

Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.

Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).

E2E VERIFIED:
  * Local Chrome: 13/13 pytest tests green (12 original + new
    test_bridge_captures_prompt_and_returns_reply_text that asserts
    window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
  * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
    - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
    - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
      → page.prompt_ret === 'AGENT-REPLY' ✓
    - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
    - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓

Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.

* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)

Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).

Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.

Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.

Agent workflow:
  1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
  2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
                 params={'expression': 'document.title', 'returnByValue': True})
  3. Supervisor dispatches the call on the OOPIF's child session

Supervisor state fixes needed along the way:
  * _on_frame_detached now skips reason='swap' (frame migrating processes)
  * _on_frame_detached also skips when the frame is an OOPIF with a live
    child session — Browserbase fires spurious remove events when a
    same-origin iframe gets promoted to OOPIF
  * _on_target_detached clears cdp_session_id but KEEPS the frame record
    so the agent still sees the OOPIF in frame_tree during transient
    session flaps

E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
  browser_cdp(method='Runtime.evaluate',
              params={'expression': 'document.title', 'returnByValue': True},
              frame_id=<OOPIF>)
  → {'success': True, 'result': {'value': 'Example Domain'}}

  The iframe is <iframe src='https://example.com/'> inside a top-level
  data: URL page on a real Browserbase session. The agent Runtime.evaluates
  INSIDE the cross-origin iframe and gets example.com's title back.

Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
  * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
    verifies routing via supervisor, Runtime.evaluate returns 1+1=2
  * test_browser_cdp_frame_id_missing_supervisor — clean error when no
    supervisor attached
  * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
    frame_id

Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.

* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process

When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:

  * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
  * Chrome with --site-per-process so the cross-origin iframe becomes a
    real OOPIF in its own process
  * Navigate, find OOPIF in supervisor.frame_tree, call
    browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
    through the supervisor's child session
  * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
    inner page, retrieved via OOPIF eval)

PASSED on 2026-04-23.

Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.

chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.

Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.

* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count

Pre-merge docs audit revealed two gaps:

1. user-guide/configuration.md browser config example was missing the
   two new dialog_* knobs. Added with a short table explaining
   must_respond / auto_dismiss / auto_accept semantics and a link to
   the feature page for the full workflow.

2. reference/tools-reference.md header said '54 built-in tools' — real
   count on main is 54, this branch adds browser_dialog so it's 55.
   Fixed the header.  (browser count was already correctly bumped
   11 -> 12 in the earlier docs commit.)

No code changes.
2026-04-23 22:23:37 -07:00
Teknium
f593c367be
feat(dashboard): reskin extension points for themes and plugins (#14776)
Themes and plugins can now pull off arbitrary dashboard reskins (cockpit
HUD, retro terminal, etc.) without touching core code.

Themes gain four new fields:
- layoutVariant: standard | cockpit | tiled — shell layout selector
- assets: {bg, hero, logo, crest, sidebar, header, custom: {...}} —
  artwork URLs exposed as --theme-asset-* CSS vars
- customCSS: raw CSS injected as a scoped <style> tag on theme apply
  (32 KiB cap, cleaned up on theme switch)
- componentStyles: per-component CSS-var overrides (clipPath,
  borderImage, background, boxShadow, ...) for card/header/sidebar/
  backdrop/tab/progress/badge/footer/page

Plugin manifests gain three new fields:
- tab.override: replaces a built-in route instead of adding a tab
- tab.hidden: register component + slots without adding a nav entry
- slots: declares shell slots the plugin populates

10 named shell slots: backdrop, header-left/right/banner, sidebar,
pre-main, post-main, footer-left/right, overlay. Plugins register via
window.__HERMES_PLUGINS__.registerSlot(name, slot, Component). A
<PluginSlot> React helper is exported on the plugin SDK.

Ships a full demo at plugins/strike-freedom-cockpit/ — theme YAML +
slot-only plugin that reproduces a Gundam cockpit dashboard: MS-STATUS
sidebar with live telemetry, COMPASS crest in header, notched card
corners via componentStyles, scanline overlay via customCSS, gold/cyan
palette, Orbitron typography.

Validation:
- 15 new tests in test_web_server.py covering every extended field
- tests/hermes_cli/: 2615 passed (3 pre-existing unrelated failures)
- tsc -b --noEmit: clean
- vite build: 418 kB bundle, ~2 kB delta for slots/theme extensions

Co-authored-by: Teknium <p@nousresearch.com>
2026-04-23 15:31:01 -07:00
Teknium
255ba5bf26
feat(dashboard): expand themes to fonts, layout, density (#14725)
Dashboard themes now control typography and layout, not just colors.
Each built-in theme picks its own fonts, base size, radius, and density
so switching produces visible changes beyond hue.

Schema additions (per theme):

- typography — fontSans, fontMono, fontDisplay, fontUrl, baseSize,
  lineHeight, letterSpacing. fontUrl is injected as <link> on switch
  so Google/Bunny/self-hosted stylesheets all work.
- layout — radius (any CSS length) and density
  (compact | comfortable | spacious, multiplies Tailwind spacing).
- colorOverrides (optional) — pin individual shadcn tokens that would
  otherwise derive from the palette.

Built-in themes are now distinct beyond palette:

- default  — system stack, 15px, 0.5rem radius, comfortable
- midnight — Inter + JetBrains Mono, 14px, 0.75rem, comfortable
- ember    — Spectral (serif) + IBM Plex Mono, 15px, 0.25rem
- mono     — IBM Plex Sans + Mono, 13px, 0 radius, compact
- cyberpunk— Share Tech Mono everywhere, 14px, 0 radius, compact
- rose     — Fraunces (serif) + DM Mono, 16px, 1rem, spacious

Also fixes two bugs:

1. Custom user themes silently fell back to default. ThemeProvider
   only applied BUILTIN_THEMES[name], so YAML files in
   ~/.hermes/dashboard-themes/ showed in the picker but did nothing.
   Server now ships the full normalised definition; client applies it.
2. Docs documented a 21-token flat colors schema that never matched
   the code (applyPalette reads a 3-layer palette). Rewrote the
   Themes section against the actual shape.

Implementation:

- web/src/themes/types.ts: extend DashboardTheme with typography,
  layout, colorOverrides; ThemeListEntry carries optional definition.
- web/src/themes/presets.ts: 6 built-ins with distinct typography+layout.
- web/src/themes/context.tsx: applyTheme() writes palette+typography+
  layout+overrides as CSS vars, injects fontUrl stylesheet, fixes the
  fallback-to-default bug via resolveTheme(name).
- web/src/index.css: html/body/code read the new theme-font vars;
  --radius-sm/md/lg/xl derive from --theme-radius; --spacing scales
  with --theme-spacing-mul so Tailwind utilities shift with density.
- hermes_cli/web_server.py: _normalise_theme_definition() parses loose
  YAML (bare hex strings, partial blocks) into the canonical wire
  shape; /api/dashboard/themes ships full definitions for user themes.
- tests/hermes_cli/test_web_server.py: 16 new tests covering the
  normaliser and discovery (rejection cases, clamping, defaults).
- website/docs/user-guide/features/web-dashboard.md: rewrite Themes
  section with real schema, per-model tables, full YAML example.
2026-04-23 13:49:51 -07:00
Vivek Ganesan
7ca2f70055 fix(docs): Add links to Atropos and wandb in user guide
fix #7724

The user guide has mention of atropos and wandb but no links.  This PR adds links so that users dont have to search for them.
2026-04-23 03:07:06 -07:00
wenhao7
156b358320 docs(cron): explain runtime resolution for null model/provider
Clarify job storage behavior regarding model and provider fields.
2026-04-23 02:04:45 -07:00
saitsuki
9357db2844 docs: fix fallback behavior description — it is per-turn, not per-session
The documentation claimed fallback activates 'at most once per session',
but the actual implementation restores the primary model at the start of
every run_conversation() call via _restore_primary_runtime().

Relevant source: run_agent.py lines 1666-1694 (snapshot), 6454-6517
(restore), 8681-8684 (called each turn).

Updated the One-Shot info box and the summary table to accurately
describe the per-turn restoration behavior.
2026-04-22 21:29:49 -07:00
Abner
b66644f0ec feat(hindsight): richer session-scoped retain metadata
- Add configurable retain_tags / retain_source / retain_user_prefix /
  retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name,
  chat_type, thread_id) through AIAgent and MemoryManager into
  MemoryProvider.initialize kwargs so providers can scope and tag
  retained memories.
- Hindsight attaches the new identity fields as retain metadata,
  merges per-call tool tags with configured default tags, and uses
  the configurable transcript labels for auto-retained turns.

Co-authored-by: Abner <abner.the.foreman@agentmail.to>
2026-04-22 05:27:10 -07:00
Teknium
bf73ced4f5
docs: document delegation width + depth knobs (#13745)
Fills the three gaps left by the orchestrator/width-depth salvage:

- configuration.md §Delegation: max_concurrent_children, max_spawn_depth,
  orchestrator_enabled are now in the canonical config.yaml reference
  with a paragraph covering defaults, clamping, role-degradation, and
  the 3x3x3=27-leaf cost scaling.
- environment-variables.md: adds DELEGATION_MAX_CONCURRENT_CHILDREN to
  the Agent Behavior table.
- features/delegation.md: corrects stale 'default 5, cap 8' wording
  (that was from the original PR; the salvage landed on default 3 with
  no ceiling and a tool error on excess instead of truncation).
2026-04-21 17:54:39 -07:00
Teknium
3e1a3372ab
docs(delegate): clarify that the parent agent, not the user, populates goal/context (#13698)
The 'subagents know nothing' warning and the 'no conversation history'
constraint both said the user provides the goal/context fields. In
practice the LLM parent agent calls delegate_task; the user configures
the feature but doesn't write delegation calls. Rewording to point at
the parent agent matches how the tool actually works.
2026-04-21 14:27:06 -07:00
pefontana
48ecb98f8a feat(delegate): orchestrator role and configurable spawn depth (default flat)
Adds role='leaf'|'orchestrator' to delegate_task. With max_spawn_depth>=2,
an orchestrator child retains the 'delegation' toolset and can spawn its
own workers; leaf children cannot delegate further (identical to today).

Default posture is flat — max_spawn_depth=1 means a depth-0 parent's
children land at the depth-1 floor and orchestrator role silently
degrades to leaf. Users opt into nested delegation by raising
max_spawn_depth to 2 or 3 in config.yaml.

Also threads acp_command/acp_args through the main agent loop's delegate
dispatch (previously silently dropped in the schema) via a new
_dispatch_delegate_task helper, and adds a DelegateEvent enum with
legacy-string back-compat for gateway/ACP/CLI progress consumers.

Config (hermes_cli/config.py defaults):
  delegation.max_concurrent_children: 3   # floor-only, no upper cap
  delegation.max_spawn_depth: 1           # 1=flat (default), 2-3 unlock nested
  delegation.orchestrator_enabled: true   # global kill switch

Salvaged from @pefontana's PR #11215. Overrides vs. the original PR:
concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only,
no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which
silently enabled one level of orchestration for every user).

Co-authored-by: pefontana <fontana.pedro93@gmail.com>
2026-04-21 14:23:45 -07:00
pefontana
baaf49e9fd docs(delegate): remove default_toolsets from example config and docs
Matches the default-config removal in the preceding commit.
default_toolsets was documented for users to set but was never actually
read at runtime, so showing it in the example config and the delegation
user guide was misleading.

No deprecation note is added: the key was always a no-op, so users who
copied it from the example continue to see no behavior change. Their
config.yaml still parses; the key is just silently unused, same as
before.

Part of Initiative 2 / M0.5.
2026-04-21 13:44:27 -07:00
Teknium
5ffae9228b
feat(image-gen): add GPT Image 2 to FAL catalog (#13677)
Adds OpenAI's new GPT Image 2 model via FAL.ai, selectable through
`hermes tools` → Image Generation. SOTA text rendering (including CJK)
and world-aware photorealism.

- FAL_MODELS entry with image_size_preset style
- 4:3 presets on all aspect ratios — 16:9 (1024x576) falls below
  GPT-Image-2's 655,360 min-pixel floor and would be rejected
- quality pinned to medium (same rule as gpt-image-1.5) for
  predictable Nous Portal billing
- BYOK (openai_api_key) deliberately omitted from supports so all
  users stay on shared FAL billing
- 6 new tests covering preset mapping, quality pinning, and
  supports-whitelist integrity
- Docs table + aspect-ratio map updated

Live-tested end-to-end: 39.9s cold request, clean 1024x768 PNG
2026-04-21 13:35:31 -07:00
kshitijk4poor
9556fef5a1 fix(tui): improve macOS paste and shortcut parity
- support Cmd-as-super and readline-style fallback shortcuts on macOS
- add layered clipboard/OSC52 paste handling and immediate image-path attach
- add IDE terminal setup helpers, terminal parity hints, and aligned docs
2026-04-21 08:00:00 -07:00
Teknium
2d7ff9c5bd feat(tts): complete KittenTTS integration (tools/setup/docs/tests)
Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the
provider behaves like every other TTS backend end to end.

- tools/tts_tool.py: `_check_kittentts_available()` helper and wire
  into `check_tts_requirements()`; extend Opus-conversion list to
  include kittentts (WAV → Opus for Telegram voice bubbles); point the
  missing-package error at `hermes setup tts`.
- hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech"
  toolset picker, with a `kittentts` post_setup hook that auto-installs
  the wheel + soundfile via pip.
- hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install
  flow in `_setup_tts_provider()`, provider_labels entry, and status row
  in the `hermes setup` summary.
- website/docs/user-guide/features/tts.md: add KittenTTS to the provider
  table, config example, ffmpeg note, and the zero-config voice-bubble tip.
- tests/tools/test_tts_kittentts.py: 10 unit tests covering generation,
  model caching, config passthrough, ffmpeg conversion, availability
  detection, and the missing-package dispatcher branch.

E2E verified against the real `kittentts` wheel:
- WAV direct output (pcm_s16le, 24kHz mono)
- MP3 conversion via ffmpeg (from WAV)
- Telegram flow (provider in Opus-conversion list) produces
  `codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the
  `[[audio_as_voice]]` marker
- check_tts_requirements() returns True when kittentts is installed
2026-04-21 01:28:32 -07:00
helix4u
b48ea41d27 feat(voice): add cli beep toggle 2026-04-21 00:29:29 -07:00
Peter Fontana
3988c3c245 feat: shell hooks — wire shell scripts as Hermes hook callbacks
Users can declare shell scripts in config.yaml under a hooks: block that
fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call,
subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on
stdout to block tool calls or inject context pre-LLM.

Key design:
- Registers closures on existing PluginManager._hooks dict — zero changes
  to invoke_hook() call sites
- subprocess.run(shell=False) via shlex.split — no shell injection
- First-use consent per (event, command) pair, persisted to allowlist JSON
- Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept
- hermes hooks list/test/revoke/doctor CLI subcommands
- Adds subagent_stop hook event fired after delegate_task children exit
- Claude Code compatible response shapes accepted

Cherry-picked from PR #13143 by @pefontana.
2026-04-20 20:53:51 -07:00
Teknium
70111eea24 feat(plugins): make all plugins opt-in by default
Plugins now require explicit consent to load. Discovery still finds every
plugin — user-installed, bundled, and pip — so they all show up in
`hermes plugins` and `/plugins`, but the loader only instantiates
plugins whose name appears in `plugins.enabled` in config.yaml. This
removes the previous ambient-execution risk where a newly-installed or
bundled plugin could register hooks, tools, and commands on first run
without the user opting in.

The three-state model is now explicit:
  enabled     — in plugins.enabled, loads on next session
  disabled    — in plugins.disabled, never loads (wins over enabled)
  not enabled — discovered but never opted in (default for new installs)

`hermes plugins install <repo>` prompts "Enable 'name' now? [y/N]"
(defaults to no). New `--enable` / `--no-enable` flags skip the prompt
for scripted installs. `hermes plugins enable/disable` manage both lists
so a disabled plugin stays explicitly off even if something later adds
it to enabled.

Config migration (schema v20 → v21): existing user plugins already
installed under ~/.hermes/plugins/ (minus anything in plugins.disabled)
are auto-grandfathered into plugins.enabled so upgrades don't silently
break working setups. Bundled plugins are NOT grandfathered — even
existing users have to opt in explicitly.

Also: HERMES_DISABLE_BUNDLED_PLUGINS env var removed (redundant with
opt-in default), cmd_list now shows bundled + user plugins together with
their three-state status, interactive UI tags bundled entries
[bundled], docs updated across plugins.md and built-in-plugins.md.

Validation: 442 plugin/config tests pass. E2E: fresh install discovers
disk-cleanup but does not load it; `hermes plugins enable disk-cleanup`
activates hooks; migration grandfathers existing user plugins correctly
while leaving bundled plugins off.
2026-04-20 04:46:45 -07:00
Teknium
a25c8c6a56 docs(plugins): rename disk-guardian to disk-cleanup + bundled-plugins docs
The original name was cute but non-obvious; disk-cleanup says what it
does. Plugin directory, script, state path, log lines, slash command,
and test module all renamed. No user-visible state exists yet, so no
migration path is needed.

New website page "Built-in Plugins" documents the <repo>/plugins/<name>/
source, how discovery interacts with user/project plugins, the
HERMES_DISABLE_BUNDLED_PLUGINS escape hatch, disk-cleanup's hook
behaviour and deletion rules, and guidance on when a plugin belongs
bundled vs. user-installable. Added to the Features → Core sidebar next
to the main Plugins page, with a cross-reference from plugins.md.
2026-04-20 04:46:45 -07:00
Teknium
f683132c1d
feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969)
OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision
requests to the API server. Both endpoints accept the canonical OpenAI
multimodal shape:

  Chat Completions: {type: text|image_url, image_url: {url, detail?}}
  Responses:        {type: input_text|input_image, image_url: <str>, detail?}

The server validates and converts both into a single internal shape that the
existing agent pipeline already handles (Anthropic adapter converts,
OpenAI-wire providers pass through). Remote http(s) URLs and data:image/*
URLs are supported.

Uploaded files (file, input_file, file_id) and non-image data: URLs are
rejected with 400 unsupported_content_type.

Changes:

- gateway/platforms/api_server.py
  - _normalize_multimodal_content(): validates + normalizes both Chat and
    Responses content shapes. Returns a plain string for text-only content
    (preserves prompt-cache behavior on existing callers) or a canonical
    [{type:text|image_url,...}] list when images are present.
  - _content_has_visible_payload(): replaces the bare truthy check so a
    user turn with only an image no longer rejects as 'No user message'.
  - _handle_chat_completions and _handle_responses both call the new helper
    for user/assistant content; system messages continue to flatten to text.
  - Codex conversation_history, input[], and inline history paths all share
    the same validator. No duplicated normalizers.

- run_agent.py
  - _summarize_user_message_for_log(): produces a short string summary
    ('[1 image] describe this') from list content for logging, spinner
    previews, and trajectory writes. Fixes AttributeError when list
    user_message hit user_message[:80] + '...' / .replace().
  - _chat_content_to_responses_parts(): module-level helper that converts
    chat-style multimodal content to Responses 'input_text'/'input_image'
    parts. Used in _chat_messages_to_responses_input for Codex routing.
  - _preflight_codex_input_items() now validates and passes through list
    content parts for user/assistant messages instead of stringifying.

- tests/gateway/test_api_server_multimodal.py (new, 38 tests)
  - Unit coverage for _normalize_multimodal_content, including both part
    formats, data URL gating, and all reject paths.
  - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses
    verifying multimodal payloads reach _run_agent intact.
  - 400 coverage for file / input_file / non-image data URL.

- tests/run_agent/test_run_agent_multimodal_prologue.py (new)
  - Regression coverage for the prologue no-crash contract.
  - _chat_content_to_responses_parts round-trip coverage.

- website/docs/user-guide/features/api-server.md
  - Inline image examples for both endpoints.
  - Updated Limitations: files still unsupported, images now supported.

Validated live against openrouter/anthropic/claude-opus-4.6:
  POST /v1/chat/completions  → 200, vision-accurate description
  POST /v1/responses         → 200, same image, clean output_text
  POST /v1/chat/completions [file] → 400 unsupported_content_type
  POST /v1/responses [input_file]  → 400 unsupported_content_type
  POST /v1/responses [non-image data URL] → 400 unsupported_content_type

Closes #5621, #8253, #4046, #6632.

Co-authored-by: Paul Bergeron <paul@gamma.app>
Co-authored-by: zhangxicen <zhangxicen@example.com>
Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com>
Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>
2026-04-20 04:16:13 -07:00
helix4u
afba54364e docs(config): document session_search auxiliary controls 2026-04-20 00:47:39 -07:00
Teknium
ea0bd81b84 feat(skills): consolidate find-nearby into maps as a single location skill
find-nearby and the (new) maps optional skill both used OpenStreetMap's
Overpass + Nominatim to answer the same question — 'what's near this
location?' — so shipping both would be duplicate code for overlapping
capability. Consolidate into one active-by-default skill at
skills/productivity/maps/ that is a strict superset of find-nearby.

Moves + deletions:
- optional-skills/productivity/maps/ → skills/productivity/maps/ (active,
  no install step needed)
- skills/leisure/find-nearby/ → DELETED (fully superseded)

Upgrades to maps_client.py so it covers everything find-nearby did:
- Overpass server failover — tries overpass-api.de then
  overpass.kumi.systems so a single-mirror outage doesn't break the skill
  (new overpass_query helper, used by both nearby and bbox)
- nearby now accepts --near "<address>" as a shortcut that auto-geocodes,
  so one command replaces the old 'search → copy coords → nearby' chain
- nearby now accepts --category (repeatable) for multi-type queries in
  one call (e.g. --category restaurant --category bar), results merged
  and deduped by (osm_type, osm_id), sorted by distance, capped at --limit
- Each nearby result now includes maps_url (clickable Google Maps search
  link) and directions_url (Google Maps directions from the search point
  — only when a ref point is known)
- Promoted commonly-useful OSM tags to top-level fields on each result:
  cuisine, hours (opening_hours), phone, website — instead of forcing
  callers to dig into the raw tags dict

SKILL.md:
- Version bumped 1.1.0 → 1.2.0, description rewritten to lead with
  capability surface
- New 'Working With Telegram Location Pins' section replacing
  find-nearby's equivalent workflow
- metadata.hermes.supersedes: [find-nearby] so tooling can flag any
  lingering references to the old skill

External references updated:
- optional-skills/productivity/telephony/SKILL.md — related_skills
  find-nearby → maps
- website/docs/reference/skills-catalog.md — removed the (now-empty)
  'leisure' section, added 'maps' row under productivity
- website/docs/user-guide/features/cron.md — find-nearby example
  usages swapped to maps
- tests/tools/test_cronjob_tools.py, tests/hermes_cli/test_cron.py,
  tests/cron/test_scheduler.py — fixture string values swapped
- cli.py:5290 — /cron help-hint example swapped

Not touched:
- RELEASE_v0.2.0.md — historical record, left intact

E2E-verified live (Nominatim + Overpass, one query each):
- nearby --near "Times Square" --category restaurant --category bar → 3 results,
  sorted by distance, all with maps_url, directions_url, cuisine, phone, website
  where OSM had the tags

All 111 targeted tests pass across tests/cron/, tests/tools/, tests/hermes_cli/.
2026-04-19 05:19:22 -07:00
Teknium
ce410521b3
feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369)
Agents can now send arbitrary CDP commands to the browser. The tool is
gated on a reachable CDP endpoint at session start — it only appears in
the toolset when BROWSER_CDP_URL is set (from '/browser connect') or
'browser.cdp_url' is configured in config.yaml. Backends that don't
currently expose CDP to the Python side (Camofox, default local
agent-browser, cloud providers whose per-session cdp_url is not yet
surfaced) do not see the tool at all.

Tool schema description links to the CDP method reference at
https://chromedevtools.github.io/devtools-protocol/ so the agent can
web_extract specific method docs on demand.

Stateless per call. Browser-level methods (Target.*, Browser.*,
Storage.*) omit target_id. Page-level methods attach to the target
with flatten=true and dispatch the method on the returned sessionId.
Clean errors when the endpoint becomes unreachable mid-session or
the URL isn't a WebSocket.

Tests: 19 unit (mock CDP server + gate checks) + E2E against real
headless Chrome (Target.getTargets, Browser.getVersion,
Runtime.evaluate with target_id, Page.navigate + re-eval, bogus
method, bogus target_id, missing endpoint) + E2E of the check_fn
gate (tool hidden without CDP URL, visible with it, hidden again
after unset).
2026-04-19 00:03:10 -07:00
helix4u
d66414a844 docs(custom-providers): use key_env in examples 2026-04-18 23:07:59 -07:00
Erosika
21d5ef2f17 feat(honcho): wizard cadence default 2, surface reasoning level, backwards-compat fallback
Setup wizard now always writes dialecticCadence=2 on new configs and
surfaces the reasoning level as an explicit step with all five options
(minimal / low / medium / high / max), always writing
dialecticReasoningLevel.

Code keeps a backwards-compat fallback of 1 when dialecticCadence is
unset so existing honcho.json configs that predate the setting keep
firing every turn on upgrade. New setups via the wizard get 2
explicitly; docs show 2 as the default.

Also scrubs editorial lines from code and docs ("max is reserved for
explicit tool-path selection", "Unset → every turn; wizard pre-fills 2",
and similar process-exposing phrasing) and adds an inline link to
app.honcho.dev where the server-side observation sync is mentioned in
honcho.md. Recommended cadence range updated to 1-5 across docs and
wizard copy.
2026-04-18 22:50:55 -07:00
Erosika
098efde848 docs(honcho): wizard cadence default 2, prewarm/depth + observation + multi-peer
- cli: setup wizard pre-fills dialecticCadence=2 (code default stays 1
  so unset → every turn)
- honcho.md: fix stale dialecticCadence default in tables, add
  Session-Start Prewarm subsection (depth runs at init), add
  Query-Adaptive Reasoning Level subsection, expand Observation
  section with directional vs unified semantics and per-peer patterns
- memory-providers.md: fix stale default, rename Multi-agent/Profiles
  to Multi-peer setup, add concrete walkthrough for new profiles and
  sync, document observation toggles + presets, link to honcho.md
- SKILL.md: fix stale defaults, add Depth at session start callout
2026-04-18 22:50:55 -07:00
Erosika
5f9907c116 chore(honcho): drop docs from PR scope, scrub commentary
- Revert website/docs and SKILL.md changes; docs unification handled separately
- Scrub commit/PR refs and process narration from code comments and test
  docstrings (no behavior change)
2026-04-18 22:50:55 -07:00
Erosika
78586ce036 fix(honcho): dialectic lifecycle — defaults, retry, prewarm consumption
Several correctness and cost-safety fixes to the Honcho dialectic path
after a multi-turn investigation surfaced a chain of silent failures:

- dialecticCadence default flipped 3 → 1. PR #10619 changed this from 1 to
  3 for cost, but existing installs with no explicit config silently went
  from per-turn dialectic to every-3-turns on upgrade. Restores pre-#10619
  behavior; 3+ remains available for cost-conscious setups. Docs + wizard
  + status output updated to match.

- Session-start prewarm now consumed. Previously fired a .chat() on init
  whose result landed in HonchoSessionManager._dialectic_cache and was
  never read — pop_dialectic_result had zero call sites. Turn 1 paid for
  a duplicate synchronous dialectic. Prewarm now writes directly to the
  plugin's _prefetch_result via _prefetch_lock so turn 1 consumes it with
  no extra call.

- Prewarm is now dialecticDepth-aware. A single-pass prewarm can return
  weak output on cold peers; the multi-pass audit/reconcile cycle is
  exactly the case dialecticDepth was built for. Prewarm now runs the
  full configured depth in the background.

- Silent dialectic failure no longer burns the cadence window.
  _last_dialectic_turn now advances only when the result is non-empty.
  Empty result → next eligible turn retries immediately instead of
  waiting the full cadence gap.

- Thread pile-up guard. queue_prefetch skips when a prior dialectic
  thread is still in-flight, preventing stacked races on _prefetch_result.

- First-turn sync timeout is recoverable. Previously on timeout the
  background thread's result was stored in a dead local list. Now the
  thread writes into _prefetch_result under lock so the next turn
  picks it up.

- Cadence gate applies uniformly. At cadence=1 the old "cadence > 1"
  guard let first-turn sync + same-turn queue_prefetch both fire.
  Gate now always applies.

- Restored query-length reasoning-level scaling, dropped in 9a0ab34c.
  Scales dialecticReasoningLevel up on longer queries (+1 at ≥120 chars,
  +2 at ≥400), clamped at reasoningLevelCap. Two new config keys:
  `reasoningHeuristic` (bool, default true) and `reasoningLevelCap`
  (string, default "high"; previously parsed but never enforced).
  Respects dialecticDepthLevels and proportional lighter-early passes.

- Restored short-prompt skip, dropped in ef7f3156. One-word
  acknowledgements ("ok", "y", "thanks") and slash commands bypass
  both injection and dialectic fire.

- Purged dead code in session.py: prefetch_dialectic, _dialectic_cache,
  set_dialectic_result, pop_dialectic_result — all unused after prewarm
  refactor.

Tests: 542 passed across honcho_plugin/, agent/test_memory_provider.py,
and run_agent/test_run_agent.py. New coverage:
- TestTrivialPromptHeuristic (classifier + prefetch/queue skip)
- TestDialecticCadenceAdvancesOnSuccess (empty-result retry, pile-up guard)
- TestSessionStartDialecticPrewarm (prewarm consumed, sync fallback)
- TestReasoningHeuristic (length bumps, cap clamp, interaction with depth)
- TestDialecticLifecycleSmoke (end-to-end 8-turn session walk)
2026-04-18 22:50:55 -07:00
Teknium
f9667331e5
docs(browser): improve /browser connect setup guidance (#12123)
- Note that /browser connect is CLI-only and won't work in gateways (WebUI, Telegram, Discord).
- Update the Chrome launch command to use a dedicated --user-data-dir, so port 9222 actually comes up even when Chrome is already running with the user's regular profile.
- Add --no-first-run --no-default-browser-check to skip the fresh-profile wizard.
- Explain why the dedicated user-data-dir matters.

Community tip via Karamjit Singh.

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-18 04:14:05 -07:00
Teknium
a2c9f5d0a7
docs(execute_code): document project/strict execution modes (#12073)
Follow-up to PR #11971. Documents the new code_execution.mode config
key and what each mode actually does.

- user-guide/configuration.md: add mode: project to the yaml example,
  explain project vs strict and call out that security invariants are
  identical across modes.
- user-guide/features/code-execution.md: new 'Execution Mode' section
  with a comparison table and usage guidance; update the 'temporary
  directory' note so it reflects that script.py runs in the session
  CWD in project mode (staging dir stays on PYTHONPATH for imports);
  drop stale 'sandboxed' framing from the intro and skill-passthrough
  paragraph.
- getting-started/learning-path.md: update the one-line Code Execution
  summary to match (no longer 'sandboxed environments' — the default
  runs in the session's real working directory).

No code changes.
2026-04-18 01:53:09 -07:00
Teknium
54e0eb24c0
docs: correctness audit — fix wrong values, add missing coverage (#11972)
Comprehensive audit of every reference/messaging/feature doc page against the
live code registries (PROVIDER_REGISTRY, OPTIONAL_ENV_VARS, COMMAND_REGISTRY,
TOOLSETS, tool registry, on-disk skills). Every fix was verified against code
before writing.

### Wrong values fixed (users would paste-and-fail)

- reference/environment-variables.md:
  - DASHSCOPE_BASE_URL default was `coding-intl.dashscope.aliyuncs.com/v1` \u2192
    actual `dashscope-intl.aliyuncs.com/compatible-mode/v1`.
  - MINIMAX_BASE_URL and MINIMAX_CN_BASE_URL defaults were `/v1` \u2192 actual
    `/anthropic` (Hermes calls MiniMax via its Anthropic Messages endpoint).
- reference/toolsets-reference.md MCP example used the non-existent nested
  `mcp: servers:` key \u2192 real key is the flat `mcp_servers:`.
- reference/skills-catalog.md listed ~20 bundled skills that no longer exist
  on disk (all moved to `optional-skills/`). Regenerated the whole bundled
  section from `skills/**/SKILL.md` \u2014 79 skills, accurate paths and names.
- messaging/slack.md ":::info" callout claimed Slack has no
  `free_response_channels` equivalent; both the env var and the yaml key are
  in fact read.
- messaging/qqbot.md documented `QQ_MARKDOWN_SUPPORT` as an env var, but the
  adapter only reads `extra.markdown_support` from config.yaml. Removed the
  env var row and noted config-only nature.
- messaging/qqbot.md `hermes setup gateway` \u2192 `hermes gateway setup`.

### Missing coverage added

- Providers: AWS Bedrock and Qwen Portal (qwen-oauth) \u2014 both in
  PROVIDER_REGISTRY but undocumented everywhere. Added sections to
  integrations/providers.md, rows to quickstart.md and fallback-providers.md.
- integrations/providers.md "Fallback Model" provider list now includes
  gemini, google-gemini-cli, qwen-oauth, xai, nvidia, ollama-cloud, bedrock.
- reference/cli-commands.md `--provider` enum and HERMES_INFERENCE_PROVIDER
  enum in env-vars now include the same set.
- reference/slash-commands.md: added `/agents` (alias `/tasks`) and `/copy`.
  Removed duplicate rows for `/snapshot`, `/fast` (\u00d72), `/debug`.
- reference/tools-reference.md: fixed "47 built-in tools" \u2192 52. Added
  `feishu_doc` and `feishu_drive` toolset sections.
- reference/toolsets-reference.md: added `feishu_doc` / `feishu_drive` core
  rows + all missing `hermes-<platform>` toolsets in the platform table
  (bluebubbles, dingtalk, feishu, qqbot, wecom, wecom-callback, weixin,
  homeassistant, webhook, gateway). Fixed the `debugging` composite to
  describe the actual `includes=[...]` mechanism.
- reference/optional-skills-catalog.md: added `fitness-nutrition`.
- reference/environment-variables.md: added NOUS_BASE_URL,
  NOUS_INFERENCE_BASE_URL, NVIDIA_API_KEY/BASE_URL, OLLAMA_API_KEY/BASE_URL,
  XAI_API_KEY/BASE_URL, MISTRAL_API_KEY, AWS_REGION/AWS_PROFILE,
  BEDROCK_BASE_URL, HERMES_QWEN_BASE_URL, DISCORD_ALLOWED_CHANNELS,
  DISCORD_PROXY, TELEGRAM_REPLY_TO_MODE, MATRIX_DEVICE_ID, MATRIX_REACTIONS,
  QQBOT_HOME_CHANNEL_NAME, QQ_SANDBOX.
- messaging/discord.md: documented DISCORD_ALLOWED_CHANNELS, DISCORD_PROXY,
  HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS and HERMES_DISCORD_TEXT_BATCH_SPLIT
  _DELAY_SECONDS (all actively read by the adapter).
- messaging/matrix.md: documented MATRIX_REACTIONS (default true).
- messaging/telegram.md: removed the redundant second Webhook Mode section
  that invented a `telegram.webhook_mode: true` yaml key the adapter does
  not read.
- user-guide/features/hooks.md: added `on_session_finalize` and
  `on_session_reset` (both emitted via invoke_hook but undocumented).
- user-guide/features/api-server.md: documented GET /health/detailed, the
  `/api/jobs/*` CRUD surface, POST /v1/runs, and GET /v1/runs/{id}/events
  (10 routes that were live but undocumented).
- user-guide/features/fallback-providers.md: added `approval` and
  `title_generation` auxiliary-task rows; added gemini, bedrock, qwen-oauth
  to the supported-providers table.
- user-guide/features/tts.md: "seven providers" \u2192 "eight" (post-xAI add
  oversight in #11942).
- user-guide/configuration.md: TTS provider enum gains `xai` and `gemini`;
  yaml example block gains `mistral:`, `gemini:`, `xai:` subsections.
  Auxiliary-provider enum now enumerates all real registry entries.
- reference/faq.md: stale AIAgent/config examples bumped from
  `nous/hermes-3-llama-3.1-70b` and `claude-sonnet-4.6` to
  `claude-opus-4.7`.

### Docs-site integrity

- guides/build-a-hermes-plugin.md referenced two nonexistent hooks
  (`pre_api_request`, `post_api_request`). Replaced with the real
  `on_session_finalize` / `on_session_reset` entries.
- messaging/open-webui.md and features/api-server.md had pre-existing
  broken links to `/docs/user-guide/features/profiles` (actual path is
  `/docs/user-guide/profiles`). Fixed.
- reference/skills-catalog.md had one `<1%` literal that MDX parsed as a
  JSX tag. Escaped to `&lt;1%`.

### False positives filtered out (not changed, verified correct)

- `/set-home` is a registered alias of `/sethome` \u2014 docs were fine.
- `hermes setup gateway` is valid syntax (`hermes setup \<section\>`);
  changed in qqbot.md for cross-doc consistency, not as a bug fix.
- Telegram reactions "disabled by default" matches code (default `"false"`).
- Matrix encryption "opt-in" matches code (empty env default \u2192 disabled).
- `pre_api_request` / `post_api_request` hooks do NOT exist in current code;
  documented instead the real `on_session_finalize` / `on_session_reset`.
- SIGNAL_IGNORE_STORIES is already in env-vars.md (subagent missed it).

Validation:
- `docusaurus build` \u2014 passes (only pre-existing nix-setup anchor warning).
- `ascii-guard lint docs` \u2014 124 files, 0 errors.
- 22 files changed, +317 / \u2212158.
2026-04-18 01:45:48 -07:00
Teknium
1c352f6b1d
docs(browser): expand Camofox persistence guide with troubleshooting (#11957)
The existing 'Persistent browser sessions' section had the correct config
snippet but users still hit the flag at the wrong config path, assumed
Hermes could force persistence when the server was ephemeral, and had no
way to verify the flag was actually taking effect.

Adds to that section:
- Warning admonition calling out the nested path vs top-level mistake.
- Explicit 'What Hermes does / does not do' split so users understand
  Hermes can only send a stable userId; the Camofox server must map it
  to a persistent profile.
- 5-step verification flow for confirming persistence works end-to-end.
- Reminder to restart Hermes after editing config.yaml.
- Where Hermes derives the stable userId (~/.hermes/browser_auth/camofox/)
  so users can reset or back up state.

Docs-only change.
2026-04-17 21:23:31 -07:00
Teknium
11a89cc032
docs: backfill coverage for recently-merged features (#11942)
Fills documentation gaps that accumulated as features merged ahead of their
docs updates. All additions are verified against code and the originating PRs.

Providers:
- Ollama Cloud (#10782) — new provider section, env vars, quickstart/fallback rows
- xAI Grok Responses API + TTS (#10783) — provider note, TTS table + config
- Google Gemini CLI OAuth (#11270) — quickstart/fallback/cli-commands entries
- NVIDIA NIM (#11774) — NVIDIA_API_KEY / NVIDIA_BASE_URL in env-vars reference
- HERMES_INFERENCE_PROVIDER enum updated

Messaging:
- DISCORD_ALLOWED_ROLES (#11608) — env-vars, discord.md access control section
- DingTalk QR device-flow (#11574) — wizard path in Option A + openClaw disclosure
- Feishu document comment intelligent reply (#11898) — full section + 3-tier access control + CLI

Skills / commands:
- concept-diagrams skill (#11363) — optional-skills-catalog entry
- /gquota (#11270) — slash-commands reference

Build: docusaurus build passes, ascii-guard lint 0 errors.
2026-04-17 21:22:11 -07:00
asurla
3b569ff576 feat(providers): add native NVIDIA NIM provider
Adds NVIDIA NIM as a first-class provider: ProviderConfig in
auth.py, HermesOverlay in providers.py, curated models
(Nemotron plus other open source models hosted on
build.nvidia.com), URL mapping in model_metadata.py, aliases
(nim, nvidia-nim, build-nvidia, nemotron), and env var tests.

Docs updated: providers page, quickstart table, fallback
providers table, and README provider list.
2026-04-17 13:47:46 -07:00
Teknium
e5cde568b7
feat(skills): add 'hermes skills reset' to un-stick bundled skills (#11468)
When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.

Adds an escape hatch for this case.

  hermes skills reset <name>
      Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
      re-baselines against the user's current copy. Future 'hermes update'
      runs accept upstream changes again. Non-destructive.

  hermes skills reset <name> --restore
      Also deletes the user's copy and re-copies the bundled version.
      Use when you want the pristine upstream skill back.

Also available as /skills reset in chat.

- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
  handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
  repro, --restore, unknown-skill error, upstream-removed-skill, and
  no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
  section explaining the origin-hash mechanic + reset usage
2026-04-17 00:41:31 -07:00
Teknium
220fa7db90
feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro (#11406)
* feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro

Upstream asked for these two upgrades ASAP — the old entries show
stale models when newer, higher-quality versions are available on FAL.

Recraft V3 → Recraft V4 Pro
  ID:    fal-ai/recraft-v3 → fal-ai/recraft/v4/pro/text-to-image
  Price: $0.04/image → $0.25/image (6x — V4 Pro is premium tier)
  Schema: V4 dropped the required `style` enum entirely; defaults
          handle taste now. Added `colors` and `background_color`
          to supports for brand-palette control. `seed` is not
          supported by V4 per the API docs.

Nano Banana → Nano Banana Pro
  ID:    fal-ai/nano-banana → fal-ai/nano-banana-pro
  Price: $0.08/image → $0.15/image (1K); $0.30 at 4K
  Schema: Aspect ratio family unchanged. Added `resolution`
          (1K/2K/4K, default 1K for billing predictability),
          `enable_web_search` (real-time info grounding, +$0.015),
          and `limit_generations` (force exactly 1 image).
  Architecture: Gemini 2.5 Flash → Gemini 3 Pro Image. Quality
                and reasoning depth improved; slower (~6s → ~8s).

Migration: users who had the old IDs in `image_gen.model` will
fall through the existing 'unknown model → default' warning path
in `_resolve_fal_model()` and get the Klein 9B default on the next
run. Re-run `hermes tools` → Image Generation to pick the new
version. No silent cost-upgrade aliasing — the 2-6x price jump
on these tiers warrants explicit user re-selection.

Portal note: both new model IDs need to be allowlisted on the
Nous fal-queue-gateway alongside the previous 7 additions, or
users on Nous Subscription will see the 'managed gateway rejected
model' error we added previously (which is clear and
self-remediating, just noisy).

* docs: wrap '<1s' in backticks to unblock MDX compilation

Docusaurus's MDX parser treats unquoted '<' as the start of JSX, and
'<1s' fails because '1' isn't a valid tag-name start character. This
was broken on main since PR #11265 (never noticed because
docs-site-checks was failing on OTHER issues at the time and we
admin-merged through it).

Wrapping in backticks also gives the cell monospace styling which
reads more cleanly alongside the inline-code model ID in the same row.

The other '<1s' occurrence (line 52) is inside a fenced code block
and is already safe — code fences bypass MDX parsing.
2026-04-16 22:05:41 -07:00
Teknium
01906e99dd
feat(image_gen): multi-model FAL support with picker in hermes tools (#11265)
* feat(image_gen): multi-model FAL support with picker in hermes tools

Adds 8 FAL text-to-image models selectable via `hermes tools` →
Image Generation → (FAL.ai | Nous Subscription) → model picker.

Models supported:
- fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP)
- fal-ai/flux-2-pro (previous default, kept backward-compat upscaling)
- fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN)
- fal-ai/nano-banana (Gemini 2.5 Flash Image)
- fal-ai/gpt-image-1.5 (with quality tier: low/medium/high)
- fal-ai/ideogram/v3 (best typography)
- fal-ai/recraft-v3 (vector, brand styles)
- fal-ai/qwen-image (LLM-based)

Architecture:
- FAL_MODELS catalog declares per-model size family, defaults, supports
  whitelist, and upscale flag. Three size families handled uniformly:
  image_size_preset (flux family), aspect_ratio (nano-banana), and
  gpt_literal (gpt-image-1.5).
- _build_fal_payload() translates unified inputs (prompt + aspect_ratio)
  into model-specific payloads, merges defaults, applies caller overrides,
  wires GPT quality_setting, then filters to the supports whitelist — so
  models never receive rejected keys.
- IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen
  providers (Replicate, Stability, etc.); each provider entry tags itself
  with imagegen_backend: 'fal' to select the right catalog.
- Upscaler (Clarity) defaults off for new models (preserves <1s value
  prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS.

Config:
  image_gen.model           = fal-ai/flux-2/klein/9b  (new)
  image_gen.quality_setting = medium                  (new, GPT only)
  image_gen.use_gateway     = bool                    (existing)

Agent-facing schema unchanged (prompt + aspect_ratio only) — model
choice is a user-level config decision, not an agent-level arg.

Picker uses curses_radiolist (arrow keys, auto numbered-fallback on
non-TTY). Column-aligned: Model / Speed / Strengths / Price.

Docs: image-generation.md rewritten with the model table and picker
walkthrough. tools-reference, tool-gateway, overview updated to drop
the stale "FLUX 2 Pro" wording.

Tests: 42 new in tests/tools/test_image_generation.py covering catalog
integrity, all 3 size families, supports filter, default merging, GPT
quality wiring, model resolution fallback. 8 new in
tests/hermes_cli/test_tools_config.py for picker wiring (registry,
config writes, GPT quality follow-up prompt, corrupt-config repair).

* feat(image_gen): translate managed-gateway 4xx to actionable error

When the Nous Subscription managed FAL proxy rejects a model with 4xx
(likely portal-side allowlist miss or billing gate), surface a clear
message explaining:
  1. The rejected model ID + HTTP status
  2. Two remediation paths: set FAL_KEY for direct access, or
     pick a different model via `hermes tools`

5xx, connection errors, and direct-FAL errors pass through unchanged
(those have different root causes and reasonable native messages).

Motivation: new FAL models added to this release (flux-2-klein-9b,
z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3,
qwen-image) are untested against the Nous Portal proxy. If the portal
allowlists model IDs, users on Nous Subscription will hit cryptic
4xx errors without guidance on how to work around it.

Tests: 8 new cases covering status extraction across httpx/fal error
shapes and 4xx-vs-5xx-vs-ConnectionError translation policy.

Docs: brief note in image-generation.md for Nous subscribers.

Operator action (Nous Portal side): verify that fal-queue-gateway
passes through these 7 new FAL model IDs. If the proxy has an
allowlist, add them; otherwise Nous Subscription users will see the
new translated error and fall back to direct FAL.

* feat(image_gen): pin GPT-Image quality to medium (no user choice)

Previously the tools picker asked a follow-up question for GPT-Image
quality tier (low / medium / high) and persisted the answer to
`image_gen.quality_setting`. This created two problems:

1. Nous Portal billing complexity — the 22x cost spread between tiers
   ($0.009 low / $0.20 high) forces the gateway to meter per-tier per
   user, which the portal team can't easily support at launch.
2. User footgun — anyone picking `high` by mistake burns through
   credit ~6x faster than `medium`.

This commit pins quality at medium by baking it into FAL_MODELS
defaults for gpt-image-1.5 and removes all user-facing override paths:

- Removed `_resolve_gpt_quality()` runtime lookup
- Removed `honors_quality_setting` flag on the model entry
- Removed `_configure_gpt_quality_setting()` picker helper
- Removed `_GPT_QUALITY_CHOICES` constant
- Removed the follow-up prompt call in `_configure_imagegen_model()`
- Even if a user manually edits `image_gen.quality_setting` in
  config.yaml, no code path reads it — always sends medium.

Tests:
- Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium
  (5 tests) — proves medium is baked in, config is ignored, flag is
  removed, helper is removed, non-gpt models never get quality.
- Replaced test_picker_with_gpt_image_also_prompts_quality with
  test_picker_with_gpt_image_does_not_prompt_quality — proves only 1
  picker call fires when gpt-image is selected (no quality follow-up).

Docs updated: image-generation.md replaces the quality-tier table
with a short note explaining the pinning decision.

* docs(image_gen): drop stale 'wires GPT quality tier' line from internals section

Caught in a cleanup sweep after pinning quality to medium. The
"How It Works Internally" walkthrough still described the removed
quality-wiring step.
2026-04-16 20:19:53 -07:00
Teknium
fce6c3cdf6
feat(tts): add Google Gemini TTS provider (#11229)
Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt
voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language
prompt control. Integrates through the existing provider chain:

- tools/tts_tool.py: new _generate_gemini_tts() calls the
  generativelanguage REST endpoint with responseModalities=[AUDIO],
  wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header,
  then ffmpeg-converts to MP3 or Opus depending on output extension.
  For .ogg output, libopus is forced explicitly so Telegram voice
  bubbles get Opus (ffmpeg defaults to Vorbis for .ogg).
- hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider
  option in the curses-based 'hermes tools' UI.
- hermes_cli/setup.py: adds gemini to the setup wizard picker, tool
  status display, and API key prompt branch (accepts existing
  GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither set).
- tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header
  wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model
  overrides, snake_case vs camelCase inlineData handling, HTTP error
  surfacing, and empty-audio edge cases.
- docs: TTS features page updated to list seven providers with the new
  gemini config block and ffmpeg notes.

Live-tested against api key against gemini-2.5-flash-preview-tts: .wav,
.mp3, and Telegram-compatible .ogg (Opus codec) all produce valid
playable audio.
2026-04-16 14:23:16 -07:00
Teknium
80855f964e
fix: stop hermes update from nagging about llm-wiki's wiki.path (#11222)
llm-wiki was the only shipped skill using metadata.hermes.config, which
caused 'hermes update' and 'hermes config migrate' to prompt for a wiki
directory on every run — even for users who have never touched the skill
— because 'enabled' is opt-out (all shipped skills count as enabled unless
explicitly disabled). Declining the prompt didn't persist anything, so
the nag fired again on every update.

Switch llm-wiki to the env var + runtime default pattern that obsidian and
google-workspace already use: WIKI_PATH env var, default $HOME/wiki. No
prompting infrastructure, no config.yaml touch, no nag loop.

Changes:
- skills/research/llm-wiki/SKILL.md: remove metadata.hermes.config,
  document WIKI_PATH env var in the Wiki Location section, update the
  orientation snippet and initialization guidance.
- Docs: replace llm-wiki's wiki.path examples with a generic 'myplugin.path'
  placeholder across configuration.md, features/skills.md, and
  creating-skills.md so users don't try to set skills.config.wiki.path
  expecting llm-wiki to use it.
- skills-catalog.md: mention WIKI_PATH instead of skills.config.wiki.path.

E2E verified: discover_all_skill_config_vars() and get_missing_skill_config_vars()
both return 0 entries after this change, so the prompt branch in migrate_config()
no longer fires.

The metadata.hermes.config feature stays in place for third-party skills
that genuinely need structured config, but built-ins now prefer env vars.
2026-04-16 13:34:16 -07:00
Teknium
dead2dfd4f
docs: add portal subscription links to tool-gateway page (#11208) 2026-04-16 12:48:03 -07:00
Jeffrey Quesnelle
3d8be06bce remove tool gateway from core features in docs 2026-04-16 12:36:49 -07:00
emozilla
10edd288c3 docs: add Nous Tool Gateway documentation
- New page: user-guide/features/tool-gateway.md covering eligibility,
  setup (hermes model, hermes tools, manual config), how use_gateway
  works, precedence, switching back, status checking, self-hosted
  gateway env vars, and FAQ
- Added to sidebar under Features (top-level, before Core category)
- Cross-references from: overview.md, tools.md, browser.md,
  image-generation.md, tts.md, providers.md, environment-variables.md
- Added Nous Tool Gateway subsection to env vars reference with
  TOOL_GATEWAY_DOMAIN, TOOL_GATEWAY_SCHEME, TOOL_GATEWAY_USER_TOKEN,
  and FIRECRAWL_GATEWAY_URL
2026-04-16 12:36:49 -07:00
Teknium
131d261a74 docs: add dashboard themes and plugins documentation
- web-dashboard.md: add Themes section covering built-in themes, custom
  theme YAML format (21 color tokens + overlay), and theme API endpoints
- dashboard-plugins.md: full plugin authoring guide covering manifest
  format, plugin SDK reference, backend API routes, custom CSS, loading
  flow, discovery, and tips
2026-04-16 04:10:06 -07:00
Teknium
23a42635f0
docs: remove nonexistent CAMOFOX_PROFILE_DIR env var references (#10976)
Camofox automatically maps each userId to a persistent Firefox profile
on the server side — no CAMOFOX_PROFILE_DIR env var exists. Our docs
incorrectly told users to configure this on the server.

Removed the fabricated env var from:
- browser docs (:::note block)
- config.py DEFAULT_CONFIG comment
- test docstring
2026-04-16 04:07:11 -07:00
Teknium
fb903b8f08
docs: document register_command() for plugin slash commands (#10671)
* feat: implement register_command() on plugin context

Complete the half-built plugin slash command system. The dispatch
code in cli.py and gateway/run.py already called
get_plugin_command_handler() but the registration side was never
implemented.

Changes:
- Add register_command() to PluginContext — stores handler,
  description, and plugin name; normalizes names; rejects conflicts
  with built-in commands
- Add _plugin_commands dict to PluginManager
- Add commands_registered tracking on LoadedPlugin
- Add get_plugin_command_handler() and get_plugin_commands()
  module-level convenience functions
- Fix commands.py to use actual plugin description in Telegram
  bot menu (was hardcoded 'Plugin command')
- Add plugin commands to SlashCommandCompleter autocomplete
- Show command count in /plugins display
- 12 new tests covering registration, conflict detection,
  normalization, handler dispatch, and introspection

Closes #10495

* docs: add register_command() to plugin guides

- Build a Plugin guide: new 'Register slash commands' section with
  full API reference, comparison table vs register_cli_command(),
  sync/async examples, and conflict protection docs
- Features/Plugins page: add slash commands to capabilities table
  and plugin types summary
2026-04-15 19:55:25 -07:00
Teknium
cc6e8941db
feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619)
Salvaged from PR #9884 by erosika. Cherry-picked plugin changes onto
current main with minimal core modifications.

Plugin changes (plugins/memory/honcho/):
- New honcho_reasoning tool (5th tool, splits LLM calls from honcho_context)
- Two-layer context injection: base context (summary + representation + card)
  on contextCadence, dialectic supplement on dialecticCadence
- Multi-pass dialectic depth (1-3 passes) with early bail-out on strong signal
- Cold/warm prompt selection based on session state
- dialecticCadence defaults to 3 (was 1) — ~66% fewer Honcho LLM calls
- Session summary injection for conversational continuity
- Bidirectional peer targeting on all 5 tools
- Correctness fixes: peer param fallback, None guard on set_peer_card,
  schema validation, signal_sufficient anchored regex, mid->medium level fix

Core changes (~20 lines across 3 files):
- agent/memory_manager.py: Enhanced sanitize_context() to strip full
  <memory-context> blocks and system notes (prevents leak from saveMessages)
- run_agent.py: gateway_session_key param for stable per-chat Honcho sessions,
  on_turn_start() call before prefetch_all() for cadence tracking,
  sanitize_context() on user messages to strip leaked memory blocks
- gateway/run.py: skip_memory=True on 2 temp agents (prevents orphan sessions),
  gateway_session_key threading to main agent

Tests: 509 passed (3 skipped — honcho SDK not installed locally)
Docs: Updated honcho.md, memory-providers.md, tools-reference.md, SKILL.md

Co-authored-by: erosika <erosika@users.noreply.github.com>
2026-04-15 19:12:19 -07:00
Teknium
2871ef1807
docs: note session continuity for previous_response_id chains (#10060) 2026-04-14 21:07:37 -07:00
simon-marcus
d6c09ab94a feat(api-server): stream /v1/responses SSE tool events 2026-04-14 20:51:52 -07:00
Teknium
8bb5973950 docs: add proxy mode documentation
- Matrix docs: full Proxy Mode section with architecture diagram,
  step-by-step setup (host + Docker), docker-compose.yml/Dockerfile
  examples, configuration reference, and limitations notes
- API Server docs: add Proxy Mode section explaining the api_server
  serves as the backend for gateway proxy mode
- Environment variables reference: add GATEWAY_PROXY_URL and
  GATEWAY_PROXY_KEY entries
2026-04-14 10:49:48 -07:00