Commit graph

19 commits

Author SHA1 Message Date
Teknium
5a1c599412
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)

Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.

Supersedes #12550.

No code changes in this commit.

* feat(browser): add persistent CDP supervisor for dialog + frame detection

Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.

Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.

Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.

Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.

Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.

E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.

No agent-facing tool wiring in this commit (comes next).

* feat(browser): add browser_dialog tool wired to CDP supervisor

Agent-facing response-only tool. Schema:
  action: 'accept' | 'dismiss' (required)
  prompt_text: response for prompt() dialogs (optional)
  dialog_id: disambiguate when multiple dialogs queued (optional)

Handler:
  SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)

check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.

Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.

* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot

Supervisor lifecycle:
  * _get_session_info lazy-starts the supervisor after a session row is
    materialized — covers every backend code path (Browserbase, cdp_url
    override, /browser connect, future providers) with one hook.
  * cleanup_browser(task_id) stops the supervisor for that task first
    (before the backend tears down CDP).
  * cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
  * /browser connect eagerly starts the supervisor for task 'default'
    so the first snapshot already shows pending_dialogs.
  * /browser disconnect stops the supervisor.

CDP URL resolution for the supervisor:
  1. BROWSER_CDP_URL / browser.cdp_url override.
  2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).

browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.

Config defaults:
  * browser.dialog_policy: 'must_respond' (new)
  * browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.

Deadlock fix in supervisor event dispatch:
  * _on_dialog_opening and _on_target_attached used to await CDP calls
    while the reader was still processing an event — but only the reader
    can set the response Future, so the call timed out.
  * Both now fire asyncio.create_task(...) so the reader stays pumping.
  * auto_dismiss/auto_accept now actually close the dialog immediately.

Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
  * supervisor start/snapshot
  * main-frame alert detection + dismiss
  * iframe.contentWindow alert
  * prompt() with prompt_text reply
  * respond with no pending dialog -> clean error
  * auto_dismiss clears on event
  * registry idempotency
  * registry stop -> snapshot reports inactive
  * browser_dialog tool no-supervisor error
  * browser_dialog invalid action
  * browser_dialog end-to-end via tool handler

xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.

* docs(browser): document browser_dialog tool + CDP supervisor

- user-guide/features/browser.md: new browser_dialog section with
  workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
  bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
  toolset row with note on pending_dialogs / frame_tree snapshot fields

Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).

* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility

Found via Browserbase E2E test that revealed two production-critical issues:

1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
   CDP proxy tears down our long-lived WebSocket whenever a short-lived
   client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
   Fixed with a reconnecting _run loop that re-attaches with exponential
   backoff on drops. _page_session_id and _child_sessions are reset on each
   reconnect; pending_dialogs and frames are preserved across reconnects.

2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
   Playwright-based CDP proxy dismisses alert/confirm/prompt before our
   Page.handleJavaScriptDialog call can respond. So pending_dialogs is
   empty by the time the agent reads a snapshot on Browserbase.

   Added a recent_dialogs ring buffer (capacity 20) that retains a
   DialogRecord for every dialog that opened, with a closed_by tag:
     * 'agent'       — agent called browser_dialog
     * 'auto_policy' — local auto_dismiss/auto_accept fired
     * 'watchdog'    — must_respond timeout auto-dismissed (300s default)
     * 'remote'      — browser/backend closed it on us (Browserbase)

   Agents on Browserbase now see the dialog history with closed_by='remote'
   so they at least know a dialog fired, even though they couldn't respond.

3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
   'message' field (CDP spec has only 'result' and 'userInput') but our
   _on_dialog_closed was matching on message. Fixed to match by session_id
   + oldest-first, with a safety assumption that only one dialog is in
   flight per session (the JS thread is blocked while a dialog is up).

Docs + tests updated:
  * browser.md: new availability matrix showing the three backends and
    which mode (pending / recent / response) each supports
  * developer-guide/browser-supervisor.md: three-field snapshot schema
    with closed_by semantics
  * test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
    passing against real Chrome)

E2E verified both backends:
  * Local Chrome via /browser connect: detect + respond full workflow
    (smoke_supervisor.py all 7 scenarios pass)
  * Browserbase: detect via recent_dialogs with closed_by='remote'
    (smoke_supervisor_browserbase_v2.py passes)

Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.

* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)

Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.

The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.

Flow when a page calls alert('hi'):
  1. window.alert override intercepts, builds XHR GET to
     http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
  2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
  3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
     it as a pending dialog with bridge_request_id set
  4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
  5. Supervisor calls Fetch.fulfillRequest with JSON body:
     {accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
  6. The injected script parses the body, returns the appropriate value
     from the override (undefined for alert, bool for confirm, string|null
     for prompt)

This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.

Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.

Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).

E2E VERIFIED:
  * Local Chrome: 13/13 pytest tests green (12 original + new
    test_bridge_captures_prompt_and_returns_reply_text that asserts
    window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
  * Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
    - alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
    - prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
      → page.prompt_ret === 'AGENT-REPLY' ✓
    - confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
    - confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓

Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.

* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)

Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).

Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.

Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.

Agent workflow:
  1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
  2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
                 params={'expression': 'document.title', 'returnByValue': True})
  3. Supervisor dispatches the call on the OOPIF's child session

Supervisor state fixes needed along the way:
  * _on_frame_detached now skips reason='swap' (frame migrating processes)
  * _on_frame_detached also skips when the frame is an OOPIF with a live
    child session — Browserbase fires spurious remove events when a
    same-origin iframe gets promoted to OOPIF
  * _on_target_detached clears cdp_session_id but KEEPS the frame record
    so the agent still sees the OOPIF in frame_tree during transient
    session flaps

E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
  browser_cdp(method='Runtime.evaluate',
              params={'expression': 'document.title', 'returnByValue': True},
              frame_id=<OOPIF>)
  → {'success': True, 'result': {'value': 'Example Domain'}}

  The iframe is <iframe src='https://example.com/'> inside a top-level
  data: URL page on a real Browserbase session. The agent Runtime.evaluates
  INSIDE the cross-origin iframe and gets example.com's title back.

Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
  * test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
    verifies routing via supervisor, Runtime.evaluate returns 1+1=2
  * test_browser_cdp_frame_id_missing_supervisor — clean error when no
    supervisor attached
  * test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
    frame_id

Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.

* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process

When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:

  * 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
  * Chrome with --site-per-process so the cross-origin iframe becomes a
    real OOPIF in its own process
  * Navigate, find OOPIF in supervisor.frame_tree, call
    browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
    through the supervisor's child session
  * Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
    inner page, retrieved via OOPIF eval)

PASSED on 2026-04-23.

Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.

chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.

Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.

* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count

Pre-merge docs audit revealed two gaps:

1. user-guide/configuration.md browser config example was missing the
   two new dialog_* knobs. Added with a short table explaining
   must_respond / auto_dismiss / auto_accept semantics and a link to
   the feature page for the full workflow.

2. reference/tools-reference.md header said '54 built-in tools' — real
   count on main is 54, this branch adds browser_dialog so it's 55.
   Fixed the header.  (browser count was already correctly bumped
   11 -> 12 in the earlier docs commit.)

No code changes.
2026-04-23 22:23:37 -07:00
Teknium
ce410521b3
feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369)
Agents can now send arbitrary CDP commands to the browser. The tool is
gated on a reachable CDP endpoint at session start — it only appears in
the toolset when BROWSER_CDP_URL is set (from '/browser connect') or
'browser.cdp_url' is configured in config.yaml. Backends that don't
currently expose CDP to the Python side (Camofox, default local
agent-browser, cloud providers whose per-session cdp_url is not yet
surfaced) do not see the tool at all.

Tool schema description links to the CDP method reference at
https://chromedevtools.github.io/devtools-protocol/ so the agent can
web_extract specific method docs on demand.

Stateless per call. Browser-level methods (Target.*, Browser.*,
Storage.*) omit target_id. Page-level methods attach to the target
with flatten=true and dispatch the method on the returned sessionId.
Clean errors when the endpoint becomes unreachable mid-session or
the URL isn't a WebSocket.

Tests: 19 unit (mock CDP server + gate checks) + E2E against real
headless Chrome (Target.getTargets, Browser.getVersion,
Runtime.evaluate with target_id, Page.navigate + re-eval, bogus
method, bogus target_id, missing endpoint) + E2E of the check_fn
gate (tool hidden without CDP URL, visible with it, hidden again
after unset).
2026-04-19 00:03:10 -07:00
Teknium
54e0eb24c0
docs: correctness audit — fix wrong values, add missing coverage (#11972)
Comprehensive audit of every reference/messaging/feature doc page against the
live code registries (PROVIDER_REGISTRY, OPTIONAL_ENV_VARS, COMMAND_REGISTRY,
TOOLSETS, tool registry, on-disk skills). Every fix was verified against code
before writing.

### Wrong values fixed (users would paste-and-fail)

- reference/environment-variables.md:
  - DASHSCOPE_BASE_URL default was `coding-intl.dashscope.aliyuncs.com/v1` \u2192
    actual `dashscope-intl.aliyuncs.com/compatible-mode/v1`.
  - MINIMAX_BASE_URL and MINIMAX_CN_BASE_URL defaults were `/v1` \u2192 actual
    `/anthropic` (Hermes calls MiniMax via its Anthropic Messages endpoint).
- reference/toolsets-reference.md MCP example used the non-existent nested
  `mcp: servers:` key \u2192 real key is the flat `mcp_servers:`.
- reference/skills-catalog.md listed ~20 bundled skills that no longer exist
  on disk (all moved to `optional-skills/`). Regenerated the whole bundled
  section from `skills/**/SKILL.md` \u2014 79 skills, accurate paths and names.
- messaging/slack.md ":::info" callout claimed Slack has no
  `free_response_channels` equivalent; both the env var and the yaml key are
  in fact read.
- messaging/qqbot.md documented `QQ_MARKDOWN_SUPPORT` as an env var, but the
  adapter only reads `extra.markdown_support` from config.yaml. Removed the
  env var row and noted config-only nature.
- messaging/qqbot.md `hermes setup gateway` \u2192 `hermes gateway setup`.

### Missing coverage added

- Providers: AWS Bedrock and Qwen Portal (qwen-oauth) \u2014 both in
  PROVIDER_REGISTRY but undocumented everywhere. Added sections to
  integrations/providers.md, rows to quickstart.md and fallback-providers.md.
- integrations/providers.md "Fallback Model" provider list now includes
  gemini, google-gemini-cli, qwen-oauth, xai, nvidia, ollama-cloud, bedrock.
- reference/cli-commands.md `--provider` enum and HERMES_INFERENCE_PROVIDER
  enum in env-vars now include the same set.
- reference/slash-commands.md: added `/agents` (alias `/tasks`) and `/copy`.
  Removed duplicate rows for `/snapshot`, `/fast` (\u00d72), `/debug`.
- reference/tools-reference.md: fixed "47 built-in tools" \u2192 52. Added
  `feishu_doc` and `feishu_drive` toolset sections.
- reference/toolsets-reference.md: added `feishu_doc` / `feishu_drive` core
  rows + all missing `hermes-<platform>` toolsets in the platform table
  (bluebubbles, dingtalk, feishu, qqbot, wecom, wecom-callback, weixin,
  homeassistant, webhook, gateway). Fixed the `debugging` composite to
  describe the actual `includes=[...]` mechanism.
- reference/optional-skills-catalog.md: added `fitness-nutrition`.
- reference/environment-variables.md: added NOUS_BASE_URL,
  NOUS_INFERENCE_BASE_URL, NVIDIA_API_KEY/BASE_URL, OLLAMA_API_KEY/BASE_URL,
  XAI_API_KEY/BASE_URL, MISTRAL_API_KEY, AWS_REGION/AWS_PROFILE,
  BEDROCK_BASE_URL, HERMES_QWEN_BASE_URL, DISCORD_ALLOWED_CHANNELS,
  DISCORD_PROXY, TELEGRAM_REPLY_TO_MODE, MATRIX_DEVICE_ID, MATRIX_REACTIONS,
  QQBOT_HOME_CHANNEL_NAME, QQ_SANDBOX.
- messaging/discord.md: documented DISCORD_ALLOWED_CHANNELS, DISCORD_PROXY,
  HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS and HERMES_DISCORD_TEXT_BATCH_SPLIT
  _DELAY_SECONDS (all actively read by the adapter).
- messaging/matrix.md: documented MATRIX_REACTIONS (default true).
- messaging/telegram.md: removed the redundant second Webhook Mode section
  that invented a `telegram.webhook_mode: true` yaml key the adapter does
  not read.
- user-guide/features/hooks.md: added `on_session_finalize` and
  `on_session_reset` (both emitted via invoke_hook but undocumented).
- user-guide/features/api-server.md: documented GET /health/detailed, the
  `/api/jobs/*` CRUD surface, POST /v1/runs, and GET /v1/runs/{id}/events
  (10 routes that were live but undocumented).
- user-guide/features/fallback-providers.md: added `approval` and
  `title_generation` auxiliary-task rows; added gemini, bedrock, qwen-oauth
  to the supported-providers table.
- user-guide/features/tts.md: "seven providers" \u2192 "eight" (post-xAI add
  oversight in #11942).
- user-guide/configuration.md: TTS provider enum gains `xai` and `gemini`;
  yaml example block gains `mistral:`, `gemini:`, `xai:` subsections.
  Auxiliary-provider enum now enumerates all real registry entries.
- reference/faq.md: stale AIAgent/config examples bumped from
  `nous/hermes-3-llama-3.1-70b` and `claude-sonnet-4.6` to
  `claude-opus-4.7`.

### Docs-site integrity

- guides/build-a-hermes-plugin.md referenced two nonexistent hooks
  (`pre_api_request`, `post_api_request`). Replaced with the real
  `on_session_finalize` / `on_session_reset` entries.
- messaging/open-webui.md and features/api-server.md had pre-existing
  broken links to `/docs/user-guide/features/profiles` (actual path is
  `/docs/user-guide/profiles`). Fixed.
- reference/skills-catalog.md had one `<1%` literal that MDX parsed as a
  JSX tag. Escaped to `&lt;1%`.

### False positives filtered out (not changed, verified correct)

- `/set-home` is a registered alias of `/sethome` \u2014 docs were fine.
- `hermes setup gateway` is valid syntax (`hermes setup \<section\>`);
  changed in qqbot.md for cross-doc consistency, not as a bug fix.
- Telegram reactions "disabled by default" matches code (default `"false"`).
- Matrix encryption "opt-in" matches code (empty env default \u2192 disabled).
- `pre_api_request` / `post_api_request` hooks do NOT exist in current code;
  documented instead the real `on_session_finalize` / `on_session_reset`.
- SIGNAL_IGNORE_STORIES is already in env-vars.md (subagent missed it).

Validation:
- `docusaurus build` \u2014 passes (only pre-existing nix-setup anchor warning).
- `ascii-guard lint docs` \u2014 124 files, 0 errors.
- 22 files changed, +317 / \u2212158.
2026-04-18 01:45:48 -07:00
Teknium
1acf81fdf5 docs: add QQBot to all 14 docs pages (full platform parity)
- sidebars.ts: sidebar navigation entry
- webhooks.md: deliver field routing table
- configuration.md: platform keys list
- sessions.md: platform identifiers table
- features/cron.md: delivery target table
- developer-guide/architecture.md: adapter listing
- developer-guide/cron-internals.md: delivery target table
- developer-guide/gateway-internals.md: file tree listing
- guides/cron-troubleshooting.md: supported platforms list
- integrations/index.md: platform links list
- reference/toolsets-reference.md: toolset table

(qqbot.md, environment-variables.md, and messaging/index.md were
already included in the contributor's original PR)
2026-04-14 00:11:49 -07:00
Teknium
ba50fa3035
docs: fix 30+ inaccuracies across documentation (#9023)
Cross-referenced all docs pages against the actual codebase and fixed:

Reference docs (cli-commands.md, slash-commands.md, profile-commands.md):
- Fix: hermes web -> hermes dashboard (correct subparser name)
- Fix: Wrong provider list (removed deepseek, ai-gateway, opencode-zen,
  opencode-go, alibaba; added gemini)
- Fix: Missing tts in hermes setup section choices
- Add: Missing --image flag for hermes chat
- Add: Missing --component flag for hermes logs
- Add: Missing CLI commands: debug, backup, import
- Fix: /status incorrectly marked as messaging-only (available everywhere)
- Fix: /statusbar moved from Session to Configuration category
- Add: Missing slash commands: /fast, /snapshot, /image, /debug
- Add: Missing /restart from messaging commands table
- Fix: /compress description to match COMMAND_REGISTRY
- Add: --no-alias flag to profile create docs

Configuration docs (configuration.md, environment-variables.md):
- Fix: Vision timeout default 30s -> 120s
- Fix: TTS providers missing minimax and mistral
- Fix: STT providers missing mistral
- Fix: TTS openai base_url shown with wrong default
- Fix: Compression config showing stale summary_model/provider/base_url
  keys (migrated out in config v17) -> target_ratio/protect_last_n

Getting-started docs:
- Fix: Redundant faster-whisper install (already in voice extra)
- Fix: Messaging extra description missing Slack

Developer guide:
- Fix: architecture.md tool count 48 -> 47, toolset count 40 -> 19
- Fix: run_agent.py line count 9,200 -> 10,700
- Fix: cli.py line count 8,500 -> 10,000
- Fix: main.py line count 5,500 -> 6,000
- Fix: gateway/run.py line count 7,500 -> 9,000
- Fix: Browser tools count 11 -> 10
- Fix: Platform adapter count 15 -> 18 (add wecom_callback, api_server)
- Fix: agent-loop.md wrong budget sharing (not shared, independent)
- Fix: agent-loop.md non-existent _get_budget_warning() reference
- Fix: context-compression-and-caching.md non-existent function name
- Fix: toolsets-reference.md safe toolset includes mixture_of_agents (it doesn't)
- Fix: toolsets-reference.md hermes-cli tool count 38 -> 36

Guides:
- Fix: automate-with-cron.md claims daily at 9am is valid (it's not)
- Fix: delegation-patterns.md Max 3 presented as hard cap (configurable)
- Fix: sessions.md group thread key format (shared by default, not per-user)
- Fix: cron-internals.md job ID format and JSON structure
2026-04-13 10:53:10 -07:00
Teknium
289d2745af
docs: add platform adapter developer guide + WeCom Callback docs (#7969)
Add the missing 'Adding a Platform Adapter' developer guide — a
comprehensive step-by-step checklist covering all 20+ integration
points (enum, adapter, config, runner, CLI, tools, toolsets, cron,
webhooks, tests, and docs). Includes common patterns for long-poll,
callback/webhook, and token-lock adapters with reference implementations.

Also adds full docs coverage for the WeCom Callback platform:
- New docs page: user-guide/messaging/wecom-callback.md
- Environment variables reference (9 WECOM_CALLBACK_* vars)
- Toolsets reference (hermes-wecom-callback)
- Messaging index (comparison table, architecture diagram, toolsets,
  security, next-steps links)
- Integrations index listing
- Sidebar entries for both new pages
2026-04-11 15:50:54 -07:00
Teknium
7cec784b64 fix: complete Weixin platform parity audit — 16 missing integration points
Systematic audit found Weixin missing from:

Code:
- gateway/run.py: early WEIXIN_ALLOW_ALL_USERS env check
- gateway/platforms/webhook.py: cross-platform delivery routing
- hermes_cli/dump.py: platform detection for config export
- hermes_cli/setup.py: hermes setup wizard platform list + _setup_weixin
- hermes_cli/skills_config.py: platform labels for skills config UI

Docs (11 pages):
- developer-guide/architecture.md: platform adapter listing
- developer-guide/cron-internals.md: delivery target table
- developer-guide/gateway-internals.md: file tree
- guides/cron-troubleshooting.md: supported platforms list
- integrations/index.md: platform links
- reference/toolsets-reference.md: toolset table
- user-guide/configuration.md: platform keys for tool_progress
- user-guide/features/cron.md: delivery target table
- user-guide/messaging/index.md: intro text, feature table,
  mermaid diagram, toolset table, setup links
- user-guide/messaging/webhooks.md: deliver field + routing table
- user-guide/sessions.md: platform identifiers table
2026-04-10 05:54:37 -07:00
Teknium
7120d6cdd6
fix(bluebubbles): add missing integration points and documentation (#6460)
- hermes_cli/skills_config.py: add platform label for per-platform skill config
- gateway/session.py: add to PII-safe platforms (no mention system)
- website/docs/user-guide/messaging/bluebubbles.md: full setup guide
- website/sidebars.ts: sidebar navigation entry
- 10 docs pages: add BlueBubbles to all platform enumerations
  (env vars, toolsets, cron delivery, gateway internals, etc.)
2026-04-09 00:19:05 -07:00
Teknium
c58e16757a
docs: fix 40+ discrepancies between documentation and codebase (#5818)
Comprehensive audit of all ~100 doc pages against the actual code, fixing:

Reference docs:
- HERMES_API_TIMEOUT default 900 -> 1800 (env-vars)
- TERMINAL_DOCKER_IMAGE default python:3.11 -> nikolaik/python-nodejs (env-vars)
- compression.summary_model default shown as gemini -> actually empty string (env-vars)
- Add missing GOOGLE_API_KEY, GEMINI_API_KEY, GEMINI_BASE_URL env vars (env-vars)
- Add missing /branch (/fork) slash command (slash-commands)
- Fix hermes-cli tool count 39 -> 38 (toolsets-reference)
- Fix hermes-api-server drop list to include text_to_speech (toolsets-reference)
- Fix total tool count 47 -> 48, standalone 14 -> 15 (tools-reference)

User guide:
- web_extract.timeout default 30 -> 360 (configuration)
- Remove display.theme_mode (not implemented in code) (configuration)
- Remove display.background_process_notifications (not in defaults) (configuration)
- Browser inactivity timeout 300/5min -> 120/2min (browser)
- Screenshot path browser_screenshots -> cache/screenshots (browser)
- batch_runner default model claude-sonnet-4-20250514 -> claude-sonnet-4.6
- Add minimax to TTS provider list (voice-mode)
- Remove credential_pool_strategies from auth.json example (credential-pools)
- Fix Slack token path platforms/slack/ -> root ~/.hermes/ (slack)
- Fix Matrix store path for new installs (matrix)
- Fix WhatsApp session path for new installs (whatsapp)
- Fix HomeAssistant config from gateway.json to config.yaml (homeassistant)
- Fix WeCom gateway start command (wecom)

Developer guide:
- Fix tool/toolset counts in architecture overview
- Update line counts: main.py ~5500, setup.py ~3100, run.py ~7500, mcp_tool ~2200
- Replace nonexistent agent/memory_store.py with memory_manager.py + memory_provider.py
- Update _discover_tools() list: remove honcho_tools, add skill_manager_tool
- Add session_search and delegate_task to intercepted tools list (agent-loop)
- Fix budget warning: two-tier system (70% caution, 90% warning) (agent-loop)
- Fix gateway auth order (per-platform first, global last) (gateway-internals)
- Fix email_adapter.py -> email.py, add webhook.py + api_server.py (gateway-internals)
- Add 7 missing providers to provider-runtime list

Other:
- Add Docker --cap-add entries to security doc
- Fix Python version 3.10+ -> 3.11+ (contributing)
- Fix AGENTS.md discovery claim (not hierarchical walk) (tips)
- Fix cron 'add' -> canonical 'create' (cron-internals)
- Add pre_api_request/post_api_request hooks to plugin guide
- Add Google/Gemini provider to providers page
- Clarify OPENAI_BASE_URL deprecation (providers)
2026-04-07 10:17:44 -07:00
Teknium
8b861b77c1
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it

The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.

Removing it saves one tool schema slot in every browser-enabled API call.

Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.

Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
  camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
  acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references

* fix: repeat browser_scroll 5x per call for meaningful page movement

Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.

Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.

* feat: auto-return compact snapshot from browser_navigate

Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.

The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.

Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.

Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.

* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params

Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:

Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object

Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
  If provider is omitted, the current main provider is pinned at creation
  time so the job stays stable even if the user changes their default.

Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading

Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.

* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools

MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.

Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
Teknium
43d468cea8
docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393)
Major changes across 20 documentation pages:

Staleness fixes:
- Fix FAQ: wrong import path (hermes.agent → run_agent)
- Fix FAQ: stale Gemini 2.0 model → Gemini 3 Flash
- Fix integrations/index: missing MiniMax TTS provider
- Fix integrations/index: web_crawl is not a registered tool
- Fix sessions: add all 19 session sources (was only 5)
- Fix cron: add all 18 delivery targets (was only telegram/discord)
- Fix webhooks: add all delivery targets
- Fix overview: add missing MCP, memory providers, credential pools
- Fix all line-number references → use function name searches instead
- Update file size estimates (run_agent ~9200, gateway ~7200, cli ~8500)

Expanded thin pages (< 150 lines → substantial depth):
- honcho.md: 43 → 108 lines — added feature comparison, tools, config, CLI
- overview.md: 49 → 55 lines — added MCP, memory providers, credential pools
- toolsets-reference.md: 57 → 175 lines — added explanations, config examples,
  custom toolsets, wildcards, platform differences table
- optional-skills-catalog.md: 74 → 153 lines — added 25+ missing skills across
  communication, devops, mlops (18!), productivity, research categories
- integrations/index.md: 82 → 115 lines — added messaging, HA, plugins sections
- cron-internals.md: 90 → 195 lines — added job JSON example, lifecycle states,
  tick cycle, delivery targets, script-backed jobs, CLI interface
- gateway-internals.md: 111 → 250 lines — added architecture diagram, message
  flow, two-level guard, platform adapters, token locks, process management
- agent-loop.md: 112 → 235 lines — added entry points, API mode resolution,
  turn lifecycle detail, message alternation rules, tool execution flow,
  callback table, budget tracking, compression details
- architecture.md: 152 → 295 lines — added system overview diagram, data flow
  diagrams, design principles table, dependency chain

Other depth additions:
- context-references.md: added platform availability, compression interaction,
  common patterns sections
- slash-commands.md: added quick commands config example, alias resolution
- image-generation.md: added platform delivery table
- tools-reference.md: added tool counts, MCP tools note
- index.md: updated platform count (5 → 14+), tool count (40+ → 47)
2026-04-05 19:45:50 -07:00
Teknium
77a2aad771
docs: fix stale references across 8 doc pages
Audit found 24+ discrepancies between docs and code. Fixed:

HIGH severity:
- Remove honcho toolset from tools-reference, toolsets-reference, and tools.md
  (converted to memory provider plugin, not a built-in toolset)
- Add note that Honcho is available via plugin

MEDIUM severity:
- Add hermes memory command family to cli-commands.md (setup/status/off)
- Add --clone-all, --clone-from to profile create in cli-commands.md
- Add --max-turns option to hermes chat in cli-commands.md
- Add /btw slash command to slash-commands.md
- Fix profile show example output (remove nonexistent disk usage,
  add .env and SOUL.md status lines)
- Add missing hermes-webhook toolset to toolsets-reference.md
- Add 5 missing providers to fallback-providers.md table
- Add 7 missing providers to providers.md fallback list
- Fix outdated model examples: glm-4-plus→glm-5, moonshot-v1-auto→kimi-for-coding
2026-04-03 23:30:29 -07:00
Teknium
7e0c2c3ce3
docs: comprehensive documentation audit — fix 9 HIGH, 20+ MEDIUM gaps (#4087)
Reference docs fixes:
- cli-commands.md: remove non-existent --provider alibaba, add hermes
  profile/completion/plugins/mcp to top-level table, add --profile/-p
  global flag, add --source chat option
- slash-commands.md: add /yolo and /commands, fix /q alias conflict
  (resolves to /queue not /quit), add missing aliases (/bg, /set-home,
  /reload_mcp, /gateway)
- toolsets-reference.md: fix hermes-api-server (not same as hermes-cli,
  omits clarify/send_message/text_to_speech)
- profile-commands.md: fix show name required not optional, --clone-from
  not --from, add --remove/--name to alias, fix alias path, fix export/
  import arg types, remove non-existent fish completion
- tools-reference.md: add EXA_API_KEY to web tools requires_env
- mcp-config-reference.md: add auth key for OAuth, tool name sanitization
- environment-variables.md: add EXA_API_KEY, update provider values
- plugins.md: remove non-existent ctx.register_command(), add
  ctx.inject_message()

Feature docs additions:
- security.md: add /yolo mode, approval modes (manual/smart/off),
  configurable timeout, expanded dangerous patterns table
- cron.md: add wrap_response config, [SILENT] suppression
- mcp.md: add dynamic tool discovery, MCP sampling support
- cli.md: add Ctrl+Z suspend, busy_input_mode, tool_preview_length
- docker.md: add skills/credential file mounting

Messaging platform docs:
- telegram.md: add webhook mode, DoH fallback IPs
- slack.md: add multi-workspace OAuth support
- discord.md: add DISCORD_IGNORE_NO_MENTION
- matrix.md: add MSC3245 native voice messages
- feishu.md: expand from 129 to 365 lines (encrypt key, verification
  token, group policy, card actions, media, rate limiting, markdown,
  troubleshooting)
- wecom.md: expand from 86 to 264 lines (per-group allowlists, media,
  AES decryption, stream replies, reconnection, troubleshooting)

Configuration docs:
- quickstart.md: add DeepSeek, Copilot, Copilot ACP providers
- configuration.md: add DeepSeek provider, Exa web backend, terminal
  env_passthrough/images, browser.command_timeout, compression params,
  discord config, security/tirith config, timezone, auxiliary models

21 files changed, ~1000 lines added
2026-03-30 17:15:21 -07:00
Teknium
ce2841f3c9
feat(gateway): add WeCom (Enterprise WeChat) platform support (#3847)
Adds WeCom as a gateway platform adapter using the AI Bot WebSocket
gateway for real-time bidirectional communication. No public endpoint
or new pip dependencies needed (uses existing aiohttp + httpx).

Features:
- WebSocket persistent connection with auto-reconnect (exponential backoff)
- DM and group messaging with configurable access policies
- Media upload/download with AES decryption for encrypted attachments
- Markdown rendering, quote context preservation
- Proactive + passive reply message modes
- Chunked media upload pipeline (512KB chunks)

Cherry-picked from PR #1898 by EvilRan with:
- Moved to current main (PR was 300 commits behind)
- Skipped base.py regressions (reply_to additions are good but belong
  in a separate PR since they affect all platforms)
- Fixed test assertions to match current base class send() signature
  (reply_to=None kwarg now explicit)
- All 16 integration points added surgically to current main
- No new pip dependencies (aiohttp + httpx already installed)

Fixes #1898

Co-authored-by: EvilRan <EvilRan@users.noreply.github.com>
2026-03-29 21:29:13 -07:00
Teknium
ca4907dfbc
feat(gateway): add Feishu/Lark platform support (#3817)
Adds Feishu (ByteDance's enterprise messaging platform) as a gateway
platform adapter with full feature parity: WebSocket + webhook transports,
message batching, dedup, rate limiting, rich post/card content parsing,
media handling (images/audio/files/video), group @mention gating,
reaction routing, and interactive card button support.

Cherry-picked from PR #1793 by penwyp with:
- Moved to current main (PR was 458 commits behind)
- Fixed _send_with_retry shadowing BasePlatformAdapter method (renamed to
  _feishu_send_with_retry to avoid signature mismatch crash)
- Fixed import structure: aiohttp/websockets imported independently of
  lark_oapi so they remain available when SDK is missing
- Fixed get_hermes_home import (hermes_constants, not hermes_cli.config)
- Added skip decorators for tests requiring lark_oapi SDK
- All 16 integration points added surgically to current main

New dependency: lark-oapi>=1.5.3,<2 (optional, pip install hermes-agent[feishu])

Fixes #1788

Co-authored-by: penwyp <penwyp@users.noreply.github.com>
2026-03-29 18:17:42 -07:00
Teknium
02fb7c4aaf
docs: comprehensive docs audit — fix 12 stale/missing items across 10 pages (#3618)
Fixes found by auditing docs against recent PRs/commits:

Critical (misleading):
- hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks
  that are now active (#3542). Add correct callback signatures.
- security.md: Update tirith verdict behavior — block verdicts now go
  through approval flow instead of hard-blocking (#3428). Add pkill/killall
  self-termination guard and gateway-run backgrounding patterns (#3593).

New feature docs:
- configuration.md: Add tool_use_enforcement section with value table
  (auto/true/false/list) from #3551/#3528.
- configuration.md: Expand auxiliary config with per-task timeouts
  (compression 120s, web_extract 30s, approval 30s) from #3597.
- api-server.md: Add /v1/health alias, Security Headers section,
  CORS details (Max-Age, SSE headers, Idempotency-Key) from
  #3572/#3573/#3576/#3580/#3530.

Stale/incomplete:
- configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (#3484).
- environment-variables.md: Specify actual DashScope default URL.
- cli-commands.md: Add alibaba to --provider list.
- fallback-providers.md: Add Alibaba/DashScope to provider table.
- email.md: Document noreply/automated sender filtering (#3606).
- toolsets-reference.md: Add 4 missing platform toolsets — matrix,
  mattermost, dingtalk, api-server (#3583).
- skills.md: List default GitHub taps including garrytan/gstack (#3605).
2026-03-28 15:26:35 -07:00
Teknium
e648863d52
docs: fix documentation inconsistencies across reference and user guides
- toolsets-reference: add browser_console to browser + all platform toolsets,
  add missing hermes-acp, hermes-sms, messaging toolsets, correct hermes-gateway
  as composite, deduplicate platform toolset listings
- tools-reference: add missing vision and web toolset sections
- slash-commands: fix /new+/reset as alias (not separate commands), add /stop to
  CLI section (available in both CLI and gateway), add /plugins command, fix Notes
  section about messaging-only vs CLI-only
- environment-variables: fix HERMES_MAX_ITERATIONS default (90 not 60), add
  DEEPSEEK_API_KEY/BASE_URL, OPENCODE_ZEN/GO keys, TAVILY_API_KEY,
  GITHUB_TOKEN, HERMES_EPHEMERAL_SYSTEM_PROMPT
- configuration: remove duplicate Alibaba Cloud row, add OpenCode Zen/Go providers
- cli-commands: add missing providers to --provider list (opencode-zen,
  opencode-go, ai-gateway, kilocode, alibaba)
- quickstart: add OpenCode Zen and OpenCode Go to provider table

Co-authored-by: Test <test@test.com>
2026-03-18 16:26:27 -07:00
teknium1
c3ea620796 feat: add multi-skill cron editing and docs 2026-03-14 19:18:10 -07:00
Teknium
984f00e0b0
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
  bundled skills, and official optional skills
- document the skin system and link visual theming separately from
  conversational personality
- refresh quickstart, configuration, environment variable, and messaging
  docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00