Commit graph

3394 commits

Author SHA1 Message Date
Dilee
397f750bb4 feat(teams): add pipeline outbound delivery via existing adapter 2026-05-08 12:00:09 -07:00
Teknium
a99547740d fix(teams-pipeline): drop-scheduler fallback + test wiring for enablement gate
Two salvage follow-ups on top of @dlkakbs's plugin runtime.

1. Install a drop-scheduler when the runtime fails to build.

   Previously when ``build_pipeline_runtime()`` raised (e.g. missing
   Graph env vars, subscription store path unwritable), ``bind_gateway_runtime``
   logged a warning and returned False, leaving the msgraph_webhook
   adapter with no scheduler at all. Incoming Graph notifications
   would then fall back to the adapter's default ``handle_message``
   path, which produces a raw JSON dump as a user-role message — not
   useful and fires every time Graph retries.

   Now a no-op drop-scheduler is installed instead, so:
   - Graph notifications ack cleanly (202) so Graph stops retrying.
   - The failure is surfaced once in the log with the error.
   - No user-role messages get manufactured from raw change payloads.

   The adapter is still bindable later once the runtime becomes
   available (e.g. after the operator runs ``hermes teams-pipeline
   validate`` and fixes the config), since the gateway's
   ``_teams_pipeline_runtime`` sentinel wasn't set to a non-None value.

2. Test wiring for ``_teams_pipeline_plugin_enabled()`` gate.

   The happy-path runner-wiring tests monkeypatched ``bind_gateway_runtime``
   but not ``_load_gateway_config``. In the hermetic test environment
   the real config read ran, saw no enabled plugins, and short-circuited
   the bind call before the test could observe it — so the test
   expected ``calls == [runner]`` but got ``calls == []``.

   Adds a ``_load_gateway_config`` monkeypatch with
   ``plugins.enabled = ["teams_pipeline"]`` to the happy-path tests.
   The explicit-disabled test ``test_gateway_runner_skips_wiring_when_teams_pipeline_plugin_disabled``
   already patches the config correctly.

   Also renames ``test_bind_gateway_runtime_leaves_scheduler_unchanged_on_failure``
   to ``test_bind_gateway_runtime_installs_drop_scheduler_on_failure``
   and updates the assertion — this test contradicted the drop-scheduler
   test in ``tests/plugins/test_teams_pipeline_plugin.py`` which
   expected the scheduler to be installed. The plugin-test name
   (``test_bind_gateway_runtime_drops_notifications_when_unavailable``)
   clearly describes the intended behavior; fixing the wiring-test
   assertion aligns both tests.

Validation:
- ``scripts/run_tests.sh tests/plugins/test_teams_pipeline_plugin.py
  tests/gateway/test_teams_pipeline_runtime_wiring.py
  tests/hermes_cli/test_teams_pipeline_plugin_cli.py`` — 25/25 passed.
2026-05-08 11:18:14 -07:00
Dilee
07bbd93337 feat(teams-pipeline): add plugin runtime and operator cli
Third slice of the Microsoft Teams meeting pipeline stack, salvaged
onto current main. Adds the standalone teams_pipeline plugin that
consumes Graph change notifications from the webhook listener,
resolves meeting artifacts (transcript first, recording + STT fallback
later), persists job state in a durable store, and exposes an operator
CLI for inspection, replay, subscription management, and validation.

Design choices follow maintainer review feedback on PR #19815:

- Standalone plugin rather than bolted-on core surface
  (plugins/teams_pipeline/, kind: standalone in plugin.yaml).
- Zero new model tools. The agent drives the pipeline by invoking
  the operator CLI via the terminal tool, guided by the skill that
  ships with a follow-up PR.
- Reuses the existing msgraph_webhook gateway platform for Graph
  ingress. Pipeline runtime is wired in via bind_gateway_runtime and
  gated on plugins.enabled so gateways that don't run the plugin
  boot cleanly.

Additions:

- plugins/teams_pipeline/: runtime (gateway wiring + config builder),
  pipeline core, durable SQLite store, subscription maintenance
  helpers, Graph artifact resolution, operator CLI (list, show,
  run/replay, fetch dry-run, subscriptions list, subscribe,
  renew-subscription, delete-subscription, maintain-subscriptions,
  token-health, validate).
- hermes_cli/main.py: second-pass plugin CLI discovery so any
  standalone plugin registered via ctx.register_cli_command()
  outside the memory-plugin convention path gets its subcommand
  wired into argparse without touching core.
- gateway/run.py: _teams_pipeline_plugin_enabled() config gate,
  _wire_teams_pipeline_runtime() binding after adapter setup, and
  the two runner attributes used by the runtime.

Credit to @dlkakbs for the entire plugin implementation.
2026-05-08 11:18:14 -07:00
Teknium
d0aad4b021 fix(computer-use): harden image-rejection fallback + AUTHOR_MAP
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.

_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but it breaks providers for tool-role messages — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.

Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.

Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.

AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
2026-05-08 11:07:38 -07:00
Teknium
850413f120 feat(computer-use): cua-driver backend, universal any-model schema
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.

Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.

- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
  CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
  Actions: capture (som/vision/ax), click, double_click, right_click,
  middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
  `content: [text, image_url]` parts) that flows through
  handle_function_call into the tool message. Anthropic adapter converts
  into native `tool_result` image blocks; OpenAI-compatible providers
  get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
  recent screenshots carry real image data; older ones become text
  placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
  their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
  instead of its base64 char length (~1MB would have registered as
  ~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
  is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
  and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
  through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
  (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
  force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
  workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
  entries.

44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.

469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.

- `model_tools.py` provider-gating: the tool is available to every
  provider. Providers without multi-part tool message support will see
  text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
  client-side eviction + compressor pruning cover the same cost ceiling
  without a beta header.

- macOS only. cua-driver uses private SkyLight SPIs
  (SLEventPostToPid, SLPSPostEventRecordTo,
  _AXObserverAddNotificationAndCheckRemote) that can break on any macOS
  update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
  prints the Settings path.

Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
2026-05-08 11:07:38 -07:00
Teknium
b8d7e0e6d3 fix(msgraph_webhook): harden auth surface + IP allowlisting + response hygiene
Defense-in-depth polish on top of the webhook listener before it becomes
a real attack surface once the pipeline starts creating subscriptions
and Graph starts POSTing to the configured public URL.

- Timing-safe clientState comparison. Previously used `==` on strings;
  switches to hmac.compare_digest so a mismatch does not leak how many
  leading characters matched. client_state is documented as a strong
  shared secret (openssl rand -hex 32 in the setup docs), so a
  timing-safe primitive is the right call.

- Split GET and POST handlers. Graph validates a subscription by sending
  GET with validationToken in the query; anything else on GET is now a
  400 so the endpoint cannot be probed or mistakenly used for data
  exfil. Previously a bare GET fell through to the POST path and blew
  up on request.json() with a confusing 400.

- Empty response bodies on success. 202 is returned with no body so
  internal counters (accepted / duplicates / scheduled) do not leak to
  any caller that can reach the endpoint; counters remain observable
  via /health for operators. 403 on every-item-bad-clientState batches
  (so forged POSTs stop retrying), 400 on malformed / unknown-resource
  batches (sender configuration issue).

- Optional source-IP allowlist. New `allowed_source_cidrs` extra field
  (list or comma-separated string) and `MSGRAPH_WEBHOOK_ALLOWED_SOURCE_CIDRS`
  env var let operators restrict the webhook to Microsoft Graph's
  published webhook source ranges in production. Empty = allow all,
  preserving dev-tunnel / localhost workflows. Invalid CIDRs are
  logged and ignored rather than crashing. Also gates the handshake
  endpoint so disallowed IPs cannot probe it.

- Tests updated for the new response contract (empty-body 202,
  auth-only 403, config-error 400) and extended to cover: bare GET
  rejection, POST-with-validationToken handshake tolerance,
  timing-safe compare actually invoked via hmac.compare_digest spy,
  malformed body / missing value array, IP allowlist accept/reject
  paths, handshake IP allowlist, invalid CIDR entries, comma-string
  CIDR list parsing. 52/52 passed (was 40).

Full gateway suite: 5049 passed / 1 pre-existing failure in
test_discord_free_response (unrelated, reproduces on clean origin/main).
2026-05-08 10:29:58 -07:00
Dilee
26a59e4f6c fix(msgraph): normalize webhook dedupe and resource matching 2026-05-08 10:29:58 -07:00
Dilee
2a215de9af fix(msgraph): bound webhook receipt dedupe cache 2026-05-08 10:29:58 -07:00
Dilee
46a6f39024 feat(msgraph): add webhook listener platform 2026-05-08 10:29:58 -07:00
Teknium
f209a35859
feat(profile): shareable profile distributions via git (#20831)
* feat(profile): shareable profile distributions (pack/install/update/info)

Closes #20456.

Turns a profile into a portable, versioned artifact. Packs SOUL.md, config,
skills, cron, and an env-var manifest into a tar.gz that others can install
from a local path, URL, or git repo. Updates re-pull the distribution while
preserving user data (memories, sessions, auth.json, .env) and the user's
config.yaml overrides.

New subcommands (under hermes profile, no parallel tree):
  hermes profile pack    <name> [-o FILE]
  hermes profile install <source> [--name N] [--alias] [--force] [-y]
  hermes profile update  <name> [--force-config] [-y]
  hermes profile info    <name>

Manifest (distribution.yaml at the profile root): name, version,
hermes_requires, author, env_requires, distribution_owned.

Security:
  - Installer shows manifest + env-var requirements before mutating disk;
    confirmation required unless -y.
  - auth.json and .env are never packed (same exclude set as profile export).
  - Cron jobs are packed but NOT auto-scheduled — user is pointed at
    'hermes -p <name> cron list' to review.
  - Archive extraction rejects path traversal (../ members).
  - Alias creation is opt-in via --alias.

Update semantics:
  - Distribution-owned paths (SOUL.md, skills/, cron/, mcp.json, manifest):
    replaced from the new archive.
  - config.yaml: preserved by default; --force-config to overwrite.
  - User-owned paths (memories/, sessions/, auth.json, .env, state.db*,
    logs/, workspace/, plans/, home/, *_cache/, local/): never touched.

Version pin:
  hermes_requires accepts >=, <=, ==, !=, >, < or a bare version (treated
  as >=). Install fails with a clear error when the running Hermes version
  doesn't satisfy the spec.

Sources supported by 'install':
  - Local .tar.gz / .tgz archive
  - Local directory
  - HTTP(S) URL pointing to a .tar.gz (uses httpx, already a dep)
  - Git URL (github.com/user/repo, https://..., git@..., ssh://, git://)

Tests: 43 new unit tests (manifest parsing, version checks, env template,
pack/install/update round-trip, config-preservation, security).
E2E validated via real CLI invocations against an isolated HERMES_HOME
covering pack, install with confirmation, update preservation, update
--force-config, decline-preview, duplicate-install rejection, and
version-requirement rejection.

* refactor(profile-dist): git-only — drop tar.gz/HTTP transports and pack

Scope-cut on top of the original distribution PR: a profile distribution
is now exclusively a git repository (or a local directory during
development). The tar.gz / HTTP archive transports and the matching
`hermes profile pack` subcommand have been removed.

Why:
* GitHub tags, branches, and commits are already the right versioning
  primitive. Tag pushes do for us what 'pack + upload' did.
* `hermes profile export` / `import` already cover local backup and
  restore; they are not a distribution format and stay untouched.
* One transport means one install/update code path, one doc page,
  and one mental model. The extra source types doubled the surface
  for no real user win — GitHub auto-attaches release tarballs, and
  `git bundle` / `git clone --mirror` cover the airgap case.

Changes:
* hermes_cli/profile_distribution.py — removed pack_profile,
  _fetch_tar_archive (_http_fetch), _safe_extract, _archive_roots,
  _safe_parts, _find_dist_root, tarfile/io/urlparse imports. The
  new _stage_source has two arms: git URL → clone, local directory
  → use in place.
* hermes_cli/main.py — removed the 'pack' subparser and action
  handler. Install help text updated to match the reduced source list.
* tests/hermes_cli/test_profile_distribution.py — rewritten around a
  local-directory staging fixture. The install/update/describe suites
  now build a distribution tree on disk directly and install from it,
  which is what a real git clone produces after .git is stripped.
  Dropped TestPack, TestFindDistRoot, and the tar-specific security
  test. New tests cover _looks_like_git_url, env_example emission,
  hermes_requires enforcement, and 'installer does not import
  credentials if an author mistakenly leaks them in the staging tree'.
* website/docs/reference/profile-commands.md — 'Distribution commands'
  section rewritten around git. Added a 'Publishing a distribution'
  section. export/import stay documented as local backup/restore.
* website/docs/reference/cli-commands.md — dropped 'pack' from the
  profile subcommand table.
* website/package.json — 'lint:diagrams' now passes
  --exclude-code-blocks to ascii-guard. Without it, markdown tables
  and box-drawing diagrams inside fenced code blocks were being
  misidentified as malformed ASCII boxes, blocking the PR's
  docs-site-checks CI with 8 false-positive errors.

Validation:
* Targeted suite: tests/hermes_cli/test_profile_distribution.py —
  56/56 pass (down from 43 — reorganized to cover the new
  local-dir paths).
* Regression: test_profiles.py + test_profile_export_credentials.py
  102/102 still pass. export/import behaviour unchanged.
* Docs lint: ascii-guard lint --exclude-code-blocks docs returns
  0 errors (was 8 on the PR before the flag bump).
* E2E: ran the real `hermes profile install`/`info` against a
  local staging dir under an isolated HERMES_HOME — install writes
  SOUL.md + skills to the target profile, info reads the manifest
  back, a bogus source produces a clear error, and `hermes profile
  pack` is now rejected by argparse as expected.

* feat(profile-dist): distribution-aware list/show/delete + installed_at + env preview

Polish pass on top of the git-only scope cut. Five additions, all small,
wiring into existing commands rather than adding new surface.

1. `installed_at` timestamp on the manifest
   * Stamped automatically inside plan_install() on both fresh install
     and update — ISO-8601 UTC, seconds resolution.
   * Surfaced in `hermes profile info` as `Installed:    <ts>`.
   * Lets users tell "installed 6 months ago, needs update" from
     "installed yesterday" without guessing from file mtimes.

2. `hermes profile list` grows a `Distribution` column
   * Plain profiles: "—"
   * Distribution profiles: "<name>@<version>" (e.g. `telemetry@1.2.3`)
   * ProfileInfo gains three optional fields — distribution_name,
     distribution_version, distribution_source — populated by a new
     _read_distribution_meta() helper that swallows manifest read errors
     so a broken distribution.yaml in one profile can't break `list`
     for the others.

3. `hermes profile show` and `hermes profile delete` surface
   distribution provenance
   * show: `Distribution: name@version` + `Installed from: <source>`
     plus a pointer to `hermes profile info <name>` for the full
     manifest.
   * delete: same lines in the pre-confirmation preview, so a user
     deleting "telemetry" can see it came from
     `github.com/kyle/telemetry-distribution` before they type
     `telemetry` to confirm. No change to the confirmation gate itself —
     deletion semantics are identical to plain profiles.

4. Install preview checks env vars against the current environment
   * Replaces the "Env vars you'll need to set:" header with a simpler
     "Env vars:" block.
   * Each required var is labeled:
     - `✓ set` — already in `os.environ` OR present as a key in the
       target profile's existing .env (update case).
     - `needs setting` — required but not found in either place.
     - `—` — optional.
   * Mirrors pip's "Requirement already satisfied" UX: no unnecessary
     nagging about keys the user already has configured.

5. Docs: private distributions
   * New "Private distributions" section in
     website/docs/reference/profile-commands.md explaining that we
     shell out to the user's `git` binary, so SSH keys / credential
     helpers / GitHub CLI stored creds all work transparently. One
     paragraph, two examples.
   * `hermes profile info` section updated to mention `Installed:`.

Module-level hoist:
* `from datetime import datetime, timezone` was previously lazy-imported
  inside plan_install(). Hoisted to module scope so tests can monkeypatch
  `hermes_cli.profile_distribution.datetime` to freeze time.

Tests (+7):
* TestInstalledAtStamp.test_install_stamps_installed_at — format check
  (4-digit year, 'T', +00:00 suffix).
* TestInstalledAtStamp.test_update_refreshes_installed_at — freezes
  datetime.now() to 2099-01-01 and confirms update writes a new stamp.
* TestProfileInfoDistribution.test_installed_distribution_shows_in_list
  — ProfileInfo.distribution_{name,version,source} populated after install.
* TestProfileInfoDistribution.test_plain_profile_has_no_distribution_fields
  — plain profiles have None.
* TestProfileInfoDistribution.test_malformed_manifest_does_not_break_list
  — broken distribution.yaml in one profile doesn't break list_profiles().

Validation:
* 163/163 tests pass (56 distribution + 102 profile regression +
  5 new from this commit — up from 158).
* docs-lint: 0 errors.
* E2E verified: install preview shows ✓/needs-setting per env var,
  `profile list` shows Distribution column, `profile show` + `delete`
  preview mentions source URL, `info` shows Installed: timestamp.

* fix(profile-dist): clean errors + warn when overwriting plain profiles

Two small polish fixes found during collision sweeps of the PR:

1. ValueError from validate_profile_name now caught cleanly
   * A distribution.yaml whose 'name' field can't be used as a profile
     identifier (spaces, path traversal, etc.) raises ValueError from
     hermes_cli.profiles.validate_profile_name, which was escaping as a
     raw Python traceback from 'hermes profile install/update/info'.
   * Broadened the except clause in all three handlers to catch
     (DistributionError, ValueError) — users now see:
       Error: Invalid profile name '../../etc/passwd'. Must match
              [a-z0-9][a-z0-9_-]{0,63}
     instead of a stack trace.

2. Install preview distinguishes plain profile overwrite from
   distribution re-install
   * When plan.target_dir exists and IS a distribution (has
     distribution.yaml), preview still shows the mild
       (profile exists — will overwrite distribution-owned files only)
   * When plan.target_dir exists but is a HAND-BUILT plain profile (no
     distribution.yaml), preview now shows a loud warning:
       ⚠ Profile exists but is NOT a distribution.  Installing here will
         overwrite its SOUL.md, skills/, cron/, and mcp.json.
         Your memories, sessions, auth.json, and .env will be preserved,
         but any hand-edits to distribution-owned files will be lost.
   * Users who type 'hermes profile install foo --force' against a
     profile they hand-built now see what they're signing up for. User
     data is still safe (memories, sessions, auth, .env are in
     USER_OWNED_EXCLUDE), but custom SOUL/skills get stomped.

Tests (+2):
* TestErrorSurfaces.test_bad_profile_name_raises_valueerror_not_traceback
* TestErrorSurfaces.test_path_traversal_name_rejected

Validation:
* 165/165 tests pass (was 163).
* E2E: bad manifest names produce 'Error: Invalid profile name ...'
  with no traceback; installing over a plain profile shows the warning;
  re-installing over an existing distribution shows the normal
  overwrite message.
* Bad HTTPS URLs still produce 'Error: git clone failed: ...' — git
  itself generates a clean enough message that no wrapper is needed.
* 'install .' works correctly from any cwd.

* fix(profiles): reject reserved names at validate time

Before: `hermes profile create hermes` / `profile install` / `profile rename`
all silently accepted reserved names like `hermes`, `test`, `tmp`, `root`,
`sudo`. The profile directory was created; only alias creation failed (via
check_alias_collision), leaving a confusingly-named profile on disk — e.g.
`~/.hermes/profiles/hermes/` sitting next to `~/.hermes/` itself.

The reserved set already exists (_RESERVED_NAMES, introduced alongside alias
collision detection). This commit moves the check up one layer to
validate_profile_name so every entry point — create, install, import,
rename, dashboard web API — shares the same gate.

The error message points the user at the cause without being cryptic:
  Error: Profile name 'hermes' is reserved — it collides with either the
  Hermes installation itself or a common system binary.  Pick a different
  name.

`default` continues to pass through (it's a special alias for ~/.hermes).
_HERMES_SUBCOMMANDS (`chat`, `model`, `gateway`, etc.) stays at
alias-collision time only — those are fine as bare profile names with
`--no-alias`.

Tests (+5): test_reserved_names_rejected parametrized over the full
_RESERVED_NAMES set, matching the existing pattern in TestValidateProfileName.

No existing test uses a reserved name as a profile identifier (greppped
create_profile("hermes|test|tmp|root|sudo") — zero hits).

Validation:
* 170/170 tests pass in the profile suites.
* E2E: `profile create hermes`, `profile install` with manifest
  name=hermes, and `profile install ... --name hermes` all produce the
  same clean `Error: Profile name 'hermes' is reserved ...` with rc=1
  and no traceback. Normal names (`mybot`) still work.
2026-05-08 10:04:32 -07:00
Teknium
45d860d424 fix(msgraph): stream download_to_file body instead of buffering
The prior implementation routed download_to_file through the shared
_request() path, which uses httpx.AsyncClient.request() inside a
context manager that closes before aiter_bytes() iterates. The body
was read into memory first and the chunked write loop replayed it
from buffer. On small test payloads this was invisible; on real
Teams meeting recordings (hundreds of MB) it would force the full
artifact into RAM per download.

Rewrites download_to_file to open its own AsyncClient and use
client.stream(), keeping the context open across the aiter_bytes
iteration so the body is actually streamed chunk-by-chunk to disk.
Retry/token-refresh/Retry-After semantics are preserved by handling
them inline on the stream path. Partial .part files are cleaned up
on transport errors and on exhausted retries.

Adds three tests: large-payload streaming verifies the chunk loop
runs multiple times (discriminator: 512 KiB at chunk_size=65536
yields 8 chunks under streaming, 1 under buffering), transient-5xx
retry recovers after a single retry, and exhausted-retry cleans up
the partial file.
2026-05-08 09:27:26 -07:00
Dilee
b878f89f66 test(msgraph): cover concurrent token cache reuse 2026-05-08 09:27:26 -07:00
Dilee
a152c706b7 feat(msgraph): add auth and client foundation 2026-05-08 09:27:26 -07:00
Teknium
839cdd1b05 fix(approval): cron jobs must not be treated as gateway context
The new _is_gateway_approval_context() widened the gateway classification
to any call with HERMES_SESSION_PLATFORM bound via contextvars. But
cron/scheduler.py binds that same contextvar for delivery routing on
cron jobs that originate from a gateway platform (telegram/discord/etc.),
so those jobs were getting routed through submit_pending with no
listener — blocking indefinitely instead of honoring approvals.cron_mode.

Short-circuit on HERMES_CRON_SESSION before any gateway check. Cron is
always governed by cron_mode config, regardless of where the job was
scheduled from.

Adds regression coverage in TestCronWithGatewayOrigin and records the
contributor email mapping for scripts/release.py.
2026-05-08 07:30:14 -07:00
Zhicheng Han
526c0e018a feat(api-server): expose run approval events 2026-05-08 07:30:14 -07:00
Teknium
674fad1483
fix(goals): Ctrl+C during /goal loop auto-pauses the goal (#21888)
Reported: Ctrl+C during an active /goal loop felt like it did nothing —
the agent would interrupt the current turn, then immediately queue another
continuation and keep going until the session ended or the 20-turn budget
ran out.

Root cause: cli.py's _maybe_continue_goal_after_turn() ran in the finally:
block around self.chat(...) unconditionally. Whether the turn completed
normally, got interrupted, or returned an empty string, the judge ran on
whatever was in conversation_history and — because the judge is fail-open
— a "continue" verdict pushed another CONTINUATION_PROMPT onto
_pending_input. Ctrl+C was invisible to the hook.

Fix:
- chat() now captures result['interrupted'] onto self._last_turn_interrupted
  (resets to False at entry so early-returns don't leak prior state).
- _maybe_continue_goal_after_turn() checks the flag first: on interrupt,
  auto-pause via mgr.pause(reason='user-interrupted (Ctrl+C)') and print
  a one-liner pointing the user at /goal resume or /goal clear. No judge
  call, no continuation enqueued.
- Also added an empty-response guard that mirrors gateway/run.py's
  _handle_message logic (empty reply → transient failure → skip judging
  so we don't trip the consecutive-parse-failures backstop unnecessarily).

The goal stays in the DB as paused, so /goal resume recovers it after
the user has sorted out whatever made them cancel. /goal clear still
works as before for a full stop.

Tests: tests/cli/test_cli_goal_interrupt.py covers:
  - interrupted turn pauses + doesn't queue + judge is NOT called
  - paused goal is resumable
  - empty / whitespace / missing assistant reply skips judging
  - healthy turn still enqueues continuation / marks done
  - chat() resets _last_turn_interrupted at entry (anti-leak guard)

All 55 existing goal tests still pass.
2026-05-08 06:53:13 -07:00
Shannon Sands
80775d7585 test(auth): assert Nous refresh rotation payload 2026-05-08 04:17:42 -07:00
Shannon Sands
b32461f6e8 fix(auth): send Nous refresh token via header 2026-05-08 04:17:42 -07:00
Teknium
486b14b423
feat(cron): routing intent — deliver=all fans out to every connected channel (#21495)
Adds one reserved token to the cron `deliver` field:

- `all` — expand to every platform with a configured home channel

Resolves at fire time, not create time, so a job created before Telegram
was wired up picks it up once `TELEGRAM_HOME_CHANNEL` is set. Composes
with existing targets: `origin,all`, `all,telegram:-100:17`.

Inspired by Vellum Assistant's reminder routing-intent system.

## Changes
- cron/scheduler.py: _expand_routing_tokens + integrate into _resolve_delivery_targets
- tools/cronjob_tools.py: schema description updated
- tests/cron/test_scheduler.py: TestRoutingIntents (5 cases)
- website/docs/user-guide/features/cron.md: docs + table rows

## Validation
- tests/cron/test_scheduler.py -k 'Routing or Deliver' → 57 passed
2026-05-08 04:17:21 -07:00
kshitijk4poor
81928f03ab refactor(gmi): move User-Agent to profile.default_headers
The previous revision of this PR added six GMI-specific branches
(`elif base_url_host_matches(..., 'api.gmi-serving.com')`) across
run_agent.py and agent/auxiliary_client.py, plus a _HERMES_UA_HEADERS
constant in auxiliary_client.py.

ProviderProfile already has a `default_headers: dict[str, str]` field
commented as 'Client-level quirks (set once at client construction)'.
Other plugins (ai-gateway, kimi-coding) already use it. Two of the four
auxiliary_client sites we previously patched already had a generic
`else: profile.default_headers` fallback that picked it up (so did
both run_agent sites).

This revision:

* Sets `default_headers={'User-Agent': 'HermesAgent/<ver>'}` on the
  GMI profile in plugins/model-providers/gmi/__init__.py.
* Reverts all six GMI-specific branches in run_agent.py and
  auxiliary_client.py.
* Adds the generic profile-fallback `else` block to the two
  auxiliary_client sites (`_to_async_client`, `resolve_provider_client`)
  that didn't have it yet. This benefits every provider whose profile
  declares default_headers, not just GMI — e.g. Vercel AI Gateway's
  HTTP-Referer/X-Title now flow through the async client path too.
* Replaces the GMI-specific URL-branch tests with a profile-level
  assertion and keeps the run_agent integration test (with
  `provider='gmi'` so the fallback picks up the profile).

Net diff vs main: +82/-0 across 5 files, touching only the GMI plugin,
two generic fallback blocks in auxiliary_client.py, AUTHOR_MAP, and
tests. No core files change.

Based on #20907 by @isaachuangGMICLOUD.
2026-05-08 03:22:11 -07:00
Teknium
307c85e5c1 fix(goals): auto-pause when judge model returns unparseable output
Weak judge models (e.g. deepseek-v4-flash) return empty strings or prose
when asked for the strict {done, reason} JSON verdict. The old code
failed-open to continue on every such turn, burning the entire turn
budget with log lines like

  judge returned empty response
  judge reply was not JSON: "Let me analyze whether the goal..."

and /goal clear could not stop it mid-loop without /stop.

After N=3 consecutive *parse* failures (transport/API errors don't
count — those are transient), the loop auto-pauses and prints:

  ⏸ Goal paused — the judge model (3 turns) isn't returning the
  required JSON verdict. Route the judge to a stricter model in
  ~/.hermes/config.yaml:
    auxiliary:
      goal_judge:
        provider: openrouter
        model: google/gemini-3-flash-preview
  Then /goal resume to continue.

The counter resets on any usable reply (both "done"/"continue" and
API errors) and persists across GoalManager reloads so cross-session
resumes carry the correct state.

Also fixes test_goal_verdict_send.py sharing a hardcoded session_id
across tests — the shared id only worked because the previous
_post_turn_goal_continuation was a never-awaited coroutine. Now that
PR #19160 made it properly awaited, the xdist test-leakage bug
surfaced. Each test gets a unique session_id via uuid suffix.
2026-05-07 17:33:09 -07:00
JC
03ddff8897 fix(gateway): defer goal status notices until after response delivery
Route goal status notices through the platform adapter send API and register post-delivery callbacks so completed-goal notices appear after the final assistant response. Also cancel queued synthetic goal continuations on /goal pause and /goal clear while preserving normal queued user messages.
2026-05-07 17:33:09 -07:00
Austin Pickett
7f92e5506e
Merge pull request #20942 from NousResearch/austin/fix/personality
fix(tui): preserve session when switching personality
2026-05-07 18:54:29 -04:00
teknium
292f468366 fix(mcp): unwrap platforms key in channels_list
channels_list was iterating directory.items() directly, yielding
("updated_at", str) and ("platforms", dict) pairs — neither passed
the isinstance(entries_list, list) check, so the inner loop never ran
and every call returned count=0 even when channel_directory.json was
populated.

The writer (gateway/channel_directory.py) wraps the payload as
{"updated_at": ..., "platforms": {...}}; every other reader in the
codebase unwraps via directory.get("platforms", {}). This aligns
channels_list with that convention.

Also tightens the existing test_channels_with_directory test, which
bypassed the bug by asserting against _load_channel_directory() directly
instead of calling channels_list. It now calls the tool end-to-end and
a new test_channels_with_directory_platform_filter covers the filter
path. Both tests fail against the pre-fix code.

Closes #21474

Co-authored-by: chrisworksai <262485129+chrisworksai@users.noreply.github.com>
2026-05-07 13:41:16 -07:00
Blake Johnson
9076a2e74e fix(agent): keep Nous GPT-5 fallback on chat completions 2026-05-07 13:04:42 -07:00
Teknium
24d48ffb82
feat(kanban): add specify — auxiliary LLM fleshes out triage tasks (#21435)
* feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks

The Triage column shipped with a placeholder 'a specifier will flesh
out the spec', but the specifier itself was never built. This wires
it up as a dedicated CLI verb.

`hermes kanban specify <id>` calls the auxiliary LLM (configured under
`auxiliary.triage_specifier`) to expand a rough one-liner into a
concrete spec — tightened title plus a body with Goal / Approach /
Acceptance criteria / Out-of-scope sections — then atomically flips
`status: triage -> todo` and recomputes ready so parent-free tasks
go straight to the dispatcher on the same tick.

Surface:

  hermes kanban specify <task_id>               # single task
  hermes kanban specify --all [--tenant T]      # sweep triage column
  hermes kanban specify ... --author NAME       # audit-comment author
  hermes kanban specify ... --json              # one JSON line per task

Design choices:

  - Parent gating is preserved. specify_triage_task flips to 'todo',
    then recompute_ready promotes to 'ready' only when parents are
    done — same rule as a normal parent-gated todo.
  - No daemon, no background watcher. Every invocation is explicit —
    keeps cost predictable and doesn't fight the dispatcher loop.
  - Response parse is lenient: strict JSON preferred, markdown-fence
    tolerated, raw-body fallback on malformed JSON so the LLM can't
    strand a task in triage.
  - All failure modes (no aux client, API error, task moved out of
    triage mid-call) return SpecifyOutcome(ok=False, reason=...) so
    --all continues past individual failures.

Changes:

  hermes_cli/kanban_db.py    + specify_triage_task()
  hermes_cli/kanban_specify.py  NEW (~220 LOC — prompt, parse, call)
  hermes_cli/kanban.py       + specify subcommand + _cmd_specify
  hermes_cli/config.py       + auxiliary.triage_specifier task slot
  website/docs/user-guide/features/kanban.md  specify + config notes
  website/docs/reference/cli-commands.md      CLI reference entry
  tests/hermes_cli/test_kanban_specify_db.py    NEW (10 tests)
  tests/hermes_cli/test_kanban_specify.py       NEW (20 tests)

Validation: 30/30 targeted tests pass. E2E: triage task -> specify ->
ends in 'ready' with events [created, specified, promoted] and the
audit comment recorded under the configured author.

* feat(kanban): wire specifier into dashboard and gateway slash

Follow-ups to the initial PR #21435 — closes the two gaps I'd left as
post-merge: dashboard button and first-class gateway surface.

Dashboard (plugins/kanban/dashboard/)
  - POST /tasks/:id/specify  NEW endpoint. Thin wrapper around
    kanban_specify.specify_task(). Returns the CLI outcome shape
    ({ok, task_id, reason, new_title}); ok=false with a human reason
    is a 200, not a 4xx, so the UI can render it inline without
    treating 'no aux client configured' as a crash.
  - Runs sync in FastAPI's threadpool because the LLM call can take
    tens of seconds on reasoning models.
  - Pins HERMES_KANBAN_BOARD around the specify call so the module's
    argless kb.connect() lands on the right board.
  - dist/index.js: doSpecify callback threaded through the drawer →
    TaskDetail → StatusActions prop chain.  Specify button appears
    ONLY when task.status === 'triage' (elsewhere the backend would
    reject anyway — hide the button to keep the action row clean).
    Busy state (Specifying…) + inline success/error banner under the
    button using the response.reason text.
  - dist/style.css: tiny hermes-kanban-msg-ok / -err classes using
    existing --color vars so themes reskin cleanly.

Gateway slash (/kanban specify)
  - Already works via the existing run_slash → build_parser →
    kanban_command pipeline. No code change needed — slash commands
    inherit the argparse tree automatically. Added coverage:
    test_run_slash_specify_end_to_end (create --triage, specify, verify
    promotion + retitle) and test_run_slash_specify_help_is_reachable.

Tests
  - tests/plugins/test_kanban_dashboard_plugin.py: 3 new tests for the
    REST endpoint — happy path, non-triage rejection as ok=false 200,
    missing aux client as ok=false 200.
  - tests/hermes_cli/test_kanban_cli.py: 2 new slash-surface tests.

Docs
  - website/docs/user-guide/features/kanban.md: dashboard action row
    description mentions  Specify + all three surfaces. REST table
    gains /tasks/:id/specify. Slash examples include /kanban specify.

Validation: 340/340 targeted tests pass. E2E via TestClient: create a
triage task over REST → POST /specify with mocked aux client → task
moves to 'ready' column on /board with new title and body applied.
2026-05-07 13:04:41 -07:00
adybag14-cyber
732a6c45fa feat: add termux doctor fallback guidance for blocked extras 2026-05-07 13:04:08 -07:00
adybag14-cyber
dc5ef1ac8e fix: add termux-all install profile and safe fallbacks 2026-05-07 13:04:08 -07:00
adybag14-cyber
da18fd084a fix: strengthen termux install network prerequisites 2026-05-07 13:04:08 -07:00
adybag14-cyber
54c0b10d14 fix(update): add heartbeat during dependency install 2026-05-07 13:04:08 -07:00
Abd0r
04193cf71c feat(web): add Brave Search (free tier) and DDGS search providers
Both implement WebSearchProvider via tools/web_providers/ — matching the
existing SearXNG pattern (PR #5c906d702). Search-only; pair with any
extract provider via web.extract_backend.

- tools/web_providers/brave_free.py — Brave Search API (free tier, 2k
  queries/mo). Uses BRAVE_SEARCH_API_KEY as X-Subscription-Token.
- tools/web_providers/ddgs.py — DuckDuckGo via the ddgs Python package.
  No API key; gated on package importability.
- tools/web_tools.py: both backends added to _get_backend() config list
  and auto-detect chain (trails paid providers), _is_backend_available,
  web_search_tool dispatch, web_extract_tool + web_crawl_tool search-only
  refusals, check_web_api_key, and the __main__ diagnostic. Introduces
  _ddgs_package_importable() helper so tests can monkeypatch a single
  symbol for the ddgs availability check.
- hermes_cli/tools_config.py: picker entries for both providers; ddgs
  gets a post_setup handler that runs `pip install ddgs`.
- hermes_cli/config.py: BRAVE_SEARCH_API_KEY in OPTIONAL_ENV_VARS.
- scripts/release.py: AUTHOR_MAP entry for @Abd0r.
- tests: 14 new tests (brave-free) + 15 new tests (ddgs) covering
  provider unit behavior, backend wiring, and search-only refusals.

Salvages the brave-free + ddgs portion of PR #19796. Not included: the
in-line helpers in web_tools.py (replaced with provider modules to match
the shipped architecture), the lynx-based extract path (these backends
should refuse extract with a clear error — users pair with a real
extract provider), and scripts/start-llama-server.sh (unrelated).

Co-authored-by: Abd0r <223003280+Abd0r@users.noreply.github.com>
2026-05-07 09:59:17 -07:00
xxxigm
cdc0a47dd5 test(hermes_constants): cover parse_reasoning_effort() 2026-05-07 09:59:07 -07:00
Teknium
7e2af0c2e8 feat(acp): pass image file attachments through as image_url parts
Extends PR #21400's resource inlining with image-specific handling: ACP
resource_link and embedded blob resources with an image/* mime (or image
file suffix when mime is missing) now emit an OpenAI image_url part
with a base64 data URL, so vision models actually see the image
instead of a [Binary file omitted] note. Non-image resources keep the
existing text-inlining behavior.

Adds 3 tests: local PNG via resource_link, JPEG mime inferred from
suffix when client omits mimeType, and embedded blob PNG.
2026-05-07 09:24:32 -07:00
HenkDz
733e297b8a fix(acp): inline file attachment resources 2026-05-07 09:24:32 -07:00
Teknium
2564132a1f
fix(telegram): preserve thread_id=1 for forum General typing indicator (#21390)
The May 5 refactor in d5357f816 made _message_thread_id_for_typing()
symmetric with _message_thread_id_for_send() by mapping the General
topic (thread id "1") to None upfront for both. That's correct for
sendMessage — Telegram rejects message_thread_id=1 on sends and the
topic must be omitted — but it's wrong for sendChatAction.

Observed behavior (confirmed via before/after Telegram wire traces):
  Before d5357f816: thread_id=1 → message_thread_id=1 → bubble visible in General
  After  d5357f816: thread_id=1 → message_thread_id=None → no visible typing

Omitting message_thread_id on sendChatAction does NOT fall back to
the General topic's view in a forum-enabled supergroup; the bubble
ends up hidden from the client's General-topic pane entirely. For
any user on a forum-group, the typing indicator stopped appearing.

Fix: drop the symmetric "1 → None" mapping from the typing resolver.
sendMessage still maps 1 → None via _message_thread_id_for_send (that
side was never broken). The asymmetry is real and required by
Telegram's API — document it in the resolver docstring.

Partial revert of d5357f816; restores the behavior from 0cf7d570e
("fix(telegram): restore typing indicator and thread routing for
forum General topic"). Does not re-introduce the retry-without-thread
fallback that 41545f7ec scoped down for DM topics — with the resolver
fixed, the first call already hits the right wire shape.

Test updated from test_send_typing_general_topic_uses_none_thread_id
(which encoded the broken contract) to
test_send_typing_preserves_general_topic_thread_id, asserting the
single correct call with message_thread_id=1. 10 other tests in the
file untouched and passing.
2026-05-07 08:39:21 -07:00
Teknium
812ce0b987
fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385)
When empty-response terminal scaffolding fires on a tool-result turn,
_drop_trailing_empty_response_scaffolding left the live history ending at
a bare 'tool' message. The next user input then landed as [...tool, user],
a protocol-invalid sequence that OpenRouter/Opus and other providers
silently fail on (returns empty content). That retriggered the empty-retry
recovery every turn, and recovery flags never hit SQLite (no column for
them), so history kept looking broken on every reload.

Two fixes:

1. Scaffolding strip rewinds the orphan assistant(tool_calls)+tool pair
   after popping sentinels. Only fires when scaffolding flags were
   actually present, so mid-iteration tool loops are untouched.

2. _repair_message_sequence runs right before every API call as a
   defensive belt: drops stray tool messages with unknown tool_call_ids,
   merges consecutive user messages so no user input is lost. Does NOT
   rewind assistant(tool_calls)+tool+user — that pattern is valid when
   the user redirected before the model got its continuation turn.

Repro: session 20260507_044111_fa7e65. Opus-4.7/OpenRouter returned
content-less response after a 42KB execute_code output, nudge+retry
chain exhausted (no fallback configured), terminal sentinel appended,
scaffolding stripped leaving bare tool tail, user typed 'wtf happened..'
and landed as tool→user violation. Every subsequent turn collapsed in
<50ms with the same 3-retry empty chain because the API request itself
was malformed.

Verified live via HTTP mock: pre-fix reproduced 5 api_calls/0.15s exit
'empty_response_exhausted'; post-fix 1 api_call/0.10s exit
'text_response(finish_reason=stop)'. Three-turn session flows cleanly
through the scenario. Full run_agent suite: 1242 passed (0 regressions,
2 pre-existing concurrent_interrupt failures unrelated).
2026-05-07 08:35:10 -07:00
Teknium
1d2029b2b7
fix(update): reset-failed before every fallback restart so the gateway can't get stranded (#21371)
cmd_update's auto-restart path could leave the gateway dead after a
transient failure in systemd's own auto-restart window.  Reproduced
on Ubuntu 25.10 + systemd 257: after update, gateway drains and exits 75,
systemd's first respawn 60s later fails (status=200/CHDIR with
"No such file or directory" on a WorkingDirectory that demonstrably
exists), the unit ends up in RestartMaxDelaySec=300 backoff, and
cmd_update's fallback 'systemctl restart' never recovers it — leaving
users with a permanently silent gateway until they manually run
'systemctl reset-failed'.

The fix mirrors the recovery pattern 'hermes gateway restart'
(systemd_restart) got in PR #20949: always reset-failed before
restart, on both the initial fallback and the retry.  Also rewrites
the final failure message to tell the user to reset-failed +
restart (not just restart, which is the step that already failed
twice).
2026-05-07 08:34:12 -07:00
Teknium
04918345ea
fix(cron): initialize MCP servers before constructing the cron AIAgent (#21354)
cron/scheduler.py:run_job() constructed AIAgent(...) without ever calling
discover_mcp_tools(). The CLI and gateway paths do this at startup; cron
jobs inherited none of it and the user's configured mcp_servers were
invisible inside every cron run.

Insert discover_mcp_tools() right before AIAgent(), wrapped in try/except
so a broken MCP server can't kill an otherwise-working cron job. The call
is idempotent: register_mcp_servers() short-circuits on already-connected
servers, so subsequent ticks in the same scheduler process pay ~0ms.
Scoped to the LLM path only; no_agent script jobs skip it entirely.

Closes #4219.
2026-05-07 07:53:03 -07:00
WideLee
4de3ef38b1 feat(qqbot): wire native tool-approval UX via inline keyboards
Makes the in-tree QQ inline keyboards actually light up when the agent
blocks on a dangerous-command approval. Matches the cross-adapter
gateway contract already implemented by Discord, Telegram, Slack,
Matrix, and Feishu.

Gateway/run.py's _approval_notify_sync checks type(adapter).send_exec_approval
and falls back to a text prompt when it's missing. Without this wiring,
QQ users stared at plain '/approve' text even though the adapter shipped
button primitives.

### send_exec_approval(chat_id, command, session_key, description, metadata)

Matches the signature the gateway calls with. Builds an ApprovalRequest
(command_preview, description, timeout) and delegates to send_approval_request.
Uses the last inbound msg_id as reply_to so QQ accepts the passive
message. The 'metadata' parameter is accepted for contract parity but
intentionally unused — QQ doesn't have thread_id/DM-targeting overrides.

### send_update_prompt(chat_id, prompt, default, session_key, metadata)

Signature updated to match the cross-adapter contract used by
'hermes update --gateway' watcher. Renders a 'Update Needs Your Input'
prompt with the optional default hint and a Yes/No keyboard. Replaces
the earlier 3-arg helper that wasn't wired anywhere.

### Default interaction dispatcher

_default_interaction_dispatch() auto-registered as the adapter's
interaction callback in __init__. Routes:

- approve:<session_key>:<decision> → tools.approval.resolve_gateway_approval
  Button → choice mapping:
    allow-once  → 'once'
    allow-always → 'always'
    deny        → 'deny'
  (QQ's 3-button mobile layout deliberately collapses 'session' + 'always'
  into one button; /approve session text fallback remains available.)
- update_prompt:<answer> → atomic write of y/n to ~/.hermes/.update_response
  (the detached 'hermes update --gateway' watcher polls this file)
- anything else → logged and dropped

Resolve exceptions are caught and logged — never propagate into the WS
loop. Callers can override via set_interaction_callback() to route
clicks elsewhere or pass None to drop them entirely.

### Net effect

QQ users now get native tap-to-approve UX on dangerous-command prompts
and update-confirmation prompts, without having to type /approve or /deny
as text. The adapter hooks into tools.approval the same way every other
button-capable platform does.

### Tests

14 new tests cover:
- Default callback installed on __init__
- send_exec_approval / send_update_prompt exist as class methods (so the
  gateway's type-probe detects them)
- allow-once/always/deny each map to the correct resolve choice
- update_prompt:y / update_prompt:n each write atomically to the response
  file (via monkeypatched get_hermes_home)
- Unknown button_data / empty button_data / resolve exceptions are harmless
- send_exec_approval honours last_msg_id reply-to and accepts metadata
- send_update_prompt delegates with correct content + keyboard

Full qqbot suite: 144 passed (72 pre-existing + 72 from this salvage arc).
Also ran tools/test_approval.py alongside — no regressions (276 passed
combined).

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:48:15 -07:00
Teknium
a1fe5f473d
fix(cron): scan assembled prompt including skill content (#3968) (#21350)
_scan_cron_prompt ran at cron create/update time on the user-supplied
prompt but skill content loaded inside _build_job_prompt at runtime
was never scanned. Combined with non-interactive auto-approval, a
malicious skill carrying an injection payload could execute with full
tool access every tick.

- cron/scheduler.py: new CronPromptInjectionBlocked exception and
  _scan_assembled_cron_prompt helper. _build_job_prompt now routes
  both return paths (with skills / without skills) through the helper,
  raising on match. run_job catches the exception and returns a clean
  (False, blocked_doc, "", error) tuple so the operator sees a BLOCKED
  delivery with the scanner result and an audit hint, rather than a
  scheduler crash or a silent skip.
- tests/cron/test_cron_prompt_injection_skill.py: 10 regression tests.
  Unit coverage on _scan_assembled_cron_prompt (clean/injection/exfil/
  invisible-unicode). End-to-end coverage via _build_job_prompt with
  planted skills (injection payload, env exfil, zero-width space,
  clean control, missing-skill-doesn't-crash). Fixture patches
  tools.skills_tool.SKILLS_DIR / HERMES_HOME so planted skills are
  visible. Importantly uses the current cron.scheduler module object
  (not a top-level import) so tests don't break when other fixtures
  reload cron.scheduler — CronPromptInjectionBlocked identity depends
  on which module object defined it.
2026-05-07 07:44:10 -07:00
maciekczech
162ad3dd16 fix(kanban): filter dashboard board by selected tenant 2026-05-07 07:39:57 -07:00
maciekczech
f4de3810ef test(kanban): cover dashboard select filter wiring 2026-05-07 07:39:57 -07:00
Teknium
74c9c0eec9
fix(mcp): gate utility stubs on server-advertised capabilities (#21347)
For every connected MCP server we register four "utility" tool schemas
(mcp_<server>_list_resources, read_resource, list_prompts, get_prompt).
The existing gate was `hasattr(server.session, method)` — but
`mcp.ClientSession` defines all four methods on the class regardless of
what the remote server supports, so the gate never filtered anything.
Tools-only servers (e.g. @upstash/context7-mcp which advertises only
`tools`) ended up with 4 dead stubs; every model call to them returned
JSON-RPC -32601 Method not found, which made the model conclude the
server was broken even when the real tools worked.

Capture the `InitializeResult` returned by `await session.initialize()`
on the `MCPServerTask`, then gate each utility schema on the
corresponding `capabilities` sub-object (resources / prompts). A
legacy `hasattr` fallback runs when `initialize_result` is missing
(older test fixtures / not-yet-captured code paths) so pre-existing
behavior is preserved.

Verified against real `mcp.types.InitializeResult` pydantic models:
- Context7 shape (tools only) → 0 utility stubs registered (was 4)
- Resources-only server → 2 stubs (list_resources, read_resource)
- Prompts-only server → 2 stubs (list_prompts, get_prompt)
- Fully capable server → all 4 stubs

Closes #18051.

Co-authored-by: nikolay-bratanov <nikolay-bratanov@users.noreply.github.com>
2026-05-07 07:39:50 -07:00
teknium1
898b6d7d55 fix(webhook): widen INSECURE_NO_AUTH loopback check + tests + docs
Follow-up to the previous commit:
- Add _is_loopback_host() helper covering 127.0.0.1, localhost, ::1,
  ip6-localhost, ip6-loopback (case-insensitive). Empty/None host is
  treated as non-loopback since unset usually means public default bind.
- Fix mixed-indent comment in the safety rail (comment now aligned with
  the if-block) and collapse the nested-if into one condition.
- Add TestInsecureNoAuthSafetyRail covering rejection on 0.0.0.0, a LAN
  IP, and empty host; allowance on 127.0.0.1/localhost; plus unit-level
  parametrized coverage of _is_loopback_host for spellings we can't bind
  in the hermetic test env (::1, ip6-localhost, ip6-loopback).
- Pin test_connect_starts_server + test_webhook_deliver_only defaults
  to 127.0.0.1 so they keep passing under the new rail.
- Document the behavior in website/docs/user-guide/messaging/webhooks.md.
2026-05-07 07:38:43 -07:00
WideLee
5b121c6e35 feat(qqbot): process attachments in quoted (reply) messages
When a user replies while quoting another message, QQ sets
'message_type = 103' and pushes the referenced message's content +
attachments inside 'msg_elements[0]'. The old adapter ignored
msg_elements entirely, so:

- Bare quote-replies (no user text) surfaced nothing to the LLM.
- Quoted images/files/voice were never downloaded or described.
- Quoted voice messages specifically produced no transcript — the model
  had no way to see what the user was referring to when saying 'about
  this voice note…'.

This commit adds _process_quoted_context(d) which extracts msg_elements,
unions their attachments, and runs them through the SAME
_process_attachments pipeline as the main message body. Quoted voice
gets an STT transcript (tried via QQ's asr_refer_text first, then the
configured STT provider); quoted images get cached just like main-body
images; quoted files surface with their original filename intact (not
the CDN URL hash).

The quoted content is prepended to the user's text as a '[Quoted message]:'
block so the LLM sees the full referential context on one turn.
Images-only quotes surface a '[Quoted message]: (image)' marker so the
model knows an image was referenced even if no text came with it.

All four inbound handlers (_handle_c2c_message, _handle_group_message,
_handle_guild_message, _handle_dm_message) now call the helper uniformly
— one merge pattern, not four divergent implementations.

Filename preservation is carried by _process_attachments' existing
'[Attachment: {filename or ct}]' line; nothing else needed for that.

12 new tests under TestProcessQuotedContext and TestMergeQuoteInto cover:

- Non-quote messages short-circuit to empty
- message_type=103 with no msg_elements is harmless
- Text-only quotes render with '[Quoted message]:' prefix
- Voice attachments in the quote flow through STT
- File attachments in the quote preserve the original filename
- Image attachments surface cached paths + media types
- Images-only quote still emits a marker
- Multiple msg_elements are concatenated
- Malformed message_type values return empty
- _merge_quote_into prepends with a blank-line separator

Full qqbot suite: 130 passed (72 existing + 19 chunked + 27 keyboards
+ 12 quoted).

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:36:30 -07:00
WideLee
de584cd1dd feat(qqbot): add inline-keyboard approvals and update prompts
The QQ Bot v2 API supports inline keyboards on outbound messages. When a
user taps a button, the platform dispatches an INTERACTION_CREATE
gateway event; the bot ACKs it via PUT /interactions/{id} and decodes
the button's data payload to route the click.

This commit adds:

New module gateway/platforms/qqbot/keyboards.py

- Inline-keyboard dataclasses (InlineKeyboard, KeyboardRow, KeyboardButton,
  KeyboardButtonAction, KeyboardButtonRenderData, KeyboardButtonPermission)
  that serialize to the JSON shape the QQ API expects.
- build_approval_keyboard(session_key) — 3-button layout:
   允许一次 /  始终允许 /  拒绝, all sharing group_id='approval'
  so clicking one greys out the rest.
- build_update_prompt_keyboard() — Yes/No keyboard for update confirms.
- parse_approval_button_data() / parse_update_prompt_button_data() —
  decode the button_data payload from INTERACTION_CREATE.
  approve:<session_key>:<decision>  (decision = allow-once|allow-always|deny)
  update_prompt:<answer>            (answer = y|n)
- build_approval_text(ApprovalRequest) — markdown renderer for the
  surrounding message body (exec-approval and plugin-approval variants,
  with severity icons 🔴/🔵/🟡).
- parse_interaction_event(raw) → InteractionEvent dataclass — normalizes
  the nested raw payload (id / scene / openids / button_data / etc.).

Adapter changes (gateway/platforms/qqbot/adapter.py)

- _dispatch_payload routes INTERACTION_CREATE → _on_interaction.
- _on_interaction parses the event, ACKs via PUT /interactions/{id}, then
  invokes a user-registered interaction callback. Exceptions from the
  callback are caught and logged (never propagate into the WS loop).
- set_interaction_callback(cb) lets gateway wiring register a routing
  handler that inspects button_data and resolves the corresponding
  pending approval / update prompt.
- _send_c2c_text / _send_group_text now accept an optional keyboard kwarg
  and append it to the outbound body.
- send_with_keyboard(chat_id, content, keyboard, reply_to=None) — public
  helper that sends a single short message with a keyboard attached.
  Does NOT chunk-split (a keyboard message has one interactive surface).
  Guild chats are rejected non-retryably — they don't support keyboards.
- send_approval_request(chat_id, ApprovalRequest, reply_to=None) +
  send_update_prompt(chat_id, content, reply_to=None) — convenience
  wrappers over send_with_keyboard.

Tests

27 new unit tests under TestApprovalButtonData, TestUpdatePromptButtonData,
TestBuildApprovalKeyboard, TestBuildUpdatePromptKeyboard, TestBuildApprovalText,
TestInteractionEventParsing, and TestAdapterInteractionDispatch. Cover:

- Button-data round-trip (build → parse returns original session/decision)
- Keyboard JSON shape + mutual-exclusion group_id
- Exec vs plugin approval text templates + severity icons
- Interaction event parsing (c2c / group / guild scene codes)
- _on_interaction end-to-end: ACK invoked, callback receives parsed event,
  callback exceptions are swallowed, missing id skips ACK, no registered
  callback is harmless.

Full qqbot suite: 118 passed (72 existing + 19 chunked + 27 keyboards).

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:36:30 -07:00
WideLee
9feaeb632b feat(qqbot): add chunked upload with structured error types
The v2 'single POST /v2/{users|groups}/{id}/files' upload path is capped
at ~10 MB inline (base64 'file_data' or 'url'). For larger files the QQ
platform provides a three-step flow:

  1. POST /upload_prepare           → upload_id + pre-signed COS part URLs
  2. PUT each part to its COS URL → POST /upload_part_finish
  3. POST /files with {upload_id}   → file_info token

This commit adds a new gateway/platforms/qqbot/chunked_upload.py module
that implements the flow, wires it into QQAdapter._send_media for local
files (URL uploads keep the existing inline path), and introduces
structured exceptions so the caller can surface actionable error text:

- UploadDailyLimitExceededError  (biz_code 40093002, non-retryable)
- UploadFileTooLargeError        (file exceeds the platform limit)

Both carry file_name / file_size_human / limit_human so the model can
compose user-friendly replies instead of seeing opaque HTTP codes.

The part_finish 40093001 retryable-error loop respects the server-
provided retry_timeout (capped at 10 minutes locally) with a 1 s
polling interval. COS PUTs retry transient failures up to 2 times
with exponential backoff. complete_upload retries up to 2 times.

Covers files up to the platform's ~100 MB per-file limit; before this
the adapter silently rejected anything over ~10 MB.

19 new unit tests under TestChunkedUpload* cover the happy path,
prepare-response parsing, helper functions, part retries, COS PUT
retries, group vs c2c routing, and the structured-error mapping.

Co-authored-by: WideLee <limkuan24@gmail.com>
2026-05-07 07:36:30 -07:00
Teknium
ac51c4c1ad
feat(kanban): per-task max_retries override (#20263 follow-up, supersedes #20972) (#21330)
Adds a per-task override for the consecutive-failure circuit breaker,
so individual tasks can opt out of the global ``kanban.failure_limit``
without dragging everyone else with them.

Resolution order (now three tiers):
  1. per-task ``max_retries`` (new, this commit)
  2. caller-supplied ``failure_limit`` — the gateway threads
     ``kanban.failure_limit`` from config here
  3. ``DEFAULT_FAILURE_LIMIT`` (2)

Changes:
- ``tasks.max_retries INTEGER`` column + migration for existing DBs
  (NULL = no override, matches pre-column behavior).
- ``Task.max_retries`` field + ``from_row`` plumbing.
- ``create_task(..., max_retries=N)`` kwarg.
- ``_record_task_failure`` reads the per-task value first and records
  ``limit_source`` + ``effective_limit`` on the ``gave_up`` event so
  operators can see which tier won.
- CLI: ``hermes kanban create --max-retries N`` (rejects ``< 1``).
- CLI: ``hermes kanban show`` surfaces the effective threshold +
  source (``(task)``, ``(config kanban.failure_limit)``, ``(default)``).
- CLI: ``_task_to_dict`` includes ``max_retries`` in ``--json`` output.

Key design choice vs. the earlier #20972 attempt:
- No new config key. The existing ``kanban.failure_limit`` (landed in
  #21183) is the dispatcher-tier source — no silent break for users
  who already tuned it.
- No ``!=`` sentinel for "is config set" (which would misfire when
  config equals the default). The tier-winner is determined purely
  by "is per-task override set" — the dispatcher always wins when
  per-task is NULL, regardless of whether the caller passed the
  default or a configured value.

E2E verified across four scenarios: default-only (trips at 2),
config-only (trips at caller's value), per-task-only beats default
(trips at task value), per-task beats larger config (trips at task
value). ``gave_up`` event metadata correctly records ``limit_source``
and ``effective_limit`` in all cases.

Tests:
- ``test_per_task_max_retries_overrides_dispatcher_limit`` — task=1
  beats caller=10.
- ``test_per_task_max_retries_allows_more_than_default`` — task=5
  does not trip at caller=default of 2.
- ``test_max_retries_none_falls_through_to_dispatcher_limit`` — None
  honors caller's config value (4), records ``limit_source=dispatcher``.

Full kanban trio (db + core + cli + tools + dashboard-plugin): 342
passed, no regressions.

Supersedes: #20972 (@jelrod27) — credit in PR close comment.
Ref: #20263 (tangentially — the reporter asked about adapter API
drift, not retry caps, but the CLI discussion there is what
surfaced the original ask).
2026-05-07 07:29:02 -07:00
Teknium
145e8ec237
fix(pairing): enforce lockout on approve_code, not just generate_code (#10195) (#21325)
PairingStore.approve_code() didn't consult _is_locked_out(), so after
MAX_FAILED_ATTEMPTS bad approvals the lockout flag was set but a valid
code still got accepted — any pending code (legitimately issued or
attacker-obtained) could be approved during the 1-hour lockout window,
nullifying the brute-force protection.

- gateway/pairing.py: lockout check runs in approve_code() right after
  _cleanup_expired, before the pending lookup. Returns None on lockout.
- tests/gateway/test_pairing.py: test_lockout_blocks_code_approval pins
  the regression — reporter's exact reproducer (generate valid code,
  exhaust attempts with WRONGCODE, try to approve valid code) must
  return None and leave is_approved == False. Also pins recovery: once
  lockout expires, the still-pending code approves normally.
- hermes_cli/pairing.py: _cmd_approve distinguishes the two None cases.
  On lockout, prints 'Platform locked out... clears in N minutes. To
  reset sooner, delete the _lockout:<platform> entry from
  _rate_limits.json' instead of the misleading 'Code not found or
  expired' message. 29/29 pairing tests pass; E2E-verified with
  reporter's exact Python reproducer.
2026-05-07 07:18:21 -07:00
qWaitCrypto
62c2f5d8d2 fix(mcp): coerce numeric tool args defensively 2026-05-07 07:17:12 -07:00