Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.
Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.
- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
Actions: capture (som/vision/ax), click, double_click, right_click,
middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
`content: [text, image_url]` parts) that flows through
handle_function_call into the tool message. Anthropic adapter converts
into native `tool_result` image blocks; OpenAI-compatible providers
get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
recent screenshots carry real image data; older ones become text
placeholders to cap per-turn token cost.
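The eviction pass can be sketched as follows. This is a hypothetical simplification, not the shipped `convert_messages_to_anthropic` code; the message/part shapes and the placeholder text are assumptions.

```python
def evict_old_images(messages: list[dict], keep: int = 3) -> list[dict]:
    """Replace all but the `keep` most recent image parts with text
    placeholders. Illustrative sketch only; the real logic lives in
    convert_messages_to_anthropic."""
    # Collect (message, part-index) pairs for every image part, oldest first.
    image_slots = [
        (msg, i)
        for msg in messages
        for i, part in enumerate(msg.get("content", []))
        if isinstance(part, dict) and part.get("type") == "image_url"
    ]
    victims = image_slots[:-keep] if keep else image_slots
    for msg, i in victims:
        msg["content"][i] = {"type": "text", "text": "[screenshot evicted]"}
    return messages
```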
- Context compressor image pruning: old multimodal tool results have
their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
instead of its base64 char length (~1MB would have registered as
~250K tokens before).
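The flat-cost estimate described above might look like this. The helper name, message shape, and chars-per-token divisor are assumptions for illustration.

```python
IMAGE_TOKEN_COST = 1500  # flat per-image estimate, per the PR description


def estimate_tokens(message: dict, chars_per_token: int = 4) -> int:
    """Rough token estimate that charges each image a flat cost instead
    of its base64 character length. Hypothetical sketch, not the
    shipped estimator."""
    content = message.get("content", "")
    if isinstance(content, str):
        return len(content) // chars_per_token
    total = 0
    for part in content:
        if part.get("type") == "image_url":
            total += IMAGE_TOKEN_COST  # ~1MB base64 no longer counts as ~250K
        else:
            total += len(part.get("text", "")) // chars_per_token
    return total
```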
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
(curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
force delete, lock screen, log out).
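A guard over the blocked `type` patterns could be as simple as a regex sweep. The patterns below are illustrative stand-ins; the shipped list lives in the tool itself.

```python
import re

# Illustrative patterns only; not the exact shipped set.
BLOCKED_TYPE_PATTERNS = [
    r"curl[^|]*\|\s*(ba)?sh",   # curl ... | bash / sh pipelines
    r"sudo\s+rm\s+-rf",         # destructive recursive delete
    r":\(\)\s*\{.*\};\s*:",     # classic shell fork bomb
]


def text_is_blocked(text: str) -> bool:
    """Hard safety guard applied before the `type` action is executed."""
    return any(re.search(p, text) for p in BLOCKED_TYPE_PATTERNS)
```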
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
entries.
44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.
469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.
- `model_tools.py` provider-gating: the tool is available to every
provider. Providers without multi-part tool message support will see
text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
client-side eviction + compressor pruning cover the same cost ceiling
without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs
(SLEventPostToPid, SLPSPostEventRecordTo,
_AXObserverAddNotificationAndCheckRemote) that can break on any macOS
update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
prints the Settings path.
Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
Defense-in-depth polish on top of the webhook listener before it becomes
a real attack surface once the pipeline starts creating subscriptions
and Graph starts POSTing to the configured public URL.
- Timing-safe clientState comparison. Previously used `==` on strings;
switches to hmac.compare_digest so a mismatch does not leak how many
leading characters matched. client_state is documented as a strong
shared secret (openssl rand -hex 32 in the setup docs), so a
timing-safe primitive is the right call.
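The switch amounts to replacing a string `==` with `hmac.compare_digest`, roughly:

```python
import hmac


def client_state_matches(received: str, expected: str) -> bool:
    """Constant-time comparison for the Graph clientState secret.
    hmac.compare_digest does not short-circuit on the first differing
    byte, so a mismatch leaks nothing about the matching prefix.
    Function name is illustrative."""
    return hmac.compare_digest(received.encode(), expected.encode())
```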
- Split GET and POST handlers. Graph validates a subscription by sending
GET with validationToken in the query; anything else on GET is now a
400 so the endpoint cannot be probed or mistakenly used for data
exfil. Previously a bare GET fell through to the POST path and blew
up on request.json() with a confusing 400.
- Empty response bodies on success. 202 is returned with no body so
internal counters (accepted / duplicates / scheduled) do not leak to
any caller that can reach the endpoint; counters remain observable
via /health for operators. 403 on every-item-bad-clientState batches
(so forged POSTs stop retrying), 400 on malformed / unknown-resource
batches (sender configuration issue).
- Optional source-IP allowlist. New `allowed_source_cidrs` extra field
(list or comma-separated string) and `MSGRAPH_WEBHOOK_ALLOWED_SOURCE_CIDRS`
env var let operators restrict the webhook to Microsoft Graph's
published webhook source ranges in production. Empty = allow all,
preserving dev-tunnel / localhost workflows. Invalid CIDRs are
logged and ignored rather than crashing. Also gates the handshake
endpoint so disallowed IPs cannot probe it.
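The parse-and-check behavior described above (list or comma string, invalid CIDRs logged and skipped, empty means allow-all) can be sketched with the stdlib `ipaddress` module. Function names are assumptions.

```python
import ipaddress
import logging

logger = logging.getLogger(__name__)


def parse_allowed_cidrs(raw) -> list:
    """Accept a list or comma-separated string of CIDRs. Invalid
    entries are logged and ignored rather than crashing. Sketch of the
    described behavior, not the shipped helper."""
    entries = raw.split(",") if isinstance(raw, str) else (raw or [])
    networks = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        try:
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            logger.warning("ignoring invalid CIDR %r", entry)
    return networks


def ip_allowed(client_ip: str, networks: list) -> bool:
    if not networks:  # empty allowlist = allow all (dev tunnels, localhost)
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```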
- Tests updated for the new response contract (empty-body 202,
auth-only 403, config-error 400) and extended to cover: bare GET
rejection, POST-with-validationToken handshake tolerance,
timing-safe compare actually invoked via hmac.compare_digest spy,
malformed body / missing value array, IP allowlist accept/reject
paths, handshake IP allowlist, invalid CIDR entries, comma-string
CIDR list parsing. 52/52 passed (was 40).
Full gateway suite: 5049 passed / 1 pre-existing failure in
test_discord_free_response (unrelated, reproduces on clean origin/main).
* feat(profile): shareable profile distributions (pack/install/update/info)
Closes #20456.
Turns a profile into a portable, versioned artifact. Packs SOUL.md, config,
skills, cron, and an env-var manifest into a tar.gz that others can install
from a local path, URL, or git repo. Updates re-pull the distribution while
preserving user data (memories, sessions, auth.json, .env) and the user's
config.yaml overrides.
New subcommands (under hermes profile, no parallel tree):
hermes profile pack <name> [-o FILE]
hermes profile install <source> [--name N] [--alias] [--force] [-y]
hermes profile update <name> [--force-config] [-y]
hermes profile info <name>
Manifest (distribution.yaml at the profile root): name, version,
hermes_requires, author, env_requires, distribution_owned.
Security:
- Installer shows manifest + env-var requirements before mutating disk;
confirmation required unless -y.
- auth.json and .env are never packed (same exclude set as profile export).
- Cron jobs are packed but NOT auto-scheduled — user is pointed at
'hermes -p <name> cron list' to review.
- Archive extraction rejects path traversal (../ members).
- Alias creation is opt-in via --alias.
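The traversal rejection in the extraction step can be sketched like this: resolve each member path against the destination and refuse anything that escapes it. A hypothetical sketch (helper name and exact check are assumptions, not the shipped code).

```python
import os
import tarfile


def safe_extract(archive_path: str, dest: str) -> None:
    """Refuse tar members that would escape dest via ../ segments or
    absolute paths before extracting anything. Illustrative sketch."""
    dest = os.path.realpath(dest)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            target = os.path.realpath(os.path.join(dest, member.name))
            if target != dest and not target.startswith(dest + os.sep):
                raise ValueError(f"unsafe path in archive: {member.name!r}")
        tar.extractall(dest)
```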
Update semantics:
- Distribution-owned paths (SOUL.md, skills/, cron/, mcp.json, manifest):
replaced from the new archive.
- config.yaml: preserved by default; --force-config to overwrite.
- User-owned paths (memories/, sessions/, auth.json, .env, state.db*,
logs/, workspace/, plans/, home/, *_cache/, local/): never touched.
Version pin:
hermes_requires accepts >=, <=, ==, !=, >, < or a bare version (treated
as >=). Install fails with a clear error when the running Hermes version
doesn't satisfy the spec.
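The spec check can be sketched with a small operator table; this is a minimal illustration (names and the naive dotted-tuple comparison are assumptions, not the shipped parser).

```python
import operator
import re

_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def satisfies(current: str, spec: str) -> bool:
    """Check a hermes_requires spec against the running version.
    A bare version is treated as >=. Sketch only; assumes simple
    dotted numeric versions."""
    m = re.match(r"^(>=|<=|==|!=|>|<)?\s*(.+)$", spec.strip())
    op, required = m.group(1) or ">=", m.group(2)
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return _OPS[op](as_tuple(current), as_tuple(required))
```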
Sources supported by 'install':
- Local .tar.gz / .tgz archive
- Local directory
- HTTP(S) URL pointing to a .tar.gz (uses httpx, already a dep)
- Git URL (github.com/user/repo, https://..., git@..., ssh://, git://)
Tests: 43 new unit tests (manifest parsing, version checks, env template,
pack/install/update round-trip, config-preservation, security).
E2E validated via real CLI invocations against an isolated HERMES_HOME
covering pack, install with confirmation, update preservation, update
--force-config, decline-preview, duplicate-install rejection, and
version-requirement rejection.
* refactor(profile-dist): git-only — drop tar.gz/HTTP transports and pack
Scope-cut on top of the original distribution PR: a profile distribution
is now exclusively a git repository (or a local directory during
development). The tar.gz / HTTP archive transports and the matching
`hermes profile pack` subcommand have been removed.
Why:
* GitHub tags, branches, and commits are already the right versioning
primitive. Tag pushes do for us what 'pack + upload' did.
* `hermes profile export` / `import` already cover local backup and
restore; they are not a distribution format and stay untouched.
* One transport means one install/update code path, one doc page,
and one mental model. The extra source types doubled the surface
for no real user win — GitHub auto-attaches release tarballs, and
`git bundle` / `git clone --mirror` cover the airgap case.
Changes:
* hermes_cli/profile_distribution.py — removed pack_profile,
_fetch_tar_archive (_http_fetch), _safe_extract, _archive_roots,
_safe_parts, _find_dist_root, tarfile/io/urlparse imports. The
new _stage_source has two arms: git URL → clone, local directory
→ use in place.
* hermes_cli/main.py — removed the 'pack' subparser and action
handler. Install help text updated to match the reduced source list.
* tests/hermes_cli/test_profile_distribution.py — rewritten around a
local-directory staging fixture. The install/update/describe suites
now build a distribution tree on disk directly and install from it,
which is what a real git clone produces after .git is stripped.
Dropped TestPack, TestFindDistRoot, and the tar-specific security
test. New tests cover _looks_like_git_url, env_example emission,
hermes_requires enforcement, and 'installer does not import
credentials if an author mistakenly leaks them in the staging tree'.
* website/docs/reference/profile-commands.md — 'Distribution commands'
section rewritten around git. Added a 'Publishing a distribution'
section. export/import stay documented as local backup/restore.
* website/docs/reference/cli-commands.md — dropped 'pack' from the
profile subcommand table.
* website/package.json — 'lint:diagrams' now passes
--exclude-code-blocks to ascii-guard. Without it, markdown tables
and box-drawing diagrams inside fenced code blocks were being
misidentified as malformed ASCII boxes, blocking the PR's
docs-site-checks CI with 8 false-positive errors.
Validation:
* Targeted suite: tests/hermes_cli/test_profile_distribution.py —
56/56 pass (up from 43 — reorganized to cover the new
local-dir paths).
* Regression: test_profiles.py + test_profile_export_credentials.py
102/102 still pass. export/import behaviour unchanged.
* Docs lint: ascii-guard lint --exclude-code-blocks docs returns
0 errors (was 8 on the PR before the flag bump).
* E2E: ran the real `hermes profile install`/`info` against a
local staging dir under an isolated HERMES_HOME — install writes
SOUL.md + skills to the target profile, info reads the manifest
back, a bogus source produces a clear error, and `hermes profile
pack` is now rejected by argparse as expected.
* feat(profile-dist): distribution-aware list/show/delete + installed_at + env preview
Polish pass on top of the git-only scope cut. Five additions, all small,
wiring into existing commands rather than adding new surface.
1. `installed_at` timestamp on the manifest
* Stamped automatically inside plan_install() on both fresh install
and update — ISO-8601 UTC, seconds resolution.
* Surfaced in `hermes profile info` as `Installed: <ts>`.
* Lets users tell "installed 6 months ago, needs update" from
"installed yesterday" without guessing from file mtimes.
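The stamp format (ISO-8601 UTC at seconds resolution) maps directly onto the stdlib. A minimal sketch; the helper name is hypothetical and the real stamping happens inside plan_install().

```python
from datetime import datetime, timezone


def stamp_installed_at(manifest: dict) -> dict:
    """Record an ISO-8601 UTC timestamp at seconds resolution,
    e.g. '2025-01-01T12:34:56+00:00'. Sketch of the described stamp."""
    manifest["installed_at"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return manifest
```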
2. `hermes profile list` grows a `Distribution` column
* Plain profiles: "—"
* Distribution profiles: "<name>@<version>" (e.g. `telemetry@1.2.3`)
* ProfileInfo gains three optional fields — distribution_name,
distribution_version, distribution_source — populated by a new
_read_distribution_meta() helper that swallows manifest read errors
so a broken distribution.yaml in one profile can't break `list`
for the others.
3. `hermes profile show` and `hermes profile delete` surface
distribution provenance
* show: `Distribution: name@version` + `Installed from: <source>`
plus a pointer to `hermes profile info <name>` for the full
manifest.
* delete: same lines in the pre-confirmation preview, so a user
deleting "telemetry" can see it came from
`github.com/kyle/telemetry-distribution` before they type
`telemetry` to confirm. No change to the confirmation gate itself —
deletion semantics are identical to plain profiles.
4. Install preview checks env vars against the current environment
* Replaces the "Env vars you'll need to set:" header with a simpler
"Env vars:" block.
* Each required var is labeled:
- `✓ set` — already in `os.environ` OR present as a key in the
target profile's existing .env (update case).
- `needs setting` — required but not found in either place.
- `—` — optional.
* Mirrors pip's "Requirement already satisfied" UX: no unnecessary
nagging about keys the user already has configured.
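The per-variable labeling above reduces to a three-way check. A sketch under stated assumptions: the function name is hypothetical and the profile's `.env` is represented as an already-parsed dict.

```python
import os


def env_var_status(name: str, required: bool, profile_env: dict) -> str:
    """Label an env var for the install preview: '✓ set' when found in
    os.environ or the target profile's existing .env (update case),
    'needs setting' when required but absent, '—' when optional."""
    if name in os.environ or name in profile_env:
        return "✓ set"
    if required:
        return "needs setting"
    return "—"
```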
5. Docs: private distributions
* New "Private distributions" section in
website/docs/reference/profile-commands.md explaining that we
shell out to the user's `git` binary, so SSH keys / credential
helpers / GitHub CLI stored creds all work transparently. One
paragraph, two examples.
* `hermes profile info` section updated to mention `Installed:`.
Module-level hoist:
* `from datetime import datetime, timezone` was previously lazy-imported
inside plan_install(). Hoisted to module scope so tests can monkeypatch
`hermes_cli.profile_distribution.datetime` to freeze time.
Tests (+5):
* TestInstalledAtStamp.test_install_stamps_installed_at — format check
(4-digit year, 'T', +00:00 suffix).
* TestInstalledAtStamp.test_update_refreshes_installed_at — freezes
datetime.now() to 2099-01-01 and confirms update writes a new stamp.
* TestProfileInfoDistribution.test_installed_distribution_shows_in_list
— ProfileInfo.distribution_{name,version,source} populated after install.
* TestProfileInfoDistribution.test_plain_profile_has_no_distribution_fields
— plain profiles have None.
* TestProfileInfoDistribution.test_malformed_manifest_does_not_break_list
— broken distribution.yaml in one profile doesn't break list_profiles().
Validation:
* 163/163 tests pass (56 distribution + 102 profile regression +
5 new from this commit — up from 158).
* docs-lint: 0 errors.
* E2E verified: install preview shows ✓/needs-setting per env var,
`profile list` shows Distribution column, `profile show` + `delete`
preview mentions source URL, `info` shows Installed: timestamp.
* fix(profile-dist): clean errors + warn when overwriting plain profiles
Two small polish fixes found during collision sweeps of the PR:
1. ValueError from validate_profile_name now caught cleanly
* A distribution.yaml whose 'name' field can't be used as a profile
identifier (spaces, path traversal, etc.) raises ValueError from
hermes_cli.profiles.validate_profile_name, which was escaping as a
raw Python traceback from 'hermes profile install/update/info'.
* Broadened the except clause in all three handlers to catch
(DistributionError, ValueError) — users now see:
Error: Invalid profile name '../../etc/passwd'. Must match
[a-z0-9][a-z0-9_-]{0,63}
instead of a stack trace.
2. Install preview distinguishes plain profile overwrite from
distribution re-install
* When plan.target_dir exists and IS a distribution (has
distribution.yaml), preview still shows the mild
(profile exists — will overwrite distribution-owned files only)
* When plan.target_dir exists but is a HAND-BUILT plain profile (no
distribution.yaml), preview now shows a loud warning:
⚠ Profile exists but is NOT a distribution. Installing here will
overwrite its SOUL.md, skills/, cron/, and mcp.json.
Your memories, sessions, auth.json, and .env will be preserved,
but any hand-edits to distribution-owned files will be lost.
* Users who type 'hermes profile install foo --force' against a
profile they hand-built now see what they're signing up for. User
data is still safe (memories, sessions, auth, .env are in
USER_OWNED_EXCLUDE), but custom SOUL/skills get stomped.
Tests (+2):
* TestErrorSurfaces.test_bad_profile_name_raises_valueerror_not_traceback
* TestErrorSurfaces.test_path_traversal_name_rejected
Validation:
* 165/165 tests pass (was 163).
* E2E: bad manifest names produce 'Error: Invalid profile name ...'
with no traceback; installing over a plain profile shows the warning;
re-installing over an existing distribution shows the normal
overwrite message.
* Bad HTTPS URLs still produce 'Error: git clone failed: ...' — git
itself generates a clean enough message that no wrapper is needed.
* 'install .' works correctly from any cwd.
* fix(profiles): reject reserved names at validate time
Before: `hermes profile create hermes` / `profile install` / `profile rename`
all silently accepted reserved names like `hermes`, `test`, `tmp`, `root`,
`sudo`. The profile directory was created; only alias creation failed (via
check_alias_collision), leaving a confusingly-named profile on disk — e.g.
`~/.hermes/profiles/hermes/` sitting next to `~/.hermes/` itself.
The reserved set already exists (_RESERVED_NAMES, introduced alongside alias
collision detection). This commit moves the check up one layer to
validate_profile_name so every entry point — create, install, import,
rename, dashboard web API — shares the same gate.
The error message points the user at the cause without being cryptic:
Error: Profile name 'hermes' is reserved — it collides with either the
Hermes installation itself or a common system binary. Pick a different
name.
`default` continues to pass through (it's a special alias for ~/.hermes).
_HERMES_SUBCOMMANDS (`chat`, `model`, `gateway`, etc.) stays at
alias-collision time only — those are fine as bare profile names with
`--no-alias`.
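The single-gate shape described above can be sketched as one validator shared by every entry point. The reserved set and regex mirror this commit's text; the function body itself is an illustrative assumption.

```python
import re

_RESERVED_NAMES = {"hermes", "test", "tmp", "root", "sudo"}  # illustrative subset
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")


def validate_profile_name(name: str) -> str:
    """One gate for create/install/import/rename/web API. 'default'
    passes through as the special alias for ~/.hermes. Sketch only."""
    if name == "default":
        return name
    if not _NAME_RE.match(name):
        raise ValueError(
            f"Invalid profile name {name!r}. Must match [a-z0-9][a-z0-9_-]{{0,63}}"
        )
    if name in _RESERVED_NAMES:
        raise ValueError(f"Profile name {name!r} is reserved")
    return name
```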
Tests (+5): test_reserved_names_rejected parametrized over the full
_RESERVED_NAMES set, matching the existing pattern in TestValidateProfileName.
No existing test uses a reserved name as a profile identifier (grepped
create_profile("hermes|test|tmp|root|sudo") — zero hits).
Validation:
* 170/170 tests pass in the profile suites.
* E2E: `profile create hermes`, `profile install` with manifest
name=hermes, and `profile install ... --name hermes` all produce the
same clean `Error: Profile name 'hermes' is reserved ...` with rc=1
and no traceback. Normal names (`mybot`) still work.
The prior implementation routed download_to_file through the shared
_request() path, which uses httpx.AsyncClient.request() inside a
context manager that closes before aiter_bytes() iterates. The body
was read into memory first and the chunked write loop replayed it
from buffer. On small test payloads this was invisible; on real
Teams meeting recordings (hundreds of MB) it would force the full
artifact into RAM per download.
Rewrites download_to_file to open its own AsyncClient and use
client.stream(), keeping the context open across the aiter_bytes
iteration so the body is actually streamed chunk-by-chunk to disk.
Retry/token-refresh/Retry-After semantics are preserved by handling
them inline on the stream path. Partial .part files are cleaned up
on transport errors and on exhausted retries.
Adds three tests: large-payload streaming verifies the chunk loop
runs multiple times (discriminator: 512 KiB at chunk_size=65536
yields 8 chunks under streaming, 1 under buffering), transient-5xx
retry recovers after a single retry, and exhausted-retry cleans up
the partial file.
The new _is_gateway_approval_context() widened the gateway classification
to any call with HERMES_SESSION_PLATFORM bound via contextvars. But
cron/scheduler.py binds that same contextvar for delivery routing on
cron jobs that originate from a gateway platform (telegram/discord/etc.),
so those jobs were getting routed through submit_pending with no
listener — blocking indefinitely instead of honoring approvals.cron_mode.
Short-circuit on HERMES_CRON_SESSION before any gateway check. Cron is
always governed by cron_mode config, regardless of where the job was
scheduled from.
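The ordering fix boils down to checking the cron marker before the gateway one. A simplified sketch: the real code reads contextvars, not a dict, and the function name is hypothetical.

```python
def classify_approval_context(bindings: dict) -> str:
    """Order matters: cron jobs are governed by cron_mode even when the
    scheduler also bound HERMES_SESSION_PLATFORM for delivery routing of
    gateway-originated jobs. Sketch of the described short-circuit."""
    if bindings.get("HERMES_CRON_SESSION"):     # checked BEFORE any gateway test
        return "cron"
    if bindings.get("HERMES_SESSION_PLATFORM"):
        return "gateway"
    return "cli"
```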
Adds regression coverage in TestCronWithGatewayOrigin and records the
contributor email mapping for scripts/release.py.
Reported: Ctrl+C during an active /goal loop felt like it did nothing —
the agent would interrupt the current turn, then immediately queue another
continuation and keep going until the session ended or the 20-turn budget
ran out.
Root cause: cli.py's _maybe_continue_goal_after_turn() ran in the finally:
block around self.chat(...) unconditionally. Whether the turn completed
normally, got interrupted, or returned an empty string, the judge ran on
whatever was in conversation_history and — because the judge is fail-open
— a "continue" verdict pushed another CONTINUATION_PROMPT onto
_pending_input. Ctrl+C was invisible to the hook.
Fix:
- chat() now captures result['interrupted'] onto self._last_turn_interrupted
(resets to False at entry so early-returns don't leak prior state).
- _maybe_continue_goal_after_turn() checks the flag first: on interrupt,
auto-pause via mgr.pause(reason='user-interrupted (Ctrl+C)') and print
a one-liner pointing the user at /goal resume or /goal clear. No judge
call, no continuation enqueued.
- Also added an empty-response guard that mirrors gateway/run.py's
_handle_message logic (empty reply → transient failure → skip judging
so we don't trip the consecutive-parse-failures backstop unnecessarily).
The goal stays in the DB as paused, so /goal resume recovers it after
the user has sorted out whatever made them cancel. /goal clear still
works as before for a full stop.
Tests: tests/cli/test_cli_goal_interrupt.py covers:
- interrupted turn pauses + doesn't queue + judge is NOT called
- paused goal is resumable
- empty / whitespace / missing assistant reply skips judging
- healthy turn still enqueues continuation / marks done
- chat() resets _last_turn_interrupted at entry (anti-leak guard)
All 55 existing goal tests still pass.
Adds one reserved token to the cron `deliver` field:
- `all` — expand to every platform with a configured home channel
Resolves at fire time, not create time, so a job created before Telegram
was wired up picks it up once `TELEGRAM_HOME_CHANNEL` is set. Composes
with existing targets: `origin,all`, `all,telegram:-100:17`.
Inspired by Vellum Assistant's reminder routing-intent system.
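The fire-time expansion can be sketched as follows. This is an illustrative stand-in for `_expand_routing_tokens`: the target-string format and the literal pass-through of `origin` (resolved elsewhere) are assumptions.

```python
def expand_routing_tokens(targets: list[str], home_channels: dict[str, str]) -> list[str]:
    """Expand the reserved 'all' token into every platform with a
    configured home channel. Runs at fire time, so newly configured
    platforms are picked up. Dedupes while preserving order."""
    out = []
    for target in targets:
        expanded = (
            [f"{platform}:{channel}" for platform, channel in home_channels.items()]
            if target == "all"
            else [target]  # 'origin' and explicit targets pass through
        )
        for t in expanded:
            if t not in out:
                out.append(t)
    return out
```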
## Changes
- cron/scheduler.py: _expand_routing_tokens + integrate into _resolve_delivery_targets
- tools/cronjob_tools.py: schema description updated
- tests/cron/test_scheduler.py: TestRoutingIntents (5 cases)
- website/docs/user-guide/features/cron.md: docs + table rows
## Validation
- tests/cron/test_scheduler.py -k 'Routing or Deliver' → 57 passed
The previous revision of this PR added six GMI-specific branches
(`elif base_url_host_matches(..., 'api.gmi-serving.com')`) across
run_agent.py and agent/auxiliary_client.py, plus a _HERMES_UA_HEADERS
constant in auxiliary_client.py.
ProviderProfile already has a `default_headers: dict[str, str]` field
commented as 'Client-level quirks (set once at client construction)'.
Other plugins (ai-gateway, kimi-coding) already use it. Two of the four
auxiliary_client sites we previously patched already had a generic
`else: profile.default_headers` fallback that picked it up (so did
both run_agent sites).
This revision:
* Sets `default_headers={'User-Agent': 'HermesAgent/<ver>'}` on the
GMI profile in plugins/model-providers/gmi/__init__.py.
* Reverts all six GMI-specific branches in run_agent.py and
auxiliary_client.py.
* Adds the generic profile-fallback `else` block to the two
auxiliary_client sites (`_to_async_client`, `resolve_provider_client`)
that didn't have it yet. This benefits every provider whose profile
declares default_headers, not just GMI — e.g. Vercel AI Gateway's
HTTP-Referer/X-Title now flow through the async client path too.
* Replaces the GMI-specific URL-branch tests with a profile-level
assertion and keeps the run_agent integration test (with
`provider='gmi'` so the fallback picks up the profile).
Net diff vs main: +82/-0 across 5 files, touching only the GMI plugin,
two generic fallback blocks in auxiliary_client.py, AUTHOR_MAP, and
tests. No core files change.
Based on #20907 by @isaachuangGMICLOUD.
Weak judge models (e.g. deepseek-v4-flash) return empty strings or prose
when asked for the strict {done, reason} JSON verdict. The old code
failed-open to continue on every such turn, burning the entire turn
budget with log lines like
judge returned empty response
judge reply was not JSON: "Let me analyze whether the goal..."
and /goal clear could not stop it mid-loop without /stop.
After N=3 consecutive *parse* failures (transport/API errors don't
count — those are transient), the loop auto-pauses and prints:
⏸ Goal paused — the judge model (3 turns) isn't returning the
required JSON verdict. Route the judge to a stricter model in
~/.hermes/config.yaml:
auxiliary:
goal_judge:
provider: openrouter
model: google/gemini-3-flash-preview
Then /goal resume to continue.
The counter resets on any usable reply (both "done"/"continue" and
API errors) and persists across GoalManager reloads so cross-session
resumes carry the correct state.
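The counter behavior above can be sketched as a small tracker. Class and method names are hypothetical; persistence across GoalManager reloads is out of scope here.

```python
class JudgeFailureTracker:
    """Auto-pause after N consecutive *parse* failures. Verdicts and
    transport/API errors both reset the counter (the latter are
    transient and don't count). Sketch of the described behavior."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_parse_failures = 0

    def record(self, outcome: str) -> bool:
        """outcome: 'verdict' | 'api_error' | 'parse_failure'.
        Returns True when the goal should auto-pause."""
        if outcome == "parse_failure":
            self.consecutive_parse_failures += 1
        else:
            self.consecutive_parse_failures = 0
        return self.consecutive_parse_failures >= self.threshold
```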
Also fixes test_goal_verdict_send.py sharing a hardcoded session_id
across tests — the shared id only worked because the previous
_post_turn_goal_continuation was a never-awaited coroutine. Now that
PR #19160 made it properly awaited, the xdist test-leakage bug
surfaced. Each test gets a unique session_id via uuid suffix.
Route goal status notices through the platform adapter send API and register post-delivery callbacks so completed-goal notices appear after the final assistant response. Also cancel queued synthetic goal continuations on /goal pause and /goal clear while preserving normal queued user messages.
channels_list was iterating directory.items() directly, yielding
("updated_at", str) and ("platforms", dict) pairs — neither passed
the isinstance(entries_list, list) check, so the inner loop never ran
and every call returned count=0 even when channel_directory.json was
populated.
The writer (gateway/channel_directory.py) wraps the payload as
{"updated_at": ..., "platforms": {...}}; every other reader in the
codebase unwraps via directory.get("platforms", {}). This aligns
channels_list with that convention.
Also tightens the existing test_channels_with_directory test, which
bypassed the bug by asserting against _load_channel_directory() directly
instead of calling channels_list. It now calls the tool end-to-end and
a new test_channels_with_directory_platform_filter covers the filter
path. Both tests fail against the pre-fix code.
Closes #21474
Co-authored-by: chrisworksai <262485129+chrisworksai@users.noreply.github.com>
* feat(kanban): add `specify` — auxiliary LLM fleshes out triage tasks
The Triage column shipped with a placeholder 'a specifier will flesh
out the spec', but the specifier itself was never built. This wires
it up as a dedicated CLI verb.
`hermes kanban specify <id>` calls the auxiliary LLM (configured under
`auxiliary.triage_specifier`) to expand a rough one-liner into a
concrete spec — tightened title plus a body with Goal / Approach /
Acceptance criteria / Out-of-scope sections — then atomically flips
`status: triage -> todo` and recomputes ready so parent-free tasks
go straight to the dispatcher on the same tick.
Surface:
hermes kanban specify <task_id> # single task
hermes kanban specify --all [--tenant T] # sweep triage column
hermes kanban specify ... --author NAME # audit-comment author
hermes kanban specify ... --json # one JSON line per task
Design choices:
- Parent gating is preserved. specify_triage_task flips to 'todo',
then recompute_ready promotes to 'ready' only when parents are
done — same rule as a normal parent-gated todo.
- No daemon, no background watcher. Every invocation is explicit —
keeps cost predictable and doesn't fight the dispatcher loop.
- Response parse is lenient: strict JSON preferred, markdown-fence
tolerated, raw-body fallback on malformed JSON so the LLM can't
strand a task in triage.
- All failure modes (no aux client, API error, task moved out of
triage mid-call) return SpecifyOutcome(ok=False, reason=...) so
--all continues past individual failures.
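The lenient-parse cascade described above (strict JSON preferred, markdown fence tolerated, raw-body fallback) can be sketched like this. Function name and fallback shape are assumptions, not the shipped parser.

```python
import json
import re


def parse_spec_response(raw: str) -> dict:
    """Strict JSON first, then the contents of a ```json fence, then a
    raw-body fallback so a malformed LLM reply can't strand the task
    in triage. Illustrative sketch."""
    candidates = [raw]
    fence = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fence:
        candidates.append(fence.group(1))
    for candidate in candidates:
        try:
            data = json.loads(candidate)
            if isinstance(data, dict):
                return data
        except json.JSONDecodeError:
            continue
    return {"title": None, "body": raw.strip()}  # raw-body fallback
```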
Changes:
hermes_cli/kanban_db.py + specify_triage_task()
hermes_cli/kanban_specify.py NEW (~220 LOC — prompt, parse, call)
hermes_cli/kanban.py + specify subcommand + _cmd_specify
hermes_cli/config.py + auxiliary.triage_specifier task slot
website/docs/user-guide/features/kanban.md specify + config notes
website/docs/reference/cli-commands.md CLI reference entry
tests/hermes_cli/test_kanban_specify_db.py NEW (10 tests)
tests/hermes_cli/test_kanban_specify.py NEW (20 tests)
Validation: 30/30 targeted tests pass. E2E: triage task -> specify ->
ends in 'ready' with events [created, specified, promoted] and the
audit comment recorded under the configured author.
* feat(kanban): wire specifier into dashboard and gateway slash
Follow-ups to the initial PR #21435 — closes the two gaps I'd left as
post-merge: dashboard button and first-class gateway surface.
Dashboard (plugins/kanban/dashboard/)
- POST /tasks/:id/specify NEW endpoint. Thin wrapper around
kanban_specify.specify_task(). Returns the CLI outcome shape
({ok, task_id, reason, new_title}); ok=false with a human reason
is a 200, not a 4xx, so the UI can render it inline without
treating 'no aux client configured' as a crash.
- Runs sync in FastAPI's threadpool because the LLM call can take
tens of seconds on reasoning models.
- Pins HERMES_KANBAN_BOARD around the specify call so the module's
argless kb.connect() lands on the right board.
- dist/index.js: doSpecify callback threaded through the drawer →
TaskDetail → StatusActions prop chain. ✨ Specify button appears
ONLY when task.status === 'triage' (elsewhere the backend would
reject anyway — hide the button to keep the action row clean).
Busy state (Specifying…) + inline success/error banner under the
button using the response.reason text.
- dist/style.css: tiny hermes-kanban-msg-ok / -err classes using
existing --color vars so themes reskin cleanly.
Gateway slash (/kanban specify)
- Already works via the existing run_slash → build_parser →
kanban_command pipeline. No code change needed — slash commands
inherit the argparse tree automatically. Added coverage:
test_run_slash_specify_end_to_end (create --triage, specify, verify
promotion + retitle) and test_run_slash_specify_help_is_reachable.
Tests
- tests/plugins/test_kanban_dashboard_plugin.py: 3 new tests for the
REST endpoint — happy path, non-triage rejection as ok=false 200,
missing aux client as ok=false 200.
- tests/hermes_cli/test_kanban_cli.py: 2 new slash-surface tests.
Docs
- website/docs/user-guide/features/kanban.md: dashboard action row
description mentions ✨ Specify + all three surfaces. REST table
gains /tasks/:id/specify. Slash examples include /kanban specify.
Validation: 340/340 targeted tests pass. E2E via TestClient: create a
triage task over REST → POST /specify with mocked aux client → task
moves to 'ready' column on /board with new title and body applied.
Both implement WebSearchProvider via tools/web_providers/ — matching the
existing SearXNG pattern (PR #5c906d702). Search-only; pair with any
extract provider via web.extract_backend.
- tools/web_providers/brave_free.py — Brave Search API (free tier, 2k
queries/mo). Uses BRAVE_SEARCH_API_KEY as X-Subscription-Token.
- tools/web_providers/ddgs.py — DuckDuckGo via the ddgs Python package.
No API key; gated on package importability.
- tools/web_tools.py: both backends added to _get_backend() config list
and auto-detect chain (trails paid providers), _is_backend_available,
web_search_tool dispatch, web_extract_tool + web_crawl_tool search-only
refusals, check_web_api_key, and the __main__ diagnostic. Introduces
_ddgs_package_importable() helper so tests can monkeypatch a single
symbol for the ddgs availability check.
- hermes_cli/tools_config.py: picker entries for both providers; ddgs
gets a post_setup handler that runs `pip install ddgs`.
- hermes_cli/config.py: BRAVE_SEARCH_API_KEY in OPTIONAL_ENV_VARS.
- scripts/release.py: AUTHOR_MAP entry for @Abd0r.
- tests: 14 new tests (brave-free) + 15 new tests (ddgs) covering
provider unit behavior, backend wiring, and search-only refusals.
Salvages the brave-free + ddgs portion of PR #19796. Not included: the
in-line helpers in web_tools.py (replaced with provider modules to match
the shipped architecture), the lynx-based extract path (these backends
should refuse extract with a clear error — users pair with a real
extract provider), and scripts/start-llama-server.sh (unrelated).
Co-authored-by: Abd0r <223003280+Abd0r@users.noreply.github.com>
Extends PR #21400's resource inlining with image-specific handling: ACP
resource_link and embedded blob resources with an image/* mime (or image
file suffix when mime is missing) now emit an OpenAI image_url part
with a base64 data URL, so vision models actually see the image
instead of a [Binary file omitted] note. Non-image resources keep the
existing text-inlining behavior.
Adds 3 tests: local PNG via resource_link, JPEG mime inferred from
suffix when client omits mimeType, and embedded blob PNG.
The May 5 refactor in d5357f816 made _message_thread_id_for_typing()
symmetric with _message_thread_id_for_send() by mapping the General
topic (thread id "1") to None upfront for both. That's correct for
sendMessage — Telegram rejects message_thread_id=1 on sends and the
topic must be omitted — but it's wrong for sendChatAction.
Observed behavior (confirmed via before/after Telegram wire traces):
Before d5357f816: thread_id=1 → message_thread_id=1 → bubble visible in General
After d5357f816: thread_id=1 → message_thread_id=None → no visible typing
Omitting message_thread_id on sendChatAction does NOT fall back to
the General topic's view in a forum-enabled supergroup; the bubble
ends up hidden from the client's General-topic pane entirely. For
any user in a forum group, the typing indicator stopped appearing.
Fix: drop the symmetric "1 → None" mapping from the typing resolver.
sendMessage still maps 1 → None via _message_thread_id_for_send (that
side was never broken). The asymmetry is real and required by
Telegram's API — document it in the resolver docstring.
Partial revert of d5357f816; restores the behavior from 0cf7d570e
("fix(telegram): restore typing indicator and thread routing for
forum General topic"). Does not re-introduce the retry-without-thread
fallback that 41545f7ec scoped down for DM topics — with the resolver
fixed, the first call already hits the right wire shape.
Test updated from test_send_typing_general_topic_uses_none_thread_id
(which encoded the broken contract) to
test_send_typing_preserves_general_topic_thread_id, asserting the
single correct call with message_thread_id=1. 10 other tests in the
file untouched and passing.
When empty-response terminal scaffolding fires on a tool-result turn,
_drop_trailing_empty_response_scaffolding left the live history ending at
a bare 'tool' message. The next user input then landed as [...tool, user],
a protocol-invalid sequence that OpenRouter/Opus and other providers
silently fail on (returns empty content). That retriggered the empty-retry
recovery every turn, and recovery flags never hit SQLite (no column for
them), so history kept looking broken on every reload.
Two fixes:
1. Scaffolding strip rewinds the orphan assistant(tool_calls)+tool pair
after popping sentinels. Only fires when scaffolding flags were
actually present, so mid-iteration tool loops are untouched.
2. _repair_message_sequence runs right before every API call as a
defensive belt: drops stray tool messages with unknown tool_call_ids,
merges consecutive user messages so no user input is lost. Does NOT
rewind assistant(tool_calls)+tool+user — that pattern is valid when
the user redirected before the model got its continuation turn.
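Fix 2's repair pass can be sketched as below — a simplified stand-in for `_repair_message_sequence`, assuming OpenAI-style message dicts with string content, showing only the two behaviors named above:

```python
# Illustrative sketch; the shipped pass handles more message shapes.
def repair_message_sequence(messages):
    known_ids = set()
    repaired = []
    for msg in messages:
        if msg["role"] == "assistant":
            for call in msg.get("tool_calls") or []:
                known_ids.add(call["id"])
        if msg["role"] == "tool" and msg.get("tool_call_id") not in known_ids:
            continue  # stray tool result with no matching tool_call: drop
        if (msg["role"] == "user" and repaired
                and repaired[-1]["role"] == "user"):
            # Merge consecutive user messages so no input is lost.
            repaired[-1] = {**repaired[-1],
                            "content": repaired[-1]["content"] + "\n" + msg["content"]}
            continue
        repaired.append(msg)
    return repaired
```

Note what it deliberately does not do: an assistant(tool_calls)+tool+user tail passes through untouched, since that sequence is valid.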
Repro: session 20260507_044111_fa7e65. Opus-4.7/OpenRouter returned
content-less response after a 42KB execute_code output, nudge+retry
chain exhausted (no fallback configured), terminal sentinel appended,
scaffolding stripped leaving bare tool tail, user typed 'wtf happened..'
and landed as tool→user violation. Every subsequent turn collapsed in
<50ms with the same 3-retry empty chain because the API request itself
was malformed.
Verified live via HTTP mock: pre-fix reproduced 5 api_calls/0.15s exit
'empty_response_exhausted'; post-fix 1 api_call/0.10s exit
'text_response(finish_reason=stop)'. Three-turn session flows cleanly
through the scenario. Full run_agent suite: 1242 passed (0 regressions,
2 pre-existing concurrent_interrupt failures unrelated).
cmd_update's auto-restart path could leave the gateway dead after a
transient failure in systemd's own auto-restart window. Reproduced
on Ubuntu 25.10 + systemd 257: after update, gateway drains and exits 75,
systemd's first respawn 60s later fails (status=200/CHDIR with
"No such file or directory" on a WorkingDirectory that demonstrably
exists), the unit ends up in RestartMaxDelaySec=300 backoff, and
cmd_update's fallback 'systemctl restart' never recovers it — leaving
users with a permanently silent gateway until they manually run
'systemctl reset-failed'.
The fix mirrors the recovery pattern 'hermes gateway restart'
(systemd_restart) got in PR #20949: always reset-failed before
restart, on both the initial fallback and the retry. Also rewrites
the final failure message to tell the user to reset-failed +
restart (not just restart, which is the step that already failed
twice).
cron/scheduler.py:run_job() constructed AIAgent(...) without ever calling
discover_mcp_tools(). The CLI and gateway paths do this at startup; cron
jobs inherited none of it and the user's configured mcp_servers were
invisible inside every cron run.
Insert discover_mcp_tools() right before AIAgent(), wrapped in try/except
so a broken MCP server can't kill an otherwise-working cron job. The call
is idempotent: register_mcp_servers() short-circuits on already-connected
servers, so subsequent ticks in the same scheduler process pay ~0ms.
Scoped to the LLM path only; no_agent script jobs skip it entirely.
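The guarded, idempotent discovery call can be sketched like this, with the hermes internals (`discover_mcp_tools`, agent construction) injected as parameters purely for illustration:

```python
# Sketch of the run_job change; parameter-injected stand-ins, not
# the real cron/scheduler.py signatures.
def run_job(job, discover_mcp_tools, make_agent, log):
    if not job.get("no_agent"):
        try:
            # Idempotent: already-connected servers short-circuit,
            # so subsequent scheduler ticks pay ~0ms.
            discover_mcp_tools()
        except Exception as exc:
            # A broken MCP server must not kill the cron job.
            log(f"MCP discovery failed, continuing without MCP tools: {exc}")
    return make_agent(job)
```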
Closes #4219.
Makes the in-tree QQ inline keyboards actually light up when the agent
blocks on a dangerous-command approval. Matches the cross-adapter
gateway contract already implemented by Discord, Telegram, Slack,
Matrix, and Feishu.
Gateway/run.py's _approval_notify_sync checks type(adapter).send_exec_approval
and falls back to a text prompt when it's missing. Without this wiring,
QQ users stared at plain '/approve' text even though the adapter shipped
button primitives.
### send_exec_approval(chat_id, command, session_key, description, metadata)
Matches the signature the gateway calls with. Builds an ApprovalRequest
(command_preview, description, timeout) and delegates to send_approval_request.
Uses the last inbound msg_id as reply_to so QQ accepts the passive
message. The 'metadata' parameter is accepted for contract parity but
intentionally unused — QQ doesn't have thread_id/DM-targeting overrides.
### send_update_prompt(chat_id, prompt, default, session_key, metadata)
Signature updated to match the cross-adapter contract used by
'hermes update --gateway' watcher. Renders an 'Update Needs Your Input'
prompt with the optional default hint and a Yes/No keyboard. Replaces
the earlier 3-arg helper that wasn't wired anywhere.
### Default interaction dispatcher
_default_interaction_dispatch() auto-registered as the adapter's
interaction callback in __init__. Routes:
- approve:<session_key>:<decision> → tools.approval.resolve_gateway_approval
Button → choice mapping:
allow-once → 'once'
allow-always → 'always'
deny → 'deny'
(QQ's 3-button mobile layout deliberately collapses 'session' + 'always'
into one button; /approve session text fallback remains available.)
- update_prompt:<answer> → atomic write of y/n to ~/.hermes/.update_response
(the detached 'hermes update --gateway' watcher polls this file)
- anything else → logged and dropped
Resolve exceptions are caught and logged — never propagate into the WS
loop. Callers can override via set_interaction_callback() to route
clicks elsewhere or pass None to drop them entirely.
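The routing above, reduced to a standalone sketch — `resolve` and the response-file writer are injected here; the real dispatcher calls tools.approval.resolve_gateway_approval and writes ~/.hermes/.update_response:

```python
# Illustrative dispatch sketch; assumes session keys without ':'.
def dispatch_button(button_data: str, resolve, write_update_response, log):
    if button_data.startswith("approve:"):
        _, session_key, decision = button_data.split(":", 2)
        choice = {"allow-once": "once",
                  "allow-always": "always",
                  "deny": "deny"}.get(decision)
        if choice is None:
            log(f"unknown decision: {decision}")
            return
        try:
            resolve(session_key, choice)
        except Exception as exc:
            log(f"resolve failed: {exc}")  # never propagate into the WS loop
    elif button_data.startswith("update_prompt:"):
        write_update_response(button_data.split(":", 1)[1])  # 'y' or 'n'
    else:
        log(f"dropped unknown button_data: {button_data}")
```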
### Net effect
QQ users now get native tap-to-approve UX on dangerous-command prompts
and update-confirmation prompts, without having to type /approve or /deny
as text. The adapter hooks into tools.approval the same way every other
button-capable platform does.
### Tests
14 new tests cover:
- Default callback installed on __init__
- send_exec_approval / send_update_prompt exist as class methods (so the
gateway's type-probe detects them)
- allow-once/always/deny each map to the correct resolve choice
- update_prompt:y / update_prompt:n each write atomically to the response
file (via monkeypatched get_hermes_home)
- Unknown button_data / empty button_data / resolve exceptions are harmless
- send_exec_approval honours last_msg_id reply-to and accepts metadata
- send_update_prompt delegates with correct content + keyboard
Full qqbot suite: 144 passed (72 pre-existing + 72 from this salvage arc).
Also ran tools/test_approval.py alongside — no regressions (276 passed
combined).
Co-authored-by: WideLee <limkuan24@gmail.com>
_scan_cron_prompt ran at cron create/update time on the user-supplied
prompt but skill content loaded inside _build_job_prompt at runtime
was never scanned. Combined with non-interactive auto-approval, a
malicious skill carrying an injection payload could execute with full
tool access every tick.
- cron/scheduler.py: new CronPromptInjectionBlocked exception and
_scan_assembled_cron_prompt helper. _build_job_prompt now routes
both return paths (with skills / without skills) through the helper,
raising on match. run_job catches the exception and returns a clean
(False, blocked_doc, "", error) tuple so the operator sees a BLOCKED
delivery with the scanner result and an audit hint, rather than a
scheduler crash or a silent skip.
- tests/cron/test_cron_prompt_injection_skill.py: 10 regression tests.
Unit coverage on _scan_assembled_cron_prompt (clean/injection/exfil/
invisible-unicode). End-to-end coverage via _build_job_prompt with
planted skills (injection payload, env exfil, zero-width space,
clean control, missing-skill-doesn't-crash). Fixture patches
tools.skills_tool.SKILLS_DIR / HERMES_HOME so planted skills are
visible. Importantly uses the current cron.scheduler module object
(not a top-level import) so tests don't break when other fixtures
reload cron.scheduler — CronPromptInjectionBlocked identity depends
on which module object defined it.
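A hedged sketch of the assembled-prompt gate; the marker list here is illustrative only, not the shipped scanner's pattern set:

```python
# Sketch of the exception + helper shape from this commit.
class CronPromptInjectionBlocked(Exception):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


# Illustrative markers: an instruction-override phrase and a
# zero-width space (invisible-unicode smuggling).
INJECTION_MARKERS = ("ignore previous instructions", "\u200b")


def scan_assembled_cron_prompt(prompt: str) -> str:
    lowered = prompt.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            raise CronPromptInjectionBlocked(f"matched {marker!r}")
    return prompt
```

The point of scanning the *assembled* prompt is that skill content loaded at runtime passes through the same gate as the user-supplied prompt text.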
For every connected MCP server we register four "utility" tool schemas
(mcp_<server>_list_resources, read_resource, list_prompts, get_prompt).
The existing gate was `hasattr(server.session, method)` — but
`mcp.ClientSession` defines all four methods on the class regardless of
what the remote server supports, so the gate never filtered anything.
Tools-only servers (e.g. @upstash/context7-mcp which advertises only
`tools`) ended up with 4 dead stubs; every model call to them returned
JSON-RPC -32601 Method not found, which made the model conclude the
server was broken even when the real tools worked.
Capture the `InitializeResult` returned by `await session.initialize()`
on the `MCPServerTask`, then gate each utility schema on the
corresponding `capabilities` sub-object (resources / prompts). A
legacy `hasattr` fallback runs when `initialize_result` is missing
(older test fixtures / not-yet-captured code paths) so pre-existing
behavior is preserved.
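The gate can be sketched as follows, assuming the `capabilities` sub-object shape of `mcp.types.InitializeResult` (an attribute is None when the server doesn't advertise that capability); the function name and return shape are illustrative:

```python
# Sketch of capability-gated utility-schema registration.
def utility_schemas_for(server_name, initialize_result, session=None):
    caps = getattr(initialize_result, "capabilities", None)
    schemas = []

    def supported(attr, methods):
        if caps is not None:
            return getattr(caps, attr, None) is not None
        # Legacy hasattr fallback when initialize_result wasn't captured
        # (older fixtures / not-yet-captured code paths).
        return all(hasattr(session, m) for m in methods)

    if supported("resources", ("list_resources", "read_resource")):
        schemas += [f"mcp_{server_name}_list_resources",
                    f"mcp_{server_name}_read_resource"]
    if supported("prompts", ("list_prompts", "get_prompt")):
        schemas += [f"mcp_{server_name}_list_prompts",
                    f"mcp_{server_name}_get_prompt"]
    return schemas
```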
Verified against real `mcp.types.InitializeResult` pydantic models:
- Context7 shape (tools only) → 0 utility stubs registered (was 4)
- Resources-only server → 2 stubs (list_resources, read_resource)
- Prompts-only server → 2 stubs (list_prompts, get_prompt)
- Fully capable server → all 4 stubs
Closes #18051.
Co-authored-by: nikolay-bratanov <nikolay-bratanov@users.noreply.github.com>
Follow-up to the previous commit:
- Add _is_loopback_host() helper covering 127.0.0.1, localhost, ::1,
ip6-localhost, ip6-loopback (case-insensitive). Empty/None host is
treated as non-loopback since unset usually means public default bind.
- Fix mixed-indent comment in the safety rail (comment now aligned with
the if-block) and collapse the nested-if into one condition.
- Add TestInsecureNoAuthSafetyRail covering rejection on 0.0.0.0, a LAN
IP, and empty host; allowance on 127.0.0.1/localhost; plus unit-level
parametrized coverage of _is_loopback_host for spellings we can't bind
in the hermetic test env (::1, ip6-localhost, ip6-loopback).
- Pin test_connect_starts_server + test_webhook_deliver_only defaults
to 127.0.0.1 so they keep passing under the new rail.
- Document the behavior in website/docs/user-guide/messaging/webhooks.md.
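The helper as described, in sketch form (spelling set from this commit; empty/None treated as non-loopback):

```python
# Sketch of _is_loopback_host; case-insensitive membership check.
_LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1",
                   "ip6-localhost", "ip6-loopback"}


def is_loopback_host(host) -> bool:
    if not host:
        # Unset usually means a public default bind — refuse.
        return False
    return host.strip().lower() in _LOOPBACK_HOSTS
```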
When a user replies while quoting another message, QQ sets
'message_type = 103' and pushes the referenced message's content +
attachments inside 'msg_elements[0]'. The old adapter ignored
msg_elements entirely, so:
- Bare quote-replies (no user text) surfaced nothing to the LLM.
- Quoted images/files/voice were never downloaded or described.
- Quoted voice messages specifically produced no transcript — the model
had no way to see what the user was referring to when saying 'about
this voice note…'.
This commit adds _process_quoted_context(d) which extracts msg_elements,
unions their attachments, and runs them through the SAME
_process_attachments pipeline as the main message body. Quoted voice
gets an STT transcript (tried via QQ's asr_refer_text first, then the
configured STT provider); quoted images get cached just like main-body
images; quoted files surface with their original filename intact (not
the CDN URL hash).
The quoted content is prepended to the user's text as a '[Quoted message]:'
block so the LLM sees the full referential context on one turn.
Images-only quotes surface a '[Quoted message]: (image)' marker so the
model knows an image was referenced even if no text came with it.
All four inbound handlers (_handle_c2c_message, _handle_group_message,
_handle_guild_message, _handle_dm_message) now call the helper uniformly
— one merge pattern, not four divergent implementations.
Filename preservation is carried by _process_attachments' existing
'[Attachment: {filename or ct}]' line; nothing else needed for that.
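The quote-prepend behavior described above, as a standalone sketch (the real `_merge_quote_into` signature is internal to the adapter):

```python
# Sketch: prepend the quoted block with a blank-line separator.
def merge_quote_into(user_text: str, quoted: str) -> str:
    if not quoted:
        return user_text  # non-quote messages pass through unchanged
    block = f"[Quoted message]: {quoted}"
    if not user_text:
        return block  # bare quote-reply: the quote IS the message
    return f"{block}\n\n{user_text}"
```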
12 new tests under TestProcessQuotedContext and TestMergeQuoteInto cover:
- Non-quote messages short-circuit to empty
- message_type=103 with no msg_elements is harmless
- Text-only quotes render with '[Quoted message]:' prefix
- Voice attachments in the quote flow through STT
- File attachments in the quote preserve the original filename
- Image attachments surface cached paths + media types
- Images-only quote still emits a marker
- Multiple msg_elements are concatenated
- Malformed message_type values return empty
- _merge_quote_into prepends with a blank-line separator
Full qqbot suite: 130 passed (72 existing + 19 chunked + 27 keyboards
+ 12 quoted).
Co-authored-by: WideLee <limkuan24@gmail.com>
The QQ Bot v2 API supports inline keyboards on outbound messages. When a
user taps a button, the platform dispatches an INTERACTION_CREATE
gateway event; the bot ACKs it via PUT /interactions/{id} and decodes
the button's data payload to route the click.
This commit adds:
New module gateway/platforms/qqbot/keyboards.py
- Inline-keyboard dataclasses (InlineKeyboard, KeyboardRow, KeyboardButton,
KeyboardButtonAction, KeyboardButtonRenderData, KeyboardButtonPermission)
that serialize to the JSON shape the QQ API expects.
- build_approval_keyboard(session_key) — 3-button layout:
✅ 允许一次 (allow once) / ⭐ 始终允许 (always allow) / ❌ 拒绝 (deny), all sharing group_id='approval'
so clicking one greys out the rest.
- build_update_prompt_keyboard() — Yes/No keyboard for update confirms.
- parse_approval_button_data() / parse_update_prompt_button_data() —
decode the button_data payload from INTERACTION_CREATE.
approve:<session_key>:<decision> (decision = allow-once|allow-always|deny)
update_prompt:<answer> (answer = y|n)
- build_approval_text(ApprovalRequest) — markdown renderer for the
surrounding message body (exec-approval and plugin-approval variants,
with severity icons 🔴/🔵/🟡).
- parse_interaction_event(raw) → InteractionEvent dataclass — normalizes
the nested raw payload (id / scene / openids / button_data / etc.).
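The button-data encoding round-trips like this minimal sketch (payload format from this commit; the dataclass plumbing and markdown rendering are omitted, and session keys are assumed colon-safe only via the trailing-split):

```python
# Sketch of the approve:<session_key>:<decision> round trip.
VALID_DECISIONS = ("allow-once", "allow-always", "deny")


def build_approval_button_data(session_key: str, decision: str) -> str:
    if decision not in VALID_DECISIONS:
        raise ValueError(decision)
    return f"approve:{session_key}:{decision}"


def parse_approval_button_data(data: str):
    prefix, _, rest = data.partition(":")
    if prefix != "approve":
        return None
    # Split from the right so session keys containing ':' survive.
    session_key, _, decision = rest.rpartition(":")
    if decision not in VALID_DECISIONS:
        return None
    return session_key, decision
```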
Adapter changes (gateway/platforms/qqbot/adapter.py)
- _dispatch_payload routes INTERACTION_CREATE → _on_interaction.
- _on_interaction parses the event, ACKs via PUT /interactions/{id}, then
invokes a user-registered interaction callback. Exceptions from the
callback are caught and logged (never propagate into the WS loop).
- set_interaction_callback(cb) lets gateway wiring register a routing
handler that inspects button_data and resolves the corresponding
pending approval / update prompt.
- _send_c2c_text / _send_group_text now accept an optional keyboard kwarg
and append it to the outbound body.
- send_with_keyboard(chat_id, content, keyboard, reply_to=None) — public
helper that sends a single short message with a keyboard attached.
Does NOT chunk-split (a keyboard message has one interactive surface).
Guild chats are rejected non-retryably — they don't support keyboards.
- send_approval_request(chat_id, ApprovalRequest, reply_to=None) +
send_update_prompt(chat_id, content, reply_to=None) — convenience
wrappers over send_with_keyboard.
Tests
27 new unit tests under TestApprovalButtonData, TestUpdatePromptButtonData,
TestBuildApprovalKeyboard, TestBuildUpdatePromptKeyboard, TestBuildApprovalText,
TestInteractionEventParsing, and TestAdapterInteractionDispatch. Cover:
- Button-data round-trip (build → parse returns original session/decision)
- Keyboard JSON shape + mutual-exclusion group_id
- Exec vs plugin approval text templates + severity icons
- Interaction event parsing (c2c / group / guild scene codes)
- _on_interaction end-to-end: ACK invoked, callback receives parsed event,
callback exceptions are swallowed, missing id skips ACK, no registered
callback is harmless.
Full qqbot suite: 118 passed (72 existing + 19 chunked + 27 keyboards).
Co-authored-by: WideLee <limkuan24@gmail.com>
The v2 'single POST /v2/{users|groups}/{id}/files' upload path is capped
at ~10 MB inline (base64 'file_data' or 'url'). For larger files the QQ
platform provides a three-step flow:
1. POST /upload_prepare → upload_id + pre-signed COS part URLs
2. PUT each part to its COS URL → POST /upload_part_finish
3. POST /files with {upload_id} → file_info token
This commit adds a new gateway/platforms/qqbot/chunked_upload.py module
that implements the flow, wires it into QQAdapter._send_media for local
files (URL uploads keep the existing inline path), and introduces
structured exceptions so the caller can surface actionable error text:
- UploadDailyLimitExceededError (biz_code 40093002, non-retryable)
- UploadFileTooLargeError (file exceeds the platform limit)
Both carry file_name / file_size_human / limit_human so the model can
compose user-friendly replies instead of seeing opaque HTTP codes.
The part_finish 40093001 retryable-error loop respects the server-
provided retry_timeout (capped at 10 minutes locally) with a 1 s
polling interval. COS PUTs retry transient failures up to 2 times
with exponential backoff. complete_upload retries up to 2 times.
Covers files up to the platform's ~100 MB per-file limit; before this
the adapter silently rejected anything over ~10 MB.
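The three-step sequencing, reduced to a transport-agnostic sketch — the endpoint paths follow this commit, while the `post`/`put` callables, response field names, and part size are illustrative assumptions (retry/backoff omitted):

```python
# Sketch of the prepare → PUT parts → finish flow.
def chunked_upload(data: bytes, post, put, part_size: int = 5 * 1024 * 1024):
    # 1. prepare: get upload_id + pre-signed COS part URLs
    prep = post("/upload_prepare", {"size": len(data)})
    upload_id, part_urls = prep["upload_id"], prep["part_urls"]
    # 2. PUT each part to its COS URL, then confirm it
    for i, url in enumerate(part_urls):
        put(url, data[i * part_size:(i + 1) * part_size])
        post("/upload_part_finish", {"upload_id": upload_id, "part": i})
    # 3. exchange the upload_id for the file_info token
    return post("/files", {"upload_id": upload_id})["file_info"]
```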
19 new unit tests under TestChunkedUpload* cover the happy path,
prepare-response parsing, helper functions, part retries, COS PUT
retries, group vs c2c routing, and the structured-error mapping.
Co-authored-by: WideLee <limkuan24@gmail.com>
Adds a per-task override for the consecutive-failure circuit breaker,
so individual tasks can opt out of the global ``kanban.failure_limit``
without dragging everyone else with them.
Resolution order (now three tiers):
1. per-task ``max_retries`` (new, this commit)
2. caller-supplied ``failure_limit`` — the gateway threads
``kanban.failure_limit`` from config here
3. ``DEFAULT_FAILURE_LIMIT`` (2)
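The tier resolution is small enough to state exactly; a sketch (the helper name is illustrative, DEFAULT_FAILURE_LIMIT matches the documented default):

```python
# Sketch of the three-tier limit resolution, reporting which tier won.
DEFAULT_FAILURE_LIMIT = 2


def effective_failure_limit(task_max_retries, caller_failure_limit):
    if task_max_retries is not None:
        return task_max_retries, "task"       # per-task override wins
    if caller_failure_limit is not None:
        return caller_failure_limit, "dispatcher"  # kanban.failure_limit
    return DEFAULT_FAILURE_LIMIT, "default"
```

Note the design point from below: the dispatcher tier wins whenever the per-task value is NULL, with no attempt to distinguish "caller passed the default" from "caller configured a value".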
Changes:
- ``tasks.max_retries INTEGER`` column + migration for existing DBs
(NULL = no override, matches pre-column behavior).
- ``Task.max_retries`` field + ``from_row`` plumbing.
- ``create_task(..., max_retries=N)`` kwarg.
- ``_record_task_failure`` reads the per-task value first and records
``limit_source`` + ``effective_limit`` on the ``gave_up`` event so
operators can see which tier won.
- CLI: ``hermes kanban create --max-retries N`` (rejects ``< 1``).
- CLI: ``hermes kanban show`` surfaces the effective threshold +
source (``(task)``, ``(config kanban.failure_limit)``, ``(default)``).
- CLI: ``_task_to_dict`` includes ``max_retries`` in ``--json`` output.
Key design choice vs. the earlier #20972 attempt:
- No new config key. The existing ``kanban.failure_limit`` (landed in
#21183) is the dispatcher-tier source — no silent break for users
who already tuned it.
- No ``!=`` sentinel for "is config set" (which would misfire when
config equals the default). The tier-winner is determined purely
by "is per-task override set" — the dispatcher always wins when
per-task is NULL, regardless of whether the caller passed the
default or a configured value.
E2E verified across four scenarios: default-only (trips at 2),
config-only (trips at caller's value), per-task-only beats default
(trips at task value), per-task beats larger config (trips at task
value). ``gave_up`` event metadata correctly records ``limit_source``
and ``effective_limit`` in all cases.
Tests:
- ``test_per_task_max_retries_overrides_dispatcher_limit`` — task=1
beats caller=10.
- ``test_per_task_max_retries_allows_more_than_default`` — task=5
does not trip at caller=default of 2.
- ``test_max_retries_none_falls_through_to_dispatcher_limit`` — None
honors caller's config value (4), records ``limit_source=dispatcher``.
Full kanban suite (db + core + cli + tools + dashboard-plugin): 342
passed, no regressions.
Supersedes: #20972 (@jelrod27) — credit in PR close comment.
Ref: #20263 (tangentially — the reporter asked about adapter API
drift, not retry caps, but the CLI discussion there is what
surfaced the original ask).
PairingStore.approve_code() didn't consult _is_locked_out(), so after
MAX_FAILED_ATTEMPTS bad approvals the lockout flag was set but a valid
code still got accepted — any pending code (legitimately issued or
attacker-obtained) could be approved during the 1-hour lockout window,
nullifying the brute-force protection.
- gateway/pairing.py: lockout check runs in approve_code() right after
_cleanup_expired, before the pending lookup. Returns None on lockout.
- tests/gateway/test_pairing.py: test_lockout_blocks_code_approval pins
the regression — reporter's exact reproducer (generate valid code,
exhaust attempts with WRONGCODE, try to approve valid code) must
return None and leave is_approved == False. Also pins recovery: once
lockout expires, the still-pending code approves normally.
- hermes_cli/pairing.py: _cmd_approve distinguishes the two None cases.
On lockout, prints 'Platform locked out... clears in N minutes. To
reset sooner, delete the _lockout:<platform> entry from
_rate_limits.json' instead of the misleading 'Code not found or
expired' message. 29/29 pairing tests pass; E2E-verified with
reporter's exact Python reproducer.
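The ordering fix in sketch form — a toy store showing only that the lockout check precedes the pending-code lookup (real expiry/cleanup and storage omitted):

```python
# Illustrative PairingStore; internals are not the shipped ones.
class PairingStore:
    def __init__(self):
        self.pending = {}    # code -> platform
        self.locked = set()  # platforms currently locked out

    def _is_locked_out(self, platform):
        return platform in self.locked

    def approve_code(self, platform, code):
        # (cleanup of expired codes would run here)
        if self._is_locked_out(platform):
            return None  # valid codes must NOT bypass the lockout
        if self.pending.get(code) != platform:
            return None  # not found / expired
        return self.pending.pop(code)
```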
Adds Google Chat as a new gateway platform, shipped under
plugins/platforms/google_chat/ following the canonical bundled-plugin
pattern (Teams, IRC). Rewired from the original PR #18425 to use the
new env_enablement_fn + cron_deliver_env_var plugin interfaces landed
in the preceding commit, so the adapter touches ZERO core files.
What it does:
- Inbound DM + group messages via Cloud Pub/Sub pull subscription (no
public URL needed), with attachments (PDFs, images, audio, video)
downloaded through an SSRF-guarded Google-host allowlist.
- Outbound text replies with the 'Hermes is thinking…' patch-in-place
pattern — no tombstones.
- Native file attachment delivery via per-user OAuth. Google Chat's
media.upload endpoint rejects service-account auth, so each user
runs /setup-files once in their own DM to grant
chat.messages.create for themselves; the adapter then uploads as
them. Tokens stored per email at
~/.hermes/google_chat_user_tokens/<email>.json.
- Thread isolation: side-threads get isolated sessions, top-level DM
messages share one continuous session. Persistent thread-count
store survives gateway restart.
- Supervisor reconnect with exponential backoff.
- Multi-user out of the box.
How it plugs in (no core edits):
- env_enablement_fn seeds PlatformConfig.extra with project_id,
subscription_name, service_account_json, and the home_channel dict
(which the core hook turns into a HomeChannel dataclass). Reads
GOOGLE_CHAT_PROJECT_ID (falls back to GOOGLE_CLOUD_PROJECT),
GOOGLE_CHAT_SUBSCRIPTION_NAME (falls back to GOOGLE_CHAT_SUBSCRIPTION),
GOOGLE_CHAT_SERVICE_ACCOUNT_JSON (falls back to
GOOGLE_APPLICATION_CREDENTIALS), GOOGLE_CHAT_HOME_CHANNEL.
- cron_deliver_env_var='GOOGLE_CHAT_HOME_CHANNEL' gets cron delivery
for free — cron/scheduler.py consults the platform registry for any
name not in its hardcoded built-in sets.
- plugin.yaml's rich requires_env / optional_env blocks auto-populate
OPTIONAL_ENV_VARS via the new hermes_cli/config.py injector, so
'hermes config' UI surfaces them with description / url / prompt /
password metadata.
- Module-level Platform('google_chat') call in adapter.py triggers the
Platform._missing_() registration so Platform.GOOGLE_CHAT attribute
access works without an enum entry.
Distribution: ships inside the existing hermes-agent package. Users
opt in via 'pip install hermes-agent[google_chat]' and follow the
8-step GCP walkthrough at
website/docs/user-guide/messaging/google_chat.md.
Test coverage: 153 tests in tests/gateway/test_google_chat.py, all
passing. Spans platform registration, env config loading, Pub/Sub
envelope routing, outbound send + chunking + typing patch-in-place,
attachment send paths, SSRF guard, thread/session model,
supervisor reconnect, authorization, per-user OAuth, and the new
plugin-registry cron delivery wiring.
Credit: adapter + OAuth + tests + docs authored by @donramon77
(PR #18425). Rewire onto the new plugin hooks + salvage commit by
Teknium.
Co-Authored-By: Ramón Fernández <112875006+donramon77@users.noreply.github.com>
MCP tool results can include ImageContent blocks (screenshots from
Playwright/Blockbench/Puppeteer etc). The tool result handler only
extracted block.text, so image blocks were silently dropped and the
agent saw an empty or text-only response — losing the actual payload.
Add _cache_mcp_image_block() that base64-decodes the block, validates
the bytes via gateway.platforms.base.cache_image_from_bytes (which
sniffs for PNG/JPEG/WebP signatures and rejects non-images), writes to
the shared `~/.hermes/cache/images/` dir, and returns a MEDIA:<path>
tag. The handler appends that tag to the result parts so downstream
gateway adapters render the image inline.
Logs and drops on malformed base64 / non-image payload rather than
raising — a single bad block shouldn't kill the tool call.
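A hedged sketch of the flow; the real code delegates validation to cache_image_from_bytes, inlined here as a PNG/JPEG/WebP signature sniff only, and the helper name is approximate:

```python
import base64
import hashlib
import pathlib


# Sketch: decode, validate by magic bytes, write to cache, return tag.
def cache_mcp_image_block(block_data_b64, cache_dir: pathlib.Path, log):
    try:
        raw = base64.b64decode(block_data_b64, validate=True)
    except Exception as exc:
        log(f"dropping malformed base64 image block: {exc}")
        return None
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        ext = "png"
    elif raw.startswith(b"\xff\xd8\xff"):
        ext = "jpg"
    elif raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        ext = "webp"
    else:
        log("dropping non-image payload")
        return None
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{hashlib.sha256(raw).hexdigest()[:16]}.{ext}"
    path.write_bytes(raw)
    return f"MEDIA:{path}"
```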
Distilled from #17915 (c3115644151) and #10848 (gnanirahulnutakki), both
too stale to cherry-pick (branches diverged enough to revert dozens of
unrelated fixes). Went with #10848's approach of plumbing through
Hermes' existing MEDIA tag / cache_image_from_bytes infrastructure
rather than #17915's raw tempfile path, because it integrates with the
remote-backend mount system and messaging adapters that already handle
MEDIA tags natively.
Co-authored-by: c3115644151 <c3115644151@users.noreply.github.com>
Co-authored-by: gnanirahulnutakki <gnanirahulnutakki@users.noreply.github.com>
* fix(mcp): re-raise CancelledError explicitly in MCPServerTask.run
On Python 3.8+, `asyncio.CancelledError` inherits from `BaseException`
(not `Exception`), so the broad `except Exception as exc:` in
`MCPServerTask.run`'s transport loop did NOT catch it. Task cancellation
from gateway restart / explicit `task.cancel()` silently escaped past
the reconnect logic — the MCP server task died without going through
the shutdown/reconnect code paths that check `_shutdown_event`.
Add an explicit `except asyncio.CancelledError: raise` before the broad
catch so cancellation propagation is self-documenting rather than an
accident of exception hierarchy, and future sibling-site work (e.g.
distinguishing shutdown-cancel from transport-cancel) has an obvious
hook. Behavior on pre-3.8 Pythons where CancelledError WAS an Exception
subclass is also corrected: the old path would have caught it and
treated it as a connection failure worth retrying.
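The pattern, reduced to its shape (the real loop is MCPServerTask.run's transport loop; `connect`/`on_error` are injected here for illustration):

```python
import asyncio


# Sketch: explicit CancelledError re-raise ahead of the broad catch.
async def transport_loop(connect, on_error):
    while True:
        try:
            await connect()
        except asyncio.CancelledError:
            raise  # cancellation must escape the retry loop
        except Exception as exc:
            on_error(exc)  # connection failure worth retrying
            continue
        break
```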
Closes #9930.
* fix(mcp): forward OAuth auth and bump sse_read_timeout on SSE transport
Two surgical correctness bugs in the SSE branch of MCPServerTask._run_http,
distilled from @amiller's PR #5981 that couldn't be cherry-picked wholesale
(branch too stale).
1. sse_read_timeout was set to the tool timeout (default 60s). That's the
wrong dimension — it governs how long sse_client will wait between
events on the SSE stream, not per-call latency. SSE servers routinely
hold the stream idle for minutes between events; a 60s read timeout
drops the connection after the first slow stretch (Router Teamwork,
Supermemory on Cloudflare Workers idle-disconnect at ~60s). Bump to
300s to match the Streamable HTTP path's httpx read timeout.
2. OAuth auth was built via get_manager().get_or_build_provider() but
never forwarded to sse_client. SSE MCP servers behind OAuth 2.1 PKCE
would silently fail with 401s on every request.
Keepalive (the other half of #5981) intentionally left for a follow-up —
it's a real improvement but a bigger change, and these two are obvious
corrections to ship now. Credits to @amiller.
Co-authored-by: Andrew Miller <socrates1024@gmail.com>