hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-18 04:41:56 +00:00

Author	SHA1	Message	Date
Teknium	35f25523c6	docs(tools): add video_generate / video_gen toolset to user-facing tool docs (#27050 ) The video_gen toolset and its video_generate tool shipped without user-facing reference docs. toolsets-reference.md and the dev-guide plugin page were already in, but reference/tools-reference.md had no video_gen section at all and user-guide/features/tools.md's Media row didn't list video_generate. - reference/tools-reference.md: add a video_gen section after video, including backend list (xAI Grok-Imagine, FAL.ai Veo/Pixverse/Kling), unified text-to-video / image-to-video surface note, link to the dev-guide plugin page, and the video_generate tool row. Add video_generate to the standalone-tools quick-counts line. - user-guide/features/tools.md: extend Media row with video_generate and video_analyze plus an opt-in caveat.	2026-05-16 11:53:13 -07:00
Teknium	5f91b1a48b	feat(skills): add osint-investigation optional skill (closes #355 ) (#26729 ) * feat(skills): add osint-investigation optional skill (closes #355) Phase-1 public-records OSINT investigation framework adapted from ShinMegamiBoson/OpenPlanter (MIT). Lives in optional-skills/research/. Six data-source wiki entries (FEC, SEC EDGAR, USAspending, Senate LD, OFAC SDN, ICIJ Offshore Leaks), each following the 9-section template: summary, access, schema, coverage, cross-reference keys, data quality, acquisition, legal, references. Six stdlib-only acquisition scripts that emit normalized CSV, plus three analysis scripts: - entity_resolution.py — three-tier match (exact / fuzzy / token overlap) with explicit confidence per row - timing_analysis.py — permutation test for donation/contract timing correlation, joins through cross-links - build_findings.py — assembles structured findings.json with evidence chains pointing back to source rows Validation: full pipeline runs end-to-end on synthetic fixtures. Entity resolution found 24 cross-matches with 0 false positives on a 5-row / 4-row test set. Timing analysis on 5 donations clustered near 3 awards returned p=0.000, effect size 2.41 SD. Findings JSON correctly tags HIGH-severity timing pattern. All 9 scripts pass --help and py_compile. Docs site page auto-generated by website/scripts/generate-skill-docs.py; sidebar + catalog entries updated by the same generator. * fix(osint-investigation): live API fixes from end-to-end sweep Live-tested the skill on a real public-citizen query and found three bugs the synthetic E2E missed. All three are now fixed and re-verified. 1. FEC fetch hung on contributor name searches. The combination of two_year_transaction_period + sort=date + contributor_name puts the OpenFEC query plan on a slow path that the upstream gateway times out (25s+). Switched to min_date/max_date with no explicit sort. Renamed --candidate to --contributor (the original name was misleading: FEC searches by donor, not by candidate; --candidate is kept as a deprecated alias). Added --state filter for narrowing. 2. ICIJ Offshore Leaks reconcile endpoint returns 404. ICIJ removed the Open Refine reconciliation API. Rewrote fetch_icij_offshore.py to download the official bulk CSV ZIP (~70 MB, public, no auth) and search it locally. Cached under $HERMES_OSINT_CACHE/icij/ (default ~/.cache/hermes-osint/icij/) for 30 days, --force-refresh to refetch. Verified live: 'PUTIN' query returns 5 Panama Papers officer matches in 0.5s after first download. 3. SEC EDGAR silently returned 0 when the company-name resolver matched an individual Form 3/4/5 filer (insider trading disclosures). Now surfaces 'Resolved company X → CIK Y (Z)' on stderr, prints a filing-type histogram when the type filter wipes results, and explicitly warns when the matched CIK appears to be an individual filer rather than a corporate registrant. Bonus: _http.py was retrying 429 responses with exponential backoff plus honoring (often-missing) Retry-After headers, which compounded into multi-second hangs per page when the upstream key was over quota. Changed to fail-fast on 429 with a clear, actionable error showing the upstream's quota message. Verified: 0.3s fast-fail vs the previous 60s hang on DEMO_KEY rate-limit exhaustion. Updated SKILL.md, fec.md, and icij-offshore.md to match the new CLI flags and ICIJ bulk-cache flow. Regenerated the docusaurus page via website/scripts/generate-skill-docs.py. Live sweep results across all 6 sources for 'Dillon Rolnick, New York': - OFAC SDN: 0 matches ✓ (correctly not sanctioned) - USAspending: 0 matches ✓ (correctly not a federal contractor) - Senate LDA: 0 matches ✓ (correctly not a lobbying client) - SEC EDGAR: warns it resolved to 'Rolnick Michael' (CIK 0001845264) who is an individual Form 3 filer, not a corporate registrant - ICIJ: 0 matches ✓ (correctly not in any offshore leak) - FEC: rate-limited (DEMO_KEY); fails fast with clear quota message * feat(osint-investigation): expand to 12 sources covering identity, property, courts, archives, news Phase-2 expansion per Teknium feedback that the original 6-source skill (federal financial/regulatory only) wasn't a complete OSINT toolkit. Adds 6 more sources covering the major omissions a real investigation would reach for first. New sources (6 fetch scripts + 6 wiki entries): 1. NYC ACRIS — Real property records (deeds, mortgages, liens) via the city's Socrata API. Search by party name or property address. Joins Parties to Master to populate doc_type, dates, borough, and amount. Coverage: 5 NYC boroughs, ~70M party records, 1966-present. 2. OpenCorporates — Global corporate registry covering 130+ jurisdictions (~200M companies). Free API token at https://opencorporates.com/api_accounts/new raises the rate limit; HTML fallback works without one (limited fields). 3. CourtListener (Free Law Project) — federal + state court opinions (~10M back to colonial era) + PACER dockets via RECAP. Anonymous v4 search works; COURTLISTENER_TOKEN raises rate limits. 4. Wayback Machine CDX — historical web captures (~900B+). Used both for surveillance-of-record (when did this site change?) and as a content-recovery layer when other sources point to dead URLs. 5. Wikipedia + Wikidata — narrative bio + structured facts. Wikipedia OpenSearch for article matching, REST summary for extracts, Wikidata Action API (wbgetentities) for claims. Avoids the SPARQL Query Service which is aggressively rate-limited. 6. GDELT 2.0 DOC API — global news monitoring in 100+ languages, ~2015-present. Auto-retries with 6s backoff on the standard 1-req-per-5-sec throttle. Other changes in this commit: - SEC EDGAR no longer raises SystemExit when the company-name resolver finds no CIK; writes an empty CSV with header so the rest of a pipeline can keep moving and the warning is just on stderr. - _http.py User-Agent updated per Wikimedia policy: includes app name, version, and a 'set HERMES_OSINT_UA to identify yourself' instruction. - SKILL.md workflow now groups sources into two clusters (federal financial vs identity/property/courts/archives/news) with bash examples for each. 'When to use this skill' lists the broader set of investigation patterns the expanded sources unlock. Live sweep results on 'Dillon Rolnick, New York' across all 12 sources: ofac ✓ 0 (correctly clean) icij ✓ 0 (correctly not in any leak) usaspending ✓ 0 (correctly not a federal contractor) senate_lda ✓ 0 (correctly not a lobbying client) sec_edgar ✓ 0, warns: resolved to 'Rolnick Michael' (CIK 0001845264), individual Form 3 filer, NOT a corporate registrant fec — rate-limited (DEMO_KEY exhausted), fails fast with clear quota message nyc_acris ✓ 200 records named Rolnick across NYC; 48 records at 571 Hudson (the property the web identifies as his) opencorporates ✓ 0 (no API token configured; HTML fallback) courtlistener ✓ 0 for 'Dillon Rolnick'; 20 for 'Rolnick' generally; 5 for 'Microsoft' sanity check wayback ✓ 30 captures of nousresearch.com from 2011-present wikipedia ✓ 0 (correctly not notable enough); Bill Gates sanity returns full structured facts (occupation, employer, DOB, place of birth, country) gdelt ✓ 0 for 'Dillon Rolnick'; 5 for 'Nous Research' All 17 scripts compile clean and pass --help. Synthetic analysis pipeline regression still passes (entity_resolution 30 matches, timing p=0.000, findings 2). * feat(osint-investigation): remove FEC; DEMO_KEY rate-limits make it unreliable The FEC fetcher consistently failed the live sweep because the OpenFEC DEMO_KEY tier (40 calls/hour) exhausts on a single investigation, and the upstream returns slow-path query plans for unindexed contributor-name searches that the gateway times out. Without a real API key it's not usable; with one the user has to sign up at api.data.gov first. That's too much setup friction for a skill that should work out of the box. Removed: - scripts/fetch_fec.py - references/sources/fec.md Updated: - SKILL.md frontmatter description + tags - 'When NOT to use' now points users at https://www.fec.gov/data/ for federal donations - entity_resolution example switched from donor↔contractor to lobbying-client↔contractor (Senate LDA + USAspending pair) - timing_analysis example switched to lobbying-filings vs awards - 8 wiki entries had their 'FEC ↔ ...' cross-reference bullets removed 11 sources remain (5 federal financial + 6 identity/property/courts/ archives/news). All scripts compile, pass --help, and the synthetic analysis pipeline still passes on the new lobbying-shaped regression fixture (30 matches, p=0.000 on tight clustering, 2 findings).	2026-05-16 01:55:06 -07:00
Teknium	395e9dd9e2	feat: add supports_parallel_tool_calls for MCP servers (#26825 ) Port from openai/codex#17667: MCP servers can now opt-in to parallel tool execution by setting supports_parallel_tool_calls: true in their config. This allows tools from the same server to run concurrently within a single tool-call batch, matching the behavior already available for built-in tools like web_search and read_file. Previously all MCP tools were forced sequential because they weren't in the _PARALLEL_SAFE_TOOLS set. Now _should_parallelize_tool_batch checks is_mcp_tool_parallel_safe() which looks up the server's config flag. Config example: mcp_servers: docs: command: "docs-server" supports_parallel_tool_calls: true Changes: - tools/mcp_tool.py: Track parallel-safe servers in _parallel_safe_servers set, populated during register_mcp_servers(). Add is_mcp_tool_parallel_safe() public API. - run_agent.py: Add _is_mcp_tool_parallel_safe() lazy-import wrapper. Update _should_parallelize_tool_batch() to check MCP tools against server config. - 11 new tests covering the feature end-to-end. - Updated MCP docs and config reference.	2026-05-16 01:04:28 -07:00
Teknium	74d0b392e7	feat(x_search): gated X (Twitter) search tool with OAuth-or-API-key auth (#26763 ) * feat(x_search): gated X (Twitter) search tool with OAuth-or-API-key auth Salvages tools/x_search_tool.py from the closed PR #10786 (originally by @Jaaneek) and reworks its credential resolution so the tool registers when EITHER xAI credential path is available: * XAI_API_KEY (paid xAI API key) is set in ~/.hermes/.env or the env, OR * The user is signed in via xAI Grok OAuth — SuperGrok subscription — i.e. hermes auth add xai-oauth has been run Both paths route through xAI's built-in x_search Responses tool at https://api.x.ai/v1/responses. When both credentials exist OAuth wins, matching tools/xai_http.py's existing preference order (uses SuperGrok quota instead of paid API spend). The check_fn calls resolve_xai_http_credentials() which auto-refreshes the OAuth access token if it's within the refresh skew window, so a True return means the bearer is fetchable AND non-empty. Wiring - tools/x_search_tool.py — new tool, ~370 LOC. Schema gated by check_fn, bearer resolved per-call so revoked OAuth surfaces a clean tool_error rather than an HTTP 401. - toolsets.py — "x_search" toolset def. NOT added to _HERMES_CORE_TOOLS; users opt in via hermes tools. - hermes_cli/tools_config.py — CONFIGURABLE_TOOLSETS entry + TOOL_CATEGORIES block with two provider options (OAuth + API key) sharing the existing xai_grok post_setup hook for credential bootstrap. - hermes_cli/config.py — DEFAULT_CONFIG["x_search"] with model / timeout_seconds / retries. Additive nested key; no version bump. - tests/tools/test_x_search_tool.py — 13 tests covering HTTP shape, handle validation, citation extraction, 4xx/5xx/timeout handling, and the full credential-resolution matrix (OAuth-only, API-key-only, both-set, neither-set, resolver-raises, config overrides, registry registration). - website/docs/guides/xai-grok-oauth.md — adds X Search to the direct-to-xAI tools section with off-by-default note. - website/docs/user-guide/features/tools.md — new row in the tools table. Off by default — users enable via `hermes tools` → 🐦 X (Twitter) Search. Schema only appears to the model when xAI credentials are configured. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * docs(x_search): add dedicated feature page + reference entries - website/docs/user-guide/features/x-search.md (new) — full feature walkthrough: authentication, enablement, configuration, parameters, returned fields, example, troubleshooting, see-also links. - website/docs/reference/tools-reference.md — new "x_search" toolset section with parameter docs and credential gating note. - website/docs/reference/toolsets-reference.md — new row in the toolset catalog table. - website/sidebars.ts — wires the new feature page under Media & Web, after web-search. --------- Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-05-16 00:58:27 -07:00
teknium1	559c6ad94a	feat(skills): add optional pinggy-tunnel skill Zero-install localhost tunnels over SSH via Pinggy. Covers HTTP/HTTPS, TCP, TLS, access control (basic auth / bearer / IP whitelist), header manipulation (CORS, force-HTTPS), web debugger, Pro token mode, and four composite recipes (webhook receiver, MCP server exposure, local LLM endpoint share, dev-server quick-share with one-shot password). Closes #361	2026-05-15 22:15:14 -07:00
teknium1	afb97dbc53	docs: add Programmatic Integration overview (closes #360 ) Document the three protocols already available for driving hermes-agent from external programs — ACP, the TUI gateway JSON-RPC, and the OpenAI-compatible API server — with a 'which one should I use' guide and a Pi-style RPC command mapping table. Sidebar entry under Developer Guide -> Architecture.	2026-05-15 22:14:33 -07:00
Teknium	016c772e7f	feat(plugins): tool override flag for replacing built-in tools (closes #11049 ) (#26759 ) Plugins can now replace a built-in tool by passing override=True to ctx.register_tool(). Without it, the registry rejects any registration that would shadow an existing tool from a different toolset (unchanged default behavior). Unlocks the use case from #11049: drop-in replacement of browser/web backends without forking core. Composes with the existing pre_tool_call hook for runtime interception of any implementation. The override is audit-logged at INFO so it surfaces in agent.log.	2026-05-15 22:12:57 -07:00
teknium1	53637fb17d	chore(skills/darwinian-evolver): AUTHOR_MAP + docs regen	2026-05-15 21:56:07 -07:00
Teknium	c5dc9700eb	fix(windows): silence tirith-unavailable banner + skip install/spawn attempts on unsupported platforms (#26718 ) Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Docker Build and Publish / move-main (push) Blocked by required conditions Details Docker Build and Publish / move-latest (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details OSV-Scanner / Scan lockfiles (push) Waiting to run Details Tests / test (push) Waiting to run Details Tests / e2e (push) Waiting to run Details uv.lock check / uv lock --check (push) Waiting to run Details Tirith ships no Windows binary, so on every Windows CLI startup users saw a scary 'tirith security scanner enabled but not available' banner they could not act on. The banner suggested degraded security; in reality pattern-matching guards still run and the message was pure noise. Fix: - New public is_platform_supported() helper in tools/tirith_security.py that returns False when _detect_target() doesn't resolve (Windows, any non-x86_64/aarch64 arch). - ensure_installed(), _resolve_tirith_path(), and check_command_security() short-circuit on unsupported platforms: cache _resolved_path = _INSTALL_FAILED with reason 'unsupported_platform', skip PATH probes, skip the background download thread, skip the disk failure marker, and return allow with an empty summary from check_command_security so the spawn loop never fires. - Explicit user-configured tirith_path is still honored everywhere (a user who built tirith themselves under WSL keeps that path). - CLI banner in cli.py gated on is_platform_supported() — fires only on platforms where tirith should work but isn't installed. - Docs note tirith's supported-platform list and point Windows users at WSL. Tests: tests/tools/test_tirith_security.py +8 tests covering Linux x86_64, Darwin arm64, Windows, and unknown-arch verdicts plus the silent ensure_installed / check_command_security / _resolve_tirith_path fast-paths and the explicit-path override. test_tirith_security.py 75 passed (8 new + 67 pre-existing) test_command_guards.py 19 passed	2026-05-15 20:29:28 -07:00
Teknium	a31191c3f5	fix(docs): unique sidebar keys for duplicate skill categories (#26726 ) The per-skill sidebar tree from PR #26646 emitted category entries with only a label. Docusaurus derives translation keys from the label (sidebar.docs.category.<label>), and categories that exist in both Bundled and Optional (productivity, mcp, mlops, research, email, software-development, dogfood) collided on identical keys — failing i18n extraction and the Deploy Site build. Result: source had the sidebar fix but no per-skill page rendered with a sidebar in production. Add a 'key: skills-<source>-<category>' attribute to each generated category dict so Bundled vs Optional get distinct translation keys. Regenerated sidebars.ts via the script. Local docusaurus build passes.	2026-05-15 20:29:20 -07:00
emozilla	86a368d832	remove pip installation method from docs	2026-05-15 22:14:41 -04:00
Teknium	dc4cde278b	feat(docs): show per-skill pages in the left sidebar (#26646 ) Individual skill pages (e.g. /docs/user-guide/skills/bundled/productivity/notion) had no sidebar rendered — the sidebar config only listed the two catalog index pages. That was an intentional choice from an earlier 'too many entries would drown product docs' concern, but the effect is that a user landing on any skill page (via search, share link, or the catalog table) loses navigation entirely and can't see related skills. Wire build_sidebar_items() (which was already computed and discarded) back into the sidebar. Structure: Skills ├── Bundled skills catalog (catalog table, was already there) ├── Optional skills catalog (catalog table, was already there) ├── Bundled │ ├── apple/ │ │ ├── apple-apple-notes │ │ └── ... │ └── ... (one collapsed category per skill category) └── Optional └── ... (same) Categories are collapsed by default so the top-level Skills entry doesn't explode visually. Users browsing one skill see siblings in the same category; the catalogs remain the at-a-glance entry point. Also includes drift the regen script naturally produces on top of current main: - creative-comfyui v5.0.0 → v5.1.0 page (author + new ref file) - devops-kanban-worker SKILL.md updates - new pages for optional skills that lacked generated docs: hyperliquid, finance-stocks, software-development/rest-graphql-debug - updated optional-skills-catalog row for those Validation: - npx docusaurus build (en locale) succeeded — only pre-existing warnings - inspected built productivity-notion/index.html: sidebar tree present, sibling productivity skills (airtable, linear, etc.) all linked	2026-05-15 17:04:30 -07:00
Teknium	42070ecefb	feat(skills/notion): overhaul for Notion Developer Platform (May 2026) (#26612 ) * feat(skills/notion): overhaul for Notion Developer Platform (May 2026) Notion shipped its Developer Platform on May 13, 2026: ntn CLI, Workers, Markdown API, bidirectional webhooks, agent tools. The existing skill only covered curl + integration token CRUD, so it didn't surface any of the new ergonomics — particularly the /markdown endpoints (much easier for agents to consume) and the ntn CLI for headless API + Workers management. This rewrite (v1.0.0 -> v2.0.0): - Splits setup into Path A (HTTP, cross-platform incl. Windows), Path B (ntn CLI on macOS/Linux, with NOTION_API_TOKEN env var for headless), and Path C (Windows fallback — HTTP API or WSL2; native ntn is 'coming soon'). - Keeps the full curl reference (still the only Windows-compatible path). - Adds /markdown endpoints — GET and PATCH page-as-markdown, plus POST /v1/pages with a markdown body param. Agent-friendly, no CLI required. - Adds ntn CLI cheat sheet for raw API shorthand, file uploads, and workspace flags. - Adds Notion Workers section: scaffold, tool/webhook capability shapes, lifecycle commands. Gated on Business/Enterprise plans + macOS/Linux. - Adds Notion-flavored Markdown reference (callouts, toggles, columns, mentions, colors) for the /markdown endpoints. - Adds a 'choose the right path' decision table at the bottom. - Notes the new efficient Notion MCP server as an optional wiring path. Auto-generated docs page regenerated via website/scripts/generate-skill-docs.py. * docs(skills-catalog): update notion description for v2.0.0	2026-05-15 14:58:23 -07:00
Teknium	233d4170cf	docs(xai): link OAuth-over-SSH guide from xAI provider surfaces (#26610 ) Follow-up to #26592. The new docs/guides/oauth-over-ssh.md page was linked from the two SSH-specific sections of the xAI Grok OAuth guide but was missing from the surfaces a user is more likely to hit first: - guides/xai-grok-oauth.md 'See Also' — add the SSH guide at the top with a short qualifier so remote users notice it before clicking through. - integrations/providers.md xAI Grok OAuth callout — append the SSH guide link alongside the existing xAI OAuth guide link. - user-guide/configuration.md xai-oauth tip — same. Docs build: zero warnings on touched files.	2026-05-15 14:45:59 -07:00
alt-glitch	a480d345e6	docs: add hermes postinstall to installation + quickstart, fix update --check description - installation.md: add tip about `hermes postinstall` for upfront dep install - quickstart.md: show `hermes postinstall` in pip install flow - updating.md: fix --check description to mention PyPI path for pip installs	2026-05-15 14:45:43 -07:00
alt-glitch	164a77dec9	docs: add pip install path to installation, quickstart, updating, and CLI reference Document pip install hermes-agent as a first-class install option. Clarify that PyPI releases track tagged versions (major/minor), not every commit on main — git installer is for bleeding-edge.	2026-05-15 14:45:43 -07:00
Teknium	3b9368a0c4	fix(auth): point SSH OAuth users at the tunnel they actually need (#26592 ) Two loopback-redirect OAuth flows (xAI Grok, Spotify) silently fail when Hermes runs on a remote host: the auth server redirects to 127.0.0.1:<port> on the user's laptop, not on the remote box. The --no-browser flag only suppresses webbrowser.open() — it doesn't change the bind address. Symptom xAI surfaces is 'Could not establish connection. We couldn't reach your app.', followed by a 'xAI authorization timed out waiting for the local callback' on the CLI side. Changes - hermes_cli/auth.py: new _print_loopback_ssh_hint() helper, called from _xai_oauth_loopback_login() and _spotify_login() right after they print the redirect URI. Silent off SSH; on SSH prints the exact 'ssh -N -L <port>:127.0.0.1:<port>' command using the actually-bound port (not the hardcoded constant — the listener auto-bumps when the preferred port is busy), a provider-specific docs URL, and a link to the new shared guide. - website/docs/guides/oauth-over-ssh.md (new): single source of truth for the tunnel pattern — TL;DR command, jump-box / ProxyJump variant, mosh+tmux+ControlMaster gotchas, troubleshooting. - website/docs/guides/xai-grok-oauth.md: fix the two sections that claimed --no-browser alone was enough; link to the shared guide. - website/docs/user-guide/features/spotify.md: expand the existing one-liner; link to the shared guide. - website/sidebars.ts: register the new page. - tests/hermes_cli/test_auth_loopback_ssh_hint.py: 7 unit tests covering SSH-vs-not, loopback-vs-not, malformed URIs, port echo, with and without provider docs URL.	2026-05-15 14:27:50 -07:00
Teknium	4ad5fa702f	docs(xai-oauth): add xai-oauth to provider enumeration pages (#26542 ) Follow-up to #26534 (xai-oauth provider). The new guide and integrations page were shipped with the salvage, but four reference/enumeration pages still listed every other OAuth provider without xai-oauth: - reference/cli-commands.md — `--provider` choices list - reference/environment-variables.md — HERMES_INFERENCE_PROVIDER values - user-guide/configuration.md — auxiliary-task provider list, OAuth tip block (mirrored from MiniMax OAuth), and provider table row - user-guide/features/fallback-providers.md — provider table	2026-05-15 12:33:12 -07:00
Jaaneek	1e4801b8d0	docs(xai-oauth): correct logout command (was hermes auth remove) The previous "Logging Out" section showed `hermes auth remove xai-oauth` with no positional target — argparse rejects that and the command does not clear the singleton OAuth state anyway. The correct command for the "clear everything" intent is `hermes auth logout xai-oauth`. Also point users at `hermes auth remove xai-oauth <target>` for single-pool-row deletion.	2026-05-15 12:11:32 -07:00
Jaaneek	b62c997973	feat(xai-oauth): add xAI Grok OAuth (SuperGrok Subscription) provider Adds a new authentication provider that lets SuperGrok subscribers sign in to Hermes with their xAI account via the standard OAuth 2.0 PKCE loopback flow, instead of pasting a raw API key from console.x.ai. Highlights ---------- * OAuth 2.0 PKCE loopback login against accounts.x.ai with discovery, state/nonce, and a strict CORS-origin allowlist on the callback. * Authorize URL carries `plan=generic` (required for non-allowlisted loopback clients) and `referrer=hermes-agent` for best-effort attribution in xAI's OAuth server logs. * Token storage in `auth.json` with file-locked atomic writes; JWT `exp`-based expiry detection with skew; refresh-token rotation synced both ways between the singleton store and the credential pool so multi-process / multi-profile setups don't tear each other's refresh tokens. * Reactive 401 retry: on a 401 from the xAI Responses API, the agent refreshes the token, swaps it back into `self.api_key`, and retries the call once. Guarded against silent account swaps when the active key was sourced from a different (manual) pool entry. * Auxiliary tasks (curator, vision, embeddings, etc.) route through a dedicated xAI Responses-mode auxiliary client instead of falling back to OpenRouter billing. * Direct HTTP tools (`tools/xai_http.py`, transcription, TTS, image-gen plugin) resolve credentials through a unified runtime → singleton → env-var fallback chain so xai-oauth users get them for free. * `hermes auth add xai-oauth` and `hermes auth remove xai-oauth N` are wired through the standard auth-commands surface; remove cleans up the singleton loopback_pkce entry so it doesn't silently reinstate. * `hermes model` provider picker shows "xAI Grok OAuth (SuperGrok Subscription)" and the model-flow falls back to pool credentials when the singleton is missing. Hardening --------- * Discovery and refresh responses validate the returned `token_endpoint` host against the same `.x.ai` allowlist as the authorization endpoint, blocking MITM persistence of a hostile endpoint. Discovery / refresh / token-exchange `response.json()` calls are wrapped to raise typed `AuthError` on malformed bodies (captive portals, proxy error pages) instead of leaking JSONDecodeError tracebacks. * `prompt_cache_key` is routed through `extra_body` on the codex transport (sending it as a top-level kwarg trips xAI's SDK with a TypeError). * Credential-pool sync-back preserves `active_provider` so refreshing an OAuth entry doesn't silently flip the active provider out from under the running agent. Testing ------- * New `tests/hermes_cli/test_auth_xai_oauth_provider.py` (~63 tests) covers JWT expiry, OAuth URL params (plan + referrer), CORS origins, redirect URI validation, singleton↔pool sync, concurrency races, refresh error paths, runtime resolution, and malformed-JSON guards. * Extended `test_credential_pool.py`, `test_codex_transport.py`, and `test_run_agent_codex_responses.py` cover the pool sync-back, `extra_body` routing, and 401 reactive refresh paths. * 165 tests passing on this branch via `scripts/run_tests.sh`.	2026-05-15 12:11:32 -07:00
teknium1	47614dbfca	chore: wire simplex docs into sidebar + AUTHOR_MAP - Adds plugins/platforms/simplex docs page to the messaging sidebar between LINE and Open WebUI. - Maps louismichalot@hotmail.com -> Mibayy in scripts/release.py so the attribution check on the salvage PR passes.	2026-05-15 01:41:30 -07:00
Mibayy	09d9724a09	feat(gateway): add SimpleX Chat platform plugin SimpleX Chat (https://simplex.chat) is a private, decentralised messenger with no persistent user IDs — every contact is identified by an opaque internal ID generated at connection time. This adds it as a Hermes gateway platform via the plugin system. The adapter connects to a local simplex-chat daemon via WebSocket, listens for inbound messages, and sends replies. Originally proposed in PR #2558 as a core-modifying integration; reshaped here as a self- contained plugin under plugins/platforms/simplex/ with no edits to any core file. Discovery is filesystem-based (scanned by gateway.config), and the platform identity is resolved on demand via Platform("simplex"). Plugin contract: - check_requirements() requires SIMPLEX_WS_URL AND the websockets package - validate_config() / is_connected() accept env or config.yaml input - _env_enablement() seeds PlatformConfig.extra (ws_url + home_channel) - _standalone_send() supports out-of-process cron delivery - interactive_setup() provides a stdin wizard for hermes gateway setup - register() wires the adapter into the registry with required_env, install_hint, cron_deliver_env_var, allowed_users_env, and a platform_hint for the LLM. Lazy dependency: the websockets Python package is imported inside the functions that need it. The plugin is importable and discoverable even when websockets is missing — check_requirements() simply returns False until `pip install websockets` is run. No new pyproject extras are introduced. Environment variables: SIMPLEX_WS_URL WebSocket URL of the daemon (required) SIMPLEX_ALLOWED_USERS Comma-separated allowed contact IDs SIMPLEX_ALLOW_ALL_USERS Set true to allow all contacts SIMPLEX_HOME_CHANNEL Default contact for cron delivery SIMPLEX_HOME_CHANNEL_NAME Human label for the home channel Closes #2557.	2026-05-15 01:41:30 -07:00
teknium1	85782a4ed7	feat(acp): hermes acp --setup-browser bootstraps browser tools for registry installs The Zed ACP Registry path (uvx --from 'hermes-agent[acp]==X' hermes-acp) gets a Python-only install. Browser tools depend on the agent-browser npm package + Chromium, neither of which are in the wheel. Without an explicit bootstrap, registry users have no path to working browser tools. Ship a bundled, idempotent bootstrap script (Linux/macOS bash + Windows PowerShell) inside acp_adapter/bootstrap/ as wheel package-data. New entry points: hermes acp --setup-browser # interactive; prompts before Chromium download hermes acp --setup-browser --yes # non-interactive hermes-acp --setup-browser The terminal-auth flow (hermes acp --setup) also offers the browser bootstrap as a follow-up after model selection, so first-run registry users get the option without knowing the flag exists. Key design choices: - npm install -g --prefix $NODE_PREFIX so we never need sudo. System Node on PATH is respected; only the install target is redirected to the user-writable Hermes-managed Node prefix. - tools/browser_tool.py::_browser_candidate_path_dirs() already walks $HERMES_HOME/node/bin, so installed binaries are discovered with no agent-side code change. - System Chrome/Chromium detection short-circuits the ~400 MB Playwright download when a suitable browser already exists. - Bash + PowerShell live as ONE copy each under acp_adapter/bootstrap/. Not duplicated under scripts/. install.sh and install.ps1 keep their inline browser blocks for the source-checkout path. E2E validated end-to-end: bash bootstrap_browser_tools.sh --skip-chromium → installs agent-browser into ~/.hermes/node/bin/ tools.browser_tool._find_agent_browser() → returns the installed path check_browser_requirements() → returns True (browser tools register) Tests: - tests/acp/test_entry.py: 11 tests covering --setup-browser dispatch (linux + windows + --yes forwarding + failure propagation), the terminal-auth follow-up prompt path, and a package-data wheel-shipping assertion that catches any future pyproject.toml regression. Docs: website/docs/user-guide/features/acp.md gains a 'Browser tools (optional)' subsection with the two-line install + what-it-does.	2026-05-15 01:38:24 -07:00
Teknium	05d9f641c0	docs(cron): worked recipes for the wakeAgent pre-run gate (#26229 ) Adds three pre-run gate recipes to the cron docs: - file-change gate (stat + mtime + state file) - external-flag gate (file presence) - SQL-count gate (user's own database, not state.db) These are the use cases @iankar8 proposed adding as a parallel 'trigger' subsystem in #2654. The existing `script` + `wakeAgent` gate already covers all three at $0 — this lands the patterns as documentation so users can find them, instead of adding a second gating mechanism to the cron subsystem.	2026-05-15 01:34:15 -07:00
kshitijk4poor	e0e4856d46	feat(skills-hub): add huggingface/skills as trusted default tap (#2549 ) Adds Hugging Face's official skill catalog to the default GitHub taps and classifies it as a trusted source alongside openai/skills and anthropics/skills. - tools/skills_guard.py: huggingface/skills -> TRUSTED_REPOS - tools/skills_hub.py: GitHubSource.DEFAULT_TAPS += huggingface/skills (skills/) - website/docs: list it under default taps + trusted-source examples Closes #2549. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-15 01:25:33 -07:00
teknium1	c8c6ce1731	feat(acp-registry): switch to uvx distribution, drop npm launcher The ACP Registry schema supports uvx as a first-class distribution method alongside npx and binary. Pointing the registry directly at the existing hermes-agent PyPI release removes: - the @nousresearch npm scope (we don't own it) - a separate npm publish step on every weekly release - 90 lines of Node launcher + tests in packages/hermes-agent-acp/ The Zed registry now installs Hermes via: uvx --from 'hermes-agent[acp]==<version>' hermes-acp This is the same command the npm launcher was shelling out to anyway, so end-user behavior is unchanged. Registry CI validates the PyPI URL + version-pin exact match automatically. Changes: - acp_registry/agent.json: distribution.npx -> distribution.uvx - delete packages/hermes-agent-acp/ entirely - scripts/release.py: drop npm-launcher bump paths, keep manifest lockstep - tests/acp/test_registry_manifest.py: assert uvx shape + version pin - tests/scripts/test_release_acp_registry.py: rewrite for uvx-only shape - docs (user-guide + dev-guide): drop all npm-launcher references - delete docs/plans/acp-registry-zed-integration.md (stale, npm-shaped) Validated against agentclientprotocol/registry agent.schema.json via jsonschema. hermes-agent==0.13.0 is already live on PyPI.	2026-05-14 22:27:09 -07:00
Siddharth Balyan	5af672c753	chore: remove Atropos RL environments and tinker-atropos integration (#26106 ) * chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule Delete: - environments/ (43 files — base env, agent loop, tool call parsers, benchmarks) - rl_cli.py (standalone RL training CLI) - tools/rl_training_tool.py (all 10 rl_* tools) - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security - optional-skills/mlops/hermes-atropos-environments/ - tinker-atropos git submodule + .gitmodules * chore: remove RL/Atropos references from Python source - toolsets.py: remove rl toolset block + update comment - model_tools.py: remove rl_tools group + update async bridging comment - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler - tools/budget_config.py: remove RL environment reference in docstring - tests/test_model_tools.py: remove rl_tools from expected groups - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference * chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb) - Remove yc-bench extra - Remove rl_cli from py-modules - Remove [tool.ty.src] exclude for tinker-atropos - Remove [tool.ruff] exclude for tinker-atropos - Regenerate uv.lock * chore: remove tinker-atropos from install/setup scripts - setup-hermes.sh: remove entire tinker-atropos submodule install block - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard) - scripts/install.ps1: remove tinker-atropos block - nix/hermes-agent.nix: remove tinker-atropos pip install line * chore: remove RL references from cli-config.yaml.example * docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md * docs: remove RL/Atropos references from website - Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md - sidebars.ts: remove rl-training and environments sidebar entries - optional-skills-catalog.md: remove hermes-atropos-environments row - tools-reference.md: remove entire rl toolset section - toolsets-reference.md: remove rl row + update example - integrations/index.md: remove RL Training bullet - architecture.md: remove environments/ from tree + RL section - contributing.md: remove tinker-atropos setup - updating.md: remove tinker-atropos install + stale submodule update * chore: remove remaining RL/Atropos stragglers - hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos) - hermes_cli/setup.py: remove RL Training status check - hermes_cli/status.py: remove Tinker + WandB from API key status display - agent/display.py: remove both rl_* tool preview/activity blocks - website/docs: remove RL references from providers.md + env-variables.md - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script * chore: remove RL training section from .env.example	2026-05-15 10:36:38 +05:30
mr-r0b0t	4c94396206	feat: add ACP registry metadata for Zed	2026-05-14 20:26:02 -07:00
teknium1	4695d2716f	fix(browser): honor pre-set AGENT_BROWSER_ARGS and document the bypass Follow-up to the sandbox-bypass env-var fix: - Update the opt-out gate so a user-provided AGENT_BROWSER_ARGS is also respected, not just the legacy AGENT_BROWSER_CHROME_FLAGS. Previously the gate only checked the broken legacy var, so a user who pre-set AGENT_BROWSER_ARGS would still get clobbered by Hermes's auto-injection. - Document AGENT_BROWSER_ARGS in .env.example, the browser feature page, and the env var reference, with notes about the auto-injection on AppArmor-restricted systems (Ubuntu 23.10+, DGX Spark, containers). - Add Anadi Jaggia to AUTHOR_MAP.	2026-05-14 19:02:17 -07:00
teknium1	4abfb6bc24	feat(discord): default history backfill on, expand to per-user + threads Follow-up to snav's PR #25463 contribution: flip default to on, broaden scope so backfill fires whenever require_mention gates the bot (not just shared-session channels). Why: - The mention-gate creates a session-transcript gap regardless of whether the channel is shared or per-user. In per-user sessions, Alice's session is still missing other participants' messages and her own pre-mention messages — backfill fills both gaps. - Threads naturally scope to thread-only history because discord.py's channel.history() on a thread returns only that thread's messages. - DMs still skip — every DM triggers the bot, so the session transcript is already complete. Changes: - hermes_cli/config.py: discord.history_backfill default → true - gateway/platforms/discord.py: drop the _is_shared gate, keep _is_dm skip and _needed_mention gate; env var DISCORD_HISTORY_BACKFILL default → 'true' - cli-config.yaml.example + website docs: update defaults and prose; add the DISCORD_HISTORY_BACKFILL / _LIMIT env var rows that were documented in the PR description but missing from the env-var table - tests/gateway/test_discord_free_response.py: - flip test_discord_per_user_channel_does_not_backfill → test_discord_per_user_channel_backfills_too (new behavior) - add test_discord_dm_does_not_backfill (DM skip is invariant) - give FakeThread a no-op history() so existing thread tests don't hit a fake discord.Forbidden when backfill now fires on threads too Tests: 160/160 in target files; 400/400 across all tests/gateway/ -k discord.	2026-05-14 15:50:57 -07:00
snav	e84fe483bc	feat(discord): channel history backfill for multi-user sessions Adds optional channel-context backfill for Discord shared-channel sessions so the agent can see recent messages it missed between its own turns (typically when require_mention=true filters out most traffic). Previously the agent only saw the @mention message that triggered it, which led to disorienting replies in active multi-user channels where the conversation context was invisible. With backfill enabled, a configurable number of recent messages are fetched per-turn and prepended to the trigger message as a context block, kept separate from sender-prefix logic so attribution remains clean. This re-opens the work from #13063 (approved by @OutThisLife on 2026-04-20, closed when I closed the branch to address the simpolism:main head-branch issue plus an ordering bug I caught later in live use). Filing against the freshly-rewritten problem statement in #13054 so the design is grounded in the failure mode rather than the implementation shape. The implementation follows the push-mode last-self-anchored design from the two options laid out in #13054. See the issue for the trade-off discussion vs pull-mode (#13120 was an earlier closed PR using that shape). Treating this as a reference implementation — happy to rewrite as last-trigger anchoring or as a hybrid with #13120 if maintainers prefer. Changes: - gateway/platforms/discord.py: - new `_discord_history_backfill()` / `_discord_history_backfill_limit()` helpers (config.extra > env > default), mirroring the existing `_discord_require_mention()` shape - new `_fetch_channel_context()` that scans `channel.history()` backwards from the trigger to the bot's last message (or limit), formats as `[Recent channel messages] / [name] msg / ...`, respects DISCORD_ALLOW_BOTS, skips system messages - per-channel `_last_self_message_id` cache to narrow the fetch window on hot paths (avoids full history scan when the bot has spoken recently) - IMPORTANT: passes `oldest_first=False` explicitly to `channel.history()`. discord.py 2.x silently flips the default to True when `after=` is supplied, which would select the EARLIEST N messages after our last response instead of the LATEST N before the trigger. In high-traffic windows this would return stale tool traces and drop the actual final answer the user is asking about. See regression test below. Caught in live use during a Codex tool-trace burst on May 13 2026. - gateway/config.py: discord_history_backfill + discord_history_backfill_limit settings + yaml→env bridge - gateway/platforms/base.py: channel_context field on MessageEvent - gateway/run.py: prepend channel_context after sender-prefix so the [sender name] tag applies to the trigger message alone, not to the backfill - hermes_cli/config.py: defaults for new discord.history_backfill and discord.history_backfill_limit keys - cli-config.yaml.example: documented defaults - tests/gateway/test_discord_free_response.py: 7 new tests covering cold-start backfill, self-message stop boundary, other-bot filtering, cache hot-path narrowing, stale-cache fallback, shared-channel + per-user backfill paths, and the ordering regression test (`test_fetch_channel_context_cache_uses_latest_window_when_after_set`) - tests/gateway/test_config.py: yaml→env bridge tests - tests/gateway/test_session.py: prefix-order edge cases - website/docs/user-guide/messaging/discord.md: env vars + config keys + usage docs Tested on Ubuntu 24.04 — empirically validated in my own multi-bot Discord research server for the past three weeks. Fixes #13054 Supersedes #13063 (closed)	2026-05-14 15:50:57 -07:00
Teknium	ccb5aae0d2	feat(proxy): local OpenAI-compatible proxy for OAuth providers (#25969 ) Adds 'hermes proxy start' — a local HTTP server that lets external apps (OpenViking, Karakeep, Open WebUI, ...) use a Hermes-managed provider subscription as their LLM endpoint. The proxy attaches the user's real OAuth-resolved credentials to each forwarded request, refreshing them automatically; the client can send any bearer (it gets stripped). Ships with one adapter — Nous Portal. The UpstreamAdapter ABC and registry in hermes_cli/proxy/adapters/ are designed for additional OAuth providers to plug in by name without server changes. Commands: hermes proxy start [--provider nous] [--host 127.0.0.1] [--port 8645] hermes proxy status hermes proxy providers Allowed Portal paths: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models. Anything else returns 404 with a clear error pointing at the allowed list. aiohttp is gated like gateway/platforms/api_server.py (try-import, clean runtime error if missing). No new core dependency. Tests: 24 unit tests + 1 separate E2E that spawns the real subprocess and verifies the upstream receives the right bearer with the client's header stripped.	2026-05-14 15:40:48 -07:00
Teknium	78b842c995	fix(install): support non-sudo service-user installs on apt distros (#25814 ) The Debian/Ubuntu branch of install_node_deps() ran 'npx playwright install --with-deps chromium' unconditionally. Playwright invokes sudo interactively to apt-install Chromium's system libraries, which blocks the installer for non-sudo users (systemd service accounts, unprivileged operator users) on an unsatisfiable password prompt. Changes: - install.sh: gate --with-deps behind a sudo capability check on the apt branch (matches the existing Arch/pacman branch pattern). Non-sudo users fall back to 'npx playwright install chromium' alone and the installer prints the exact 'sudo npx playwright install-deps chromium' command an administrator can run separately. - install.sh: add --skip-browser (alias --no-playwright) to skip the Playwright step entirely for headless installs that don't need browser automation. Mirrors the existing --no-venv / --skip-setup shape. - installation.md: add a 'Non-Sudo / System Service User Installs' section covering the admin/service-user split, the --skip-browser flag, and the ~/.local/bin PATH gotcha (the root cause of the 'No module named dotenv' error users hit when running the repo source 'hermes' script with system Python instead of the venv launcher). - test_install_sh_browser_install.py: regression coverage for the --skip-browser flag and the sudo-gate on the apt branch. Reported by @ssilver in Discord.	2026-05-14 09:05:31 -07:00
evgyur	1dd33988e2	docs: clarify media impact on session context	2026-05-14 07:58:20 -07:00
Alex	ddb8d8fa84	docs: update NovitaAI provider positioning (#25532 )	2026-05-14 01:31:12 -07:00
Alex-wuhu	1551ce46a4	docs: update NovitaAI description to "90+ models, pay-per-use"	2026-05-13 23:51:15 -07:00
Alex-wuhu	c76e879574	feat: add NovitaAI as LLM provider Add NovitaAI as a first-class provider with dedicated model selection flow, live pricing, and authoritative context length resolution. - Register provider in PROVIDER_REGISTRY, HERMES_OVERLAYS, and all alias/label maps (ID: novita, aliases: novita-ai, novitaai) - Add dedicated _model_flow_novita() with 3-tier model list fallback: Novita API → models.dev → static curated list - Fetch live pricing from /v1/models with correct unit conversion (input_token_price_per_m is 0.0001 USD per Mtok) - Add Novita-specific context length resolution (step 4b) in get_model_context_length(), prioritized over models.dev/OpenRouter - Register api.novita.ai in _URL_TO_PROVIDER to prevent early return from the custom-endpoint code path - Add models.dev mapping (novita → novita-ai) - Add default auxiliary model (deepseek/deepseek-v3-0324) - Add NOVITA_API_KEY to test isolation (conftest.py) - Update docs: providers page, env vars reference, CLI reference, .env.example, README, and landing page	2026-05-13 23:51:15 -07:00
freqyfreqy	8de26e280e	docs(lsp): replace "git worktree" with "git repository" in LSP docs The word "worktree" (a git subcommand feature for parallel checkouts) was used interchangeably with "repository" in the LSP docs, causing confusion. LSP only requires a git-initialized directory, not an actual worktree. Fixes two instances: section "When LSP runs" and the troubleshooting "Editing a file outside any git repo" heading.	2026-05-13 23:05:20 -07:00
domtriola	796c8a2d63	docs(user-guide): point tirith link to correct repo	2026-05-13 23:04:57 -07:00
snav	d863773c81	feat(discord): add thread_require_mention for multi-bot threads By default, once Hermes participates in a Discord thread (auto-created on @mention or replied in once) it auto-responds to every subsequent message in that thread without requiring further @mentions. That's the right default for one-on-one conversations and isolated channel threads. But it's a confirmed footgun in multi-bot threads. When a user invokes one bot per turn — addressing Codex first, then Hermes — every other bot in the thread also fires on every message, burning credits and spamming the channel. Author has hit this personally in active multi-bot research-team threads. Add a new `discord.thread_require_mention` config key (env: `DISCORD_THREAD_REQUIRE_MENTION`), default `false` to preserve existing behavior. When `true`, the in-thread mention shortcut is disabled and threads are gated the same way channels are. Explicit @mentions still pass through as expected. Mirrors the existing helper shape (config.extra > env > default) and the existing yaml→env bridge pattern used by `require_mention`. Changes: - gateway/platforms/discord.py: new `_discord_thread_require_mention()` helper; in_bot_thread shortcut now AND's with `not _discord_thread_require_mention()` - gateway/config.py: bridge `discord.thread_require_mention` from config.yaml to `DISCORD_THREAD_REQUIRE_MENTION` env var (mirrors the existing `require_mention` bridge two lines above) - hermes_cli/config.py: add `thread_require_mention: False` default to DEFAULT_CONFIG['discord'] - tests/gateway/test_discord_free_response.py: 4 new tests covering default behaviour (in-thread shortcut still works), enabled behaviour (mention required in threads), enabled+mentioned (mention still passes through), and yaml-via-config.extra path. Also clears DISCORD_* env vars in the `adapter` fixture so process-env state from the contributor's shell doesn't leak into per-test behaviour. - tests/gateway/test_config.py: 2 new tests covering the yaml→env bridge (both the apply-from-yaml and env-precedence-over-yaml paths) - website/docs/user-guide/messaging/discord.md: document the new env var + config key with multi-bot rationale; cross-link from `auto_thread` section Tested on Ubuntu 24.04.	2026-05-13 22:21:43 -07:00
kshitijk4poor	3633c8690b	refactor(plugins): add apply_yaml_config_fn registry hook Lets platform plugins own their YAML→env config bridge instead of forcing core gateway/config.py to know every platform's schema. The hook receives the full parsed config.yaml and the platform's own sub-dict, may mutate os.environ (env > YAML precedence preserved via the standard `not os.getenv(...)` guards), and may return a dict to merge into PlatformConfig.extra. It runs during load_gateway_config() after the existing generic shared-key loop and before _apply_env_overrides(), mirroring the env_enablement_fn dispatch pattern (#21306, #21331). Pure addition — no behavior change for existing platforms. Each of the eight platforms with hardcoded YAML→env blocks today (discord, telegram, whatsapp, slack, dingtalk, mattermost, matrix, feishu, ~252 LOC in gateway/config.py) can migrate in independent follow-up PRs; the hardcoded blocks remain functional in the meantime, and their `not os.getenv(...)` guards make them no-ops for any env var the hook already set. Test coverage: 10 new tests in tests/gateway/test_platform_registry.py covering field default, callable acceptance, env mutation, extras merge, both signature args, exception swallowing, missing/non-dict sections, and env > YAML precedence. Refs #3823, #24356. Closes #24836.	2026-05-13 22:20:30 -07:00
Teknium	d5775fe988	feat(codex-runtime): skip unavailable plugins during migration (#25437 ) Followup to PR #24182 — caught when scanning OpenClaw for recent codex fixes we hadn't considered. OpenClaw learned the hard way (#80815) that migrating plugins which codex itself reports as unavailable produces config that fails at activation time. Our /codex-runtime codex_app_server enable path queries codex's plugin/list and migrates everything where installed=true. We were trusting codex's installation state and ignoring its availability field. So a plugin that's installed=true but availability=UNAVAILABLE (broken local install) or REQUIRES_AUTH (OAuth expired or never completed) would get an [plugins."<n>@openai-curated"] entry in ~/.codex/config.toml — and the user's first codex turn after enabling the runtime would fail because codex refuses to activate it. Fix: filter on availability in _query_codex_plugins(). Only emit plugins where availability is empty (older codex versions without the field — preserve backward compat) or explicitly AVAILABLE. Tests: test_plugin_discovery_skips_unavailable_plugins — verifies 4 cases: - good-plugin (installed=True, availability=AVAILABLE) → migrated - broken-plugin (installed=True, availability=UNAVAILABLE) → skipped - auth-pending (installed=True, availability=REQUIRES_AUTH) → skipped - legacy-plugin (installed=True, no availability field) → migrated (older codex versions; preserve backward compat) Docs: Added bullet to 'What's NOT migrated' list in the docs page calling out the availability filter and why. Other OpenClaw codex PRs I reviewed but did NOT apply (with reasoning): - #81591 (load Codex for selectable models): we resolve runtime per-call already, no startup-time gating to fix - #81510 (cron compatibility): we documented cron as untested; their fix is for OpenClaw-specific cron orchestration shape - #81223 (rotate incompatible context-engine threads): we don't have a Lossless context engine equivalent - #80688 (constrain sandbox): we don't have an outer-sandbox concept - #80616 (release on turn_aborted): we already handle status= interrupted in turn/completed correctly - #80278 (expose activeModel in plugin SDK): not our surface - #80792 (default destructive_actions on): we don't expose that knob 56 codex-runtime migration tests still green (+1 new).	2026-05-13 22:20:27 -07:00
Teknium	6122a79aab	feat(slack): support !cmd as alternate prefix for slash commands in threads (#25355 ) Some checks are pending Deploy Site / deploy-vercel (push) Waiting to run Details Deploy Site / deploy-docs (push) Waiting to run Details Docker Build and Publish / build-amd64 (push) Waiting to run Details Docker Build and Publish / build-arm64 (push) Waiting to run Details Docker Build and Publish / merge (push) Blocked by required conditions Details Docker Build and Publish / move-main (push) Blocked by required conditions Details Docker Build and Publish / move-latest (push) Blocked by required conditions Details Lint (ruff + ty) / ruff + ty diff (push) Waiting to run Details Lint (ruff + ty) / ruff enforcement (blocking) (push) Waiting to run Details Lint (ruff + ty) / Windows footguns (blocking) (push) Waiting to run Details Nix / nix (macos-latest) (push) Waiting to run Details Nix / nix (ubuntu-latest) (push) Waiting to run Details Tests / test (push) Waiting to run Details Tests / e2e (push) Waiting to run Details Slack platform-blocks native slash commands inside thread replies ("/queue is not supported in threads. Sorry!") and there is no app-side setting to re-enable them. As a workaround, rewrite a leading '!' to '/' for any known gateway command before downstream processing — so '!queue', '!stop', '!model gpt-5.4' etc. work inside Slack threads (and anywhere else). Only the first token is checked against is_gateway_known_command(), so casual messages like '!nice work' pass through to the agent unchanged. Downstream pipeline (MessageType.COMMAND tagging, gateway dispatcher, thread reply routing) is unchanged. Adds 6 tests covering rewrite, args preservation, thread routing, casual-message passthrough, '@bot' suffix, and plain '/' still-works.	2026-05-13 18:58:14 -07:00
teknium1	66c70966cd	chore(skills/evm): tighten SKILL.md to modern format - description ≤60 chars (was 346) - platforms: [linux, macos, windows] — script is pure stdlib (urllib, json, argparse), no POSIX-only primitives - author: credit @Mibayy + @youssefea + @ethernet8023 + Hermes Agent (was just Mibayy) - regenerated auto-gen docs page	2026-05-13 17:18:39 -07:00
ethernet	e3fc081499	feat(skills): merge blockchain/base into blockchain/evm; salvage PR #2010 Salvages the closed PR #2010 (Mibayy's EVM multi-chain skill) and folds the existing optional-skills/blockchain/base/ skill into it, so we ship one unified EVM skill instead of two overlapping ones. Pulled in from base/: - 8 missing Base-specific tokens (AERO, DEGEN, TOSHI, BRETT, WELL, cbETH, cbBTC, wstETH, rETH) added to KNOWN_TOKENS['base'] — base/ had 11, evm/ only had 3 (USDC/DAI/WETH). - L1 data-fee pitfall note for rollups (Base, Arbitrum, Optimism, zkSync). - Batch-size chunking in rpc_batch (Base RPC caps batches at 10 calls per JSON-RPC request; adding more known tokens tripped that limit and broke 'wallet --chain base' with a 'list index out of range' error). Ported the chunking pattern from base/_rpc_batch_chunk. Latent bugs found and fixed while smoke-testing the merge: - cmd_multichain and cmd_allowance both iterated KNOWN_TOKENS[chain] with 'for contract, (symbol, _name) in known.items()' — but the dict shape is {symbol: contract_str}, not {addr: (sym, name)}. This raised 'too many values to unpack (expected 2)' on every non-zero balance. Now iterates as 'for symbol, contract in known.items()'. - Input validation: added is_valid_address / is_valid_txhash / require_address / require_txhash helpers and wired them into cmd_wallet, cmd_tx, cmd_token, cmd_activity, cmd_allowance, cmd_decode, cmd_contract, cmd_multichain. Fails fast with exit 2 on malformed input instead of burning an RPC round-trip on garbage. Documentation: - SKILL.md now flags that this skill supersedes optional-skills/blockchain/base. - Pitfalls expanded for ENS (single-endpoint dependency on ensideas.com), tx decoding (single-endpoint dependency on 4byte.directory), and rollup L1 fees. - Regenerated website/docs/user-guide/skills/optional/blockchain/ blockchain-evm.md and removed the old blockchain-base.md page; catalog updated. Removed: - optional-skills/blockchain/base/SKILL.md - optional-skills/blockchain/base/scripts/base_client.py - website/docs/user-guide/skills/optional/blockchain/blockchain-base.md Smoke-tested live against Base mainnet: stats, price, token, wallet (vitalik.eth — 3.12 ETH + 13.88 USDC + 4.23 DAI + 0.06 WETH on Base) and allowance (ethereum, 7 unlimited approvals to Uniswap/Permit2). Original PR #2010 author: Mibayy. Original base/ skill author: youssefea.	2026-05-13 17:18:39 -07:00
Teknium	091d8e1030	feat(codex-runtime): optional codex app-server runtime for OpenAI/Codex models (#24182 ) * feat(codex-runtime): scaffold optional codex app-server runtime Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch. Default behavior is unchanged. Lands in three pieces: 1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker for codex's app-server protocol (codex-rs/app-server). Spawn, init handshake, request/response, notification queue, server-initiated request queue (for approval round-trips), interrupt-friendly blocking reads. Tested against real codex 0.130.0 binary end-to-end during development. 2. hermes_cli/runtime_provider.py: - Adds 'codex_app_server' to _VALID_API_MODES. - Adds _maybe_apply_codex_app_server_runtime() helper, called at the end of _resolve_runtime_from_pool_entry(). Inert unless 'model.openai_runtime: codex_app_server' is set in config.yaml AND provider in {openai, openai-codex}. Other providers cannot be rerouted (anthropic, openrouter, etc. preserved). 3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests covering api_mode registration, the rewriter helper (default-off, case-insensitive, opt-in, non-eligible providers preserved), version parser, missing-binary handling, error class. Does NOT require codex CLI installed. This commit is wire-only: the api_mode is recognized but AIAgent does not yet branch on it. Followup commits add the session adapter, event projector, approval bridge, transcript projection (so memory/skill review still works), plugin migration, and slash command. Existing tests remain green: - tests/cli/test_cli_provider_resolution.py (29 passed) - tests/agent/test_credential_pool_routing.py (included above) * feat(codex-runtime): add codex item projector for memory/skill review The translator that lets Hermes' self-improvement loop keep working under the Codex runtime: converts codex 'item/' notifications into Hermes' standard {role, content, tool_calls, tool_call_id} message shape that agent/curator.py already knows how to read. Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs): - userMessage → {role: user, content} - agentMessage → {role: assistant, content: text} - reasoning → stashed in next assistant's 'reasoning' field - commandExecution → assistant tool_call(name='exec_command') + tool result - fileChange → assistant tool_call(name='apply_patch') + tool result - mcpToolCall → assistant tool_call(name='mcp.<server>.<tool>') + tool result - dynamicToolCall → assistant tool_call(name=<tool>) + tool result - plan/hookPrompt/etc → opaque assistant note, no fabricated tool_calls Invariants preserved: - Message role alternation never violated: each tool item produces at most one assistant + one tool message in that order, correlated by call_id. - Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta) don't materialize messages — only item/completed does. Mirrors how Hermes already only writes the assistant message after streaming ends. - Tool call ids are deterministic (codex item id-based) so replays produce identical messages and prefix caches stay valid (AGENTS.md pitfall #16). - JSON args use sorted_keys for the same reason. Real wire formats verified against codex 0.130.0 by capturing live notifications from thread/shellCommand and including one as a fixture (COMMAND_EXEC_COMPLETED). 23 new tests, all green: - Streaming deltas don't materialize (3 paths) - Turn/thread frame events are silent - commandExecution: 5 tests including non-zero exit annotation + deterministic id stability across replays - agentMessage + reasoning attachment + reasoning consumption - fileChange: summary without inlined content - mcpToolCall: namespaced naming + error surfacing - userMessage: text fragments only (drops images/etc) - opaque items: no fabricated tool_calls - Helpers: deterministic id stability + sorted JSON args - Role alternation invariant across all four tool-shaped item types This commit is a pure addition. AIAgent integration (the wire that uses the projector) is the next commit. feat(codex-runtime): add session adapter + approval bridge The third self-contained module: CodexAppServerSession owns one Codex thread per Hermes session, drives turn/start, consumes streaming notifications via CodexEventProjector, handles server-initiated approval requests, and translates cancellation into turn/interrupt. The adapter has a single public per-turn method: result = session.run_turn(user_input='...', turn_timeout=600) # result.final_text → assistant text for the caller # result.projected_messages → list ready to splice into AIAgent.messages # result.tool_iterations → tick count for _iters_since_skill nudge # result.interrupted → True on Ctrl+C / deadline / interrupt # result.error → error string when the turn cannot complete # result.turn_id, thread_id → for sessions DB / resume Behavior: - ensure_started() spawns codex, does the initialize handshake, and issues thread/start with cwd + permissions profile. Idempotent. - run_turn() blocks until turn/completed, drains server-initiated requests (approvals) before reading notifications so codex never deadlocks waiting for us, projects every item/completed via the projector, and increments tool_iterations for the skill nudge gate. - request_interrupt() is thread-safe (threading.Event); the next loop iteration issues turn/interrupt and unwinds. - turn_timeout deadlock guard issues turn/interrupt and records an error if the turn never completes. - close() escalates terminate → kill via the underlying client. Approval bridge: Codex emits server-initiated requests for execCommandApproval and applyPatchApproval. The adapter translates Hermes' approval choice vocabulary onto codex's decision vocabulary: Hermes 'once' → codex 'approved' Hermes 'session' or 'always' → codex 'approvedForSession' Hermes 'deny' / anything else → codex 'denied' Routing precedence: 1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive) 2. approval_callback wired by the CLI (defers to tools.approval.prompt_dangerous_approval()) 3. Fail-closed denial when neither is wired Unknown server-request methods are answered with JSON-RPC error -32601 so codex doesn't hang waiting for us. Permission profile mapping mirrors AGENTS.md: Hermes 'auto' → codex 'workspace-write' Hermes 'approval-required' → codex 'read-only-with-approval' Hermes 'unrestricted/yolo' → codex 'full-access' 20 new tests, all green. Combined with prior commits this PR now has 67 tests across three modules: - test_codex_app_server_runtime.py: 24 (api_mode + transport surface) - test_codex_event_projector.py: 23 (item taxonomy projections) - test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts) Full tests/agent/transports/ directory: 249/249 pass — no regressions to existing transport tests. Still no wire into AIAgent.run_conversation(); that integration commit is small and goes next. * feat(codex-runtime): wire codex_app_server runtime into AIAgent The integration commit. AIAgent.run_conversation() now early-returns to a new helper _run_codex_app_server_turn() when self.api_mode == 'codex_app_server', bypassing the chat_completions tool loop entirely. Three small surgical edits to run_agent.py (~105 LOC total): 1. Line ~1204 (constructor api_mode validation set): Add 'codex_app_server' so an explicit api_mode='codex_app_server' passed to AIAgent() isn't silently rewritten to 'chat_completions'. 2. Line ~12048 (run_conversation, just before the while loop): Early-return to _run_codex_app_server_turn() when self.api_mode is 'codex_app_server'. Placed AFTER all standard pre-loop setup — logging context, session DB, surrogate sanitization, _user_turn_count and _turns_since_memory increments, _ext_prefetch_cache, memory manager on_turn_start — so behavior outside the model-call loop is identical between paths. Default Hermes flow is unchanged when the flag is off. 3. End-of-class (line ~15497): New method _run_codex_app_server_turn(). Lazy-instantiates one CodexAppServerSession per AIAgent (reused across turns), runs the turn, splices projected_messages into messages, increments _iters_since_skill by tool_iterations (since the chat_completions loop normally does that per iteration), fires _spawn_background_review on the same cadence as the default path. Counter accounting: _turns_since_memory ← already incremented at run_conversation:11817 (gated on memory store configured) — codex helper does NOT touch it (would double-count). _user_turn_count ← already incremented at run_conversation:11793 — codex helper does NOT touch it. _iters_since_skill ← incremented in the chat_completions loop per tool iteration. Codex helper increments by turn.tool_iterations since the loop is bypassed. User message: ALREADY appended to messages by run_conversation pre-loop (line 11823) before the early-return reaches us. Helper does NOT append again. Regression test test_user_message_not_duplicated guards this. Approval callback wiring: Lazy-fetches tools.terminal_tool._get_approval_callback at session spawn time, passes to CodexAppServerSession. CLI threads with prompt_toolkit get interactive approvals; gateway/cron contexts get the codex-side fail-closed deny. Error path: Codex session exceptions become a 'partial' result with completed=False and a final_response that explicitly tells the user how to switch back: 'Codex app-server turn failed: ... Fall back to default runtime with /codex-runtime auto.' Same return-dict shape as the chat_completions path so all callers (gateway, CLI, batch_runner, ACP) work unchanged. 9 new integration tests in tests/run_agent/test_codex_app_server_integration.py: - api_mode='codex_app_server' is accepted on AIAgent construction - run_conversation returns the expected codex shape (final_response, codex_thread_id, codex_turn_id, completed, partial) - Projected messages are spliced into messages list - _iters_since_skill ticks per tool iteration - _user_turn_count delegated to standard flow (not double-counted) - User message appears exactly once (regression guard) - _spawn_background_review IS invoked (memory/skill review keeps working) - chat.completions.create is NEVER called (loop fully bypassed) - Session exception → partial result with /codex-runtime auto hint - Interrupted turn → partial result with error preserved Adjacent test runs confirm no regressions: - tests/run_agent/test_memory_nudge_counter_hydration.py: green - tests/run_agent/test_background_review.py: green - tests/run_agent/test_fallback_model.py: green - tests/agent/transports/: 249/249 green Still missing for full feature: /codex-runtime slash command, plugin migration helper, docs page, live e2e test gated on codex binary. Those are the remaining followup commits. * feat(codex-runtime): add /codex-runtime slash command (CLI + gateway) User-facing toggle for the optional codex app-server runtime. Follows the 'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly: single CommandDef in the central registry → CLI handler → gateway handler → running-agent guard → all surfaces (autocomplete, /help, Telegram menu, Slack subcommands) update automatically. Surface: /codex-runtime — show current state + codex CLI status /codex-runtime auto — Hermes default runtime /codex-runtime codex_app_server — codex subprocess runtime /codex-runtime on / off — synonyms Files changed: hermes_cli/codex_runtime_switch.py (new): Pure-Python state machine shared by CLI and gateway. Parse args, read/write model.openai_runtime in the config dict, gate enabling behind a codex --version check (don't let users opt in to a runtime they have no binary for; print npm install hint instead). Returns a CodexRuntimeStatus dataclass that callers render however suits their surface. hermes_cli/commands.py: Single CommandDef entry, no aliases (codex-runtime is its own thing). cli.py: Dispatch in process_command() + _handle_codex_runtime() handler that delegates to the shared module and renders results via _cprint. gateway/run.py: Dispatch in _handle_message() + _handle_codex_runtime_command() that returns a string (gateway sends as message). On a successful change that requires a new session, _evict_cached_agent() forces the next inbound message to construct a fresh AIAgent with the new api_mode — avoids prompt-cache invalidation mid-session. gateway/run.py running-agent guard: /codex-runtime joins /model in the early-intercept block so a runtime flip mid-turn can't split a turn across two transports. Tests: tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the state machine: arg parsing (10 cases incl. case-insensitive and synonyms), reading current runtime (5 cases incl. malformed configs), writing runtime (3 cases), apply() entry point covering read-only, no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check, and persist-failure paths (8 cases). All green. Adjacent test suites confirm no regressions: - tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py: 167/167 green - tests/agent/transports/: 283/283 green when combined with prior commits Still missing: plugin migration helper, docs page, live e2e test gated on codex binary. Followup commits. * feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml Translates the user's mcp_servers config from ~/.hermes/config.yaml into the TOML format codex's MCP client expects. Wired into the /codex-runtime codex_app_server enable path so users get their MCP tool surface in the spawned subprocess automatically. The migration runs on every enable. Failures are non-fatal — the runtime change still proceeds and the user gets a warning so they can fix the codex config manually. What translates (mapping verified against codex-rs/core/src/config/edit.rs): Hermes mcp_servers.<n>.command/args/env → codex stdio transport Hermes mcp_servers.<n>.url/headers → codex streamable_http transport Hermes mcp_servers.<n>.timeout → codex tool_timeout_sec Hermes mcp_servers.<n>.connect_timeout → codex startup_timeout_sec Hermes mcp_servers.<n>.cwd → codex stdio cwd Hermes mcp_servers.<n>.enabled: false → codex enabled = false What does NOT translate (warned + skipped per server): Hermes-specific keys (sampling, etc.) — codex's MCP client has no equivalent. Listed in the per-server skipped[] field of the report. What's NOT migrated (intentional): AGENTS.md — codex respects this file natively in its cwd. Hermes' own AGENTS.md (project-level) is already in the worktree, so codex picks it up without translation. No code needed. Idempotency design: All managed content lives between a 'managed by hermes-agent' marker and the next non-mcp_servers section header. _strip_existing_managed_block removes the prior managed region cleanly, preserving any user-added codex config (model, providers.openai, sandbox profiles, etc.) above or below. Files added: hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration helper. Public API: migrate(hermes_config, codex_home=None, dry_run=False) returns MigrationReport with .migrated/.errors/ .skipped_keys_per_server. No external TOML dependency — minimal formatter handles strings/numbers/booleans/lists/inline-tables. tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests covering: - per-server translation (12): stdio/http/sse, cwd, timeouts, enabled flag, command+url precedence, sampling drop, unknown keys - TOML formatter (8): types, escaping, inline tables, error case - existing-block stripping (4): no marker, alone, with user content above, with user content below - end-to-end migrate() (8): empty, dry-run, round-trip, idempotent re-run, preserves user config, error reporting, invalid input, summary formatting Files changed: hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in the codex_app_server enable branch. Migration failure logs a warning in the result message but does NOT fail the runtime change. Disable path (auto) explicitly skips migration. tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests: test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration, test_migration_failure_does_not_block_enable. All 325 feature tests green: - tests/agent/transports/: 249 (incl. 67 new) - tests/run_agent/test_codex_app_server_integration.py: 9 - tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new) - tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new) * perf(codex-runtime): cache codex --version check within apply() Single /codex-runtime invocation could spawn 'codex --version' up to 3 times (state report, enable gate, success message). Each spawn is ~50ms, so the cumulative cost wasn't a crisis, but it was wasteful and turned a trivial slash command into something noticeably laggy on slower systems. Refactored to lazy-once via a closure over a nonlocal cache. First call spawns; subsequent calls in the same apply() reuse the result. Behavior unchanged — same return shape, same error handling, same install hint when codex is missing. Just one subprocess per call instead of three. Two regression-guard tests added: - test_binary_check_cached_within_apply: enable path → call_count == 1 - test_binary_check_cached_on_read_only_call: state-report path → call_count == 1 Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime tests still green. * fix(codex-runtime): correct protocol field names found via live e2e test Three real bugs caught only by running a turn end-to-end against codex 0.130.0 with a real ChatGPT subscription. Unit tests passed because they asserted on our own (incorrect) wire shapes; the wire format from codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and my initial reading of the README was incomplete. Bug 1: thread/start.permissions wire format Was sending {"profileId": "workspace-write"}. Real format per PermissionProfileSelectionParams enum (tagged union): {"type": "profile", "id": "workspace-write"} AND requires the experimentalApi capability declared during initialize. AND requires a matching [permissions] table in ~/.codex/config.toml or codex fails the request with 'default_permissions requires a [permissions] table'. Fix: stop overriding permissions on thread/start. Codex picks its default profile (read-only unless user configures otherwise), which matches what codex CLI users expect — they configure their default permission profile in ~/.codex/config.toml the standard way. Trying to be clever about profile selection broke every turn we tested. Live error before fix: 'Invalid request: missing field type' on every turn/start, even though our turn/start payload was correct — the field codex was complaining about was inside the permissions sub-object we shouldn't have been sending. Bug 2: server-request method names Was matching 'execCommandApproval' and 'applyPatchApproval'. Real names per common.rs ServerRequest enum: item/commandExecution/requestApproval item/fileChange/requestApproval item/permissions/requestApproval (new third method) Fix: match the documented names. Added handler for item/permissions/requestApproval that always declines — codex sometimes asks to escalate permissions mid-turn and silent acceptance would surprise users. Live symptom before fix: agent.log showed 'Unknown codex server request: item/commandExecution/requestApproval' and codex stalled because we replied with -32601 (unsupported method) instead of an approval decision. The agent reported back 'The write command was rejected' even though Hermes never showed the user an approval prompt. Bug 3: approval decision values Was sending decision strings 'approved'/'approvedForSession'/'denied'. Real values per CommandExecutionApprovalDecision enum (camelCase): accept, acceptForSession, decline, cancel (also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment variants we don't currently use). Fix: rename _approval_choice_to_codex_decision return values; update auto_approve_* fallbacks; update fail-closed default from 'denied' to 'decline'. Test mapping table updated to match. Live test verified after fixes: $ hermes (with model.openai_runtime: codex_app_server) > Run the shell command: echo hermes-codex-livetest > .../proof.txt then read it back Approval prompt fired with 'Codex requests exec in <cwd>'. User chose 'Allow once'. Codex executed the command, wrote the file, read it back. Final response: 'Read back from proof.txt: hermes-codex-livetest'. File contents on disk match. agent.log confirms: codex app-server thread started: id=019e200e profile=workspace-write cwd=/tmp/hermes-codex-livetest/workspace All 20 session tests still green after wire-format updates. * fix(codex-runtime): correct apply_patch approval params + ship docs Live e2e revealed FileChangeRequestApprovalParams doesn't carry the changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's 'reason' field describes what the patch wants to do. Test config and display logic updated to use it. The first 'apply_patch (0 change(s))' display from the live test is now 'apply_patch: <reason>'. Adds website/docs/user-guide/features/codex-app-server-runtime.md covering enable/disable, prerequisites, approval UX, MCP migration behavior, permission profile delegation to ~/.codex/config.toml, known limitations, and the architecture diagram. Wired into the Automation category in sidebars.ts. Live e2e validation across the path matrix: ✓ thread/start handshake ✓ turn/start with text input ✓ commandExecution items + projection ✓ item/commandExecution/requestApproval → Hermes UI → response ✓ Approve once → command runs ✓ Deny → command rejected, codex falls back to read-only message ✓ Multi-turn (codex remembers prior turn's results) ✓ apply_patch via Codex's fileChange path ✓ item/fileChange/requestApproval → Hermes UI ✓ MCP server migration loads inside spawned codex (verified via 'use the filesystem MCP tool' prompt) ✓ /codex-runtime auto → codex_app_server toggle cycle ✓ Disable doesn't trigger migration ✓ Enable with codex CLI present succeeds + migrates ✓ Hermes-side interrupt path (turn/interrupt request issued cleanly even if codex finishes before the interrupt lands) Known live-validated limitations now documented in the docs page: - delegate_task subagents unavailable on this runtime - permission profile selection delegated to ~/.codex/config.toml - apply_patch approval prompt has no inline changeset (codex protocol doesn't expose it) 145/145 codex-runtime tests still green. * feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11) Major: migrate native Codex plugins (#7 in OpenClaw's PR list) Discovers installed curated plugins via codex's plugin/list RPC and writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml so they're enabled in the spawned Codex sessions. This is the 'YouTube-video-worthy' bit Pash highlighted: when a user has google-calendar, github, etc. installed in their Codex CLI, those plugins activate automatically when they enable Hermes' codex runtime. Implementation: - hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins() helper spawns 'codex app-server' briefly and walks plugin/list. Returns (plugins, error) — failures are non-fatal so MCP migration still works. - render_codex_toml_section() now takes plugins + permissions args. - migrate() defaults: discover_plugins=True, default_permission_profile= 'workspace-write'. Explicit None on either disables that side. - _strip_existing_managed_block() now also strips [plugins.] and [permissions]/[permissions.] sections inside the managed block, so re-runs replace plugins cleanly without touching codex's own config. Quirk fixes: #2 Default permissions profile written on enable. Without this, Codex's read-only default kicks in and EVERY write triggers an approval prompt. Now writes [permissions] default = 'workspace-write' so the runtime feels normal out of the box. Set default_permission_profile=None to opt out. #4 apply_patch approval prompt now shows what's changing. Codex's FileChangeRequestApprovalParams doesn't carry the changeset. Session adapter now caches the fileChange item from item/started notifications and looks it up by itemId when codex requests approval. Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of 'apply_patch (0 change(s))'. Side benefit: also drains pending notifications BEFORE handling a server request, so the projector and per-turn caches are up to date when the approval decision fires. Bounded to 8 notifications per loop iter to avoid starving codex's response. #5/#10 Exec approval prompt never shows empty cwd. When codex omits cwd in CommandExecutionRequestApprovalParams, fall back to the session's cwd. If somehow neither is available, show '<unknown>' explicitly instead of an empty string. Also surfaces 'reason' from the approval params when codex provides it — gives users more context on why codex wants to run something. #11 Banner indicates the codex_app_server runtime when active. New 'Runtime: codex app-server (terminal/file ops/MCP run inside codex)' line appears in the welcome banner only when the runtime is on. Default banner is unchanged. Tests: - 7 new tests in test_codex_runtime_plugin_migration.py covering plugin discovery (mocked), failure handling, dry-run skip, opt-out flag, idempotent re-runs, and permissions writing. - 3 new tests in test_codex_app_server_session.py covering the enriched approval prompts: cwd fallback, change summary on apply_patch, fallback when no item/started cache exists. - All 26 session tests + 46 migration tests green; 153 total in PR. * feat(codex-runtime): hermes-tools MCP callback + native plugin migration The big architectural addition: when codex_app_server runtime is on, Hermes registers its own tool surface as an MCP server in ~/.codex/config.toml so the codex subprocess can call back into Hermes for tools codex doesn't ship with — web_search, browser_, vision, image_generate, skills, TTS. Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) — when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva installed via 'codex plugin', Hermes discovers them via plugin/list and writes [plugins.<name>@openai-curated] entries so they activate automatically. New module: agent/transports/hermes_tools_mcp_server.py FastMCP stdio server exposing 17 Hermes tools. Each call dispatches through model_tools.handle_function_call() — same code path as the Hermes default runtime. Run with: python -m agent.transports.hermes_tools_mcp_server [--verbose] Exposed: web_search, web_extract, browser_navigate / _click / _type / _press / _snapshot / _scroll / _back / _get_images / _console / _vision, vision_analyze, image_generate, skill_view, skills_list, text_to_speech. NOT exposed (deliberately): - terminal/shell/read_file/write_file/patch — codex has built-ins - delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in model_tools.py:493, require running AIAgent context. Documented as a limitation and surfaced in the slash command output. Migration changes (hermes_cli/codex_runtime_plugin_migration.py): - _query_codex_plugins() spawns 'codex app-server' briefly to walk plugin/list and pull installed openai-curated plugins. Failures are non-fatal — MCP migration still completes. - render_codex_toml_section() now takes plugins + permissions args AND wraps the managed block with a MIGRATION_END_MARKER comment so the stripper can reliably find both ends, even when the block contains top-level keys (default_permissions = ...). - migrate() defaults: discover_plugins=True, expose_hermes_tools=True, default_permission_profile=':workspace' (built-in codex profile name — must be prefixed with ':'). All three opt-out via explicit args. - _build_hermes_tools_mcp_entry() builds the codex stdio entry with HERMES_HOME and PYTHONPATH passthrough so a worktree-launched Hermes points the MCP subprocess at the same module layout. Live-caught wire bugs fixed during this turn: 1. Permission profile config key is top-level , NOT a [permissions] table. The [permissions] table is for user-defined* profiles with structured fields. Built-in profile names start with ':' (':workspace', ':read-only', ':danger-no-sandbox'). Was emitting which codex rejected with 'invalid type: string "X", expected struct PermissionProfileToml'. 2. Built-in profile is , NOT . Codex rejected with 'unknown built-in profile'. 3. Codex's MCP layer sends for tool-call confirmation. We weren't handling it, so codex stalled and returned 'MCP tool call was rejected'. Now: auto-accept for our own hermes-tools server (user already opted in by enabling the runtime), decline for third-party servers. Quirk fixes shipped (from the limitations list): #2 default permissions: workspace profile written on enable. No more approval prompt on every write. #4 apply_patch approval shows what's changing: cache fileChange items from item/started, look up by itemId when codex sends item/fileChange/requestApproval. Prompt: '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of '0 change(s)'. #5/#10 exec approval cwd never empty: fall back to session cwd, then '<unknown>'. Also surfaces 'reason' from codex when present. #11 banner shows 'Runtime: codex app-server' line when active so users understand why tool counts may not match what's reachable. Tests: - 5 new tests in test_codex_runtime_plugin_migration.py covering plugin discovery, expose_hermes_tools entry generation, idempotent re-runs, opt-out flag, permissions profile. - 3 new tests in test_codex_app_server_session.py covering enriched approval prompts (cwd fallback, fileChange summary). - 2 new tests for mcpServer/elicitation/request handling (accept hermes-tools, decline others). - New test file test_hermes_tools_mcp_server.py covering module surface, EXPOSED_TOOLS safety invariants (no shell/file_ops, no agent-loop tools), and main() error paths. - 166 codex-runtime tests total, all green. Live e2e validated against codex 0.130.0 + ChatGPT subscription: ✓ /codex-runtime codex_app_server enables, migrates filesystem MCP, registers hermes-tools, writes default_permissions = ':workspace' ✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions ✓ Shell command runs without approval prompt (workspace profile works) ✓ Multi-turn — codex remembers prior turn's results ✓ apply_patch path via fileChange request approval ✓ web_search via hermes-tools MCP callback returns real Firecrawl results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s ✓ Disable cycle clean Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md Full re-write covering native plugin migration, the hermes-tools callback architecture, the prerequisites change ('codex login is separate from hermes auth login codex'), the trade-off table now reflecting which Hermes tools work via callback, and the limitations list updated with what's actually unavailable on this runtime. * feat(codex-runtime): pin user-config preservation invariant for quirk #6 Quirk #6 from the limitations list — user MCP servers / overrides / codex-only sections in ~/.codex/config.toml that live OUTSIDE the hermes-managed block must survive re-migration verbatim. This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER pair I added when fixing the default_permissions wire format (so the strip can find both ends of the managed region even with top-level keys like default_permissions). But it was an emergent property without a test pinning it. Now explicitly tested: - User MCP server above the managed block survives migration - User MCP server below the managed block survives migration - Both above + below survive a second re-migration - User content (model, providers, sandbox, otel, etc.) outside our region is left untouched Docs added a section "Editing ~/.codex/config.toml safely" explaining the marker contract — so users know they can add their own MCP servers, override permissions, configure codex-only options, etc. without fear of Hermes overwriting their work. 167 codex-runtime tests, all green. * docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find Previous docs and PR description undersold what codex's built-in toolset actually provides. apply_patch alone made it sound like the runtime could only edit files in patch format — implying you'd lose terminal use, read_file, write_file, search/find. That was wrong. Codex's 'shell' tool runs arbitrary shell commands inside the sandbox, which covers everything you'd do in bash: cat/head/tail (read), echo> or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/ test/git/etc. apply_patch is for structured multi-file edits on top of that. update_plan is its in-runtime todo. view_image loads images. And codex has its own web_search built in (in addition to the Firecrawl-backed one Hermes exposes via MCP callback). Docs now have a 'What tools the model actually has' section right after Why, breaking the surface into three clearly-labeled buckets: 1. Codex's built-in toolset (always on) — shell, apply_patch, update_plan, view_image, web_search; covers everything terminal- adjacent. 2. Native Codex plugins (auto-migrated from your codex plugin install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc. 3. Hermes tool callback (MCP server in ~/.codex/config.toml) — web_search/web_extract via Firecrawl, browser_, vision_analyze, image_generate, skill_view/skills_list, text_to_speech. Plus a 'What's NOT available' callout listing the four agent-loop tools (delegate_task, memory, session_search, todo) that need running AIAgent context and can't reach the codex runtime. Trade-offs table broken out: shell, apply_patch, update_plan, view_image, sandbox each get their own row with a one-line description so users can see at a glance what's available natively. Architecture diagram updated to list the codex built-ins by name instead of 'apply_patch + shell + sandbox'. No code changes — purely docs clarification. 167 codex-runtime tests still green. fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade Two real bugs in the self-improvement loop integration that the previous test mocked away. Bug 1: wrong call signature The codex helper was calling self._spawn_background_review() with no args after every turn. That function actually requires: messages_snapshot=list (positional or keyword) review_memory=bool (at least one trigger must be True) review_skills=bool So the call would have raised TypeError at runtime — except the only test that exercised this path mocked _spawn_background_review entirely and just asserted spawn.called, so the wrong-arg shape never surfaced. Bug 2: review fork inherits codex_app_server api_mode The review fork is constructed with: api_mode = _parent_runtime.get('api_mode') So when the parent is codex_app_server, the review fork ALSO runs as codex_app_server. But the review fork's whole job is to call agent-loop tools (memory, skill_manage) which require Hermes' own dispatch — they short-circuit with 'must be handled by the agent loop' on the codex runtime. So the review fork would have run, decided to save something, called memory or skill_manage, and silently no-op'd. Fixed in run_agent.py:_spawn_background_review() — when the parent api_mode is 'codex_app_server', the review fork is downgraded to 'codex_responses' (same OAuth credentials, same openai-codex provider, but talks to OpenAI's Responses API directly so Hermes owns the loop). Also rewrote the codex helper's review wiring to match the chat_completions path: - Computes _should_review_memory in the pre-loop block (was already being computed; now passed through to the helper as an arg). - Computes _should_review_skills AFTER the codex turn returns + counters tick (line ~15432 pattern in chat_completions). - Calls _spawn_background_review(messages_snapshot=, review_memory=, review_skills=) only when at least one trigger fires. - Adds the external memory provider sync (_sync_external_memory_for_turn) that the chat_completions path runs after every turn. Tests: Replaced the broken test_background_review_invoked (which only asserted spawn.called) with three sharper tests: - test_background_review_NOT_invoked_below_threshold: single turn at default thresholds → no review fires (would have caught the original 'every turn calls spawn with no args' bug) - test_background_review_skill_trigger_fires_above_threshold: 10 tool_iterations at threshold=10 → review fires with messages_snapshot=list, review_skills=True, counter resets - test_background_review_signature_never_breaks: regression guard asserting positional args are always empty and kwargs include messages_snapshot New TestReviewForkApiModeDowngrade class: - test_codex_app_server_parent_downgrades_review_fork: drives the real _spawn_background_review function (no mock at that level), asserts the review_agent gets api_mode='codex_responses' when the parent was codex_app_server. Live-validated against real run_conversation: - Counter ticked from 0 to 5 after a 5-tool-iteration turn - _spawn_background_review fired exactly once with kwargs-only signature - review_skills=True, review_memory=False - messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool results + 1 final assistant + initial system/user) - Counter reset to 0 after fire 170 codex-runtime tests, all green. Docs: added a Self-improvement loop section to the codex runtime page explaining both how the trigger logic stays equivalent and that the review fork is auto-downgraded to codex_responses for the agent-loop tools. Also clarified that apply_patch and update_plan ARE codex's built-in tools (the previous version made it sound like they were separate from 'codex's stuff' — they're not, all five tools listed in 'What tools the model actually has' section 1 are codex built-ins). * feat(codex-runtime): expose kanban tools through Hermes MCP callback Kanban workers spawn as separate hermes chat -q subprocesses that read the user's config.yaml. If model.openai_runtime: codex_app_server is set globally (which is the whole point of opt-in), every dispatched worker ALSO comes up on the codex runtime. That mostly works — codex's built-in shell + apply_patch + update_plan do the actual task work fine — but it had one critical break: the worker handoff tools (kanban_complete, kanban_block, kanban_comment, kanban_heartbeat) are Hermes-registered tools, not codex built-ins. On the codex runtime, codex builds its own tool list and these never reach the model, so the worker would do the work but not be able to report back, hanging until the dispatcher's timeout escalates it as zombie. Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes MCP callback. They dispatch statelessly through handle_function_call() just like web_search and the others — they read HERMES_KANBAN_TASK from env (set by the dispatcher), gate correctly (worker tools require the env var, orchestrator tools require it unset), and write to ~/.hermes/kanban.db. Why kanban tools work via stateless dispatch when delegate_task/memory/ session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS (model_tools.py:493) and short-circuit in handle_function_call() with 'must be handled by the agent loop' — they need to mutate AIAgent's mid-loop state. Kanban tools have no such requirement; they're pure side-effect functions against the kanban.db plus state_meta. Tools exposed: Worker handoff (require HERMES_KANBAN_TASK): kanban_complete, kanban_block, kanban_comment, kanban_heartbeat Read-only board queries: kanban_show, kanban_list Orchestrator (require HERMES_KANBAN_TASK unset): kanban_create, kanban_unblock, kanban_link Tests: - test_kanban_worker_tools_exposed: complete/block/comment/heartbeat in EXPOSED_TOOLS (regression guard for the would-hang-worker bug) - test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link Docs: - New 'Workflow features' section in the docs page covering /goal, kanban, and cron behavior on this runtime - /goal: works fully via run_conversation feedback; only caveat is approval-prompt noise on long writes-heavy goals (mitigated by the default :workspace permission profile) - Kanban: enumerated which tools are reachable via the callback and why the env var propagates correctly through the codex subprocess to the MCP server subprocess - Cron: documented as 'not specifically tested' — same rules as the CLI apply since cron runs through AIAgent.run_conversation - Trade-offs table gained rows for /goal, kanban worker, kanban orchestrator 172/172 codex-runtime tests green (+2 from kanban tests). * docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost Three docs gaps caught during a final audit: 1. /codex-runtime was only in the feature docs page, not in the slash-commands reference. Added rows to both the CLI section and the Messaging section so users discover it where they'd look for slash command syntax. 2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md. CODEX_HOME lets users redirect Codex CLI's config dir (the migration honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and propagates to the codex subprocess + the hermes-tools MCP subprocess so kanban worker tools gate correctly — documented as 'don't set manually' since it's an internal handoff. 3. Aux client behavior on this runtime. When openai_runtime= codex_app_server is on with the openai-codex provider, every aux task (title generation, context compression, vision auto-detect, session search summarization, the background self-improvement review fork) flows through the user's ChatGPT subscription by default. This is true for the existing codex_responses path too, but it's more visible / important here because users explicitly opted in for subscription billing. Added a 'Auxiliary tasks and ChatGPT subscription token cost' section to the docs page with a YAML example showing how to override specific aux tasks to a cheaper model (typically google/gemini-3-flash-preview via OpenRouter). Also documents how the self-improvement review fork gets auto-downgraded from codex_app_server to codex_responses by the fix earlier in this PR. No code changes — pure docs. 172 codex-runtime tests still green. * docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning codex app-server they were synthesizing a per-agent HOME alongside CODEX_HOME. That made every subprocess codex's shell tool launches (gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's real config files. They had to back it out in PR #81562 — keep CODEX_HOME isolation, leave HOME alone. Audit confirms Hermes' codex spawn doesn't have this problem. We do os.environ.copy() and only overlay CODEX_HOME (when provided) and RUST_LOG. HOME passes through unchanged. But it was an emergent property without a test pinning it, so adding a regression guard: test_spawn_env_preserves_HOME — confirms parent HOME survives intact in the subprocess env test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home arg still isolates codex state correctly Docs additions: 'HOME environment variable passthrough' section — calls out the contract explicitly: CODEX_HOME isolates codex's own state, HOME stays user-real so gh/git/aws/npm/etc. find their normal config. Cites openclaw#81562 as the cautionary tale. 'Multi-profile / multi-tenant setups' section — addresses the related concern: profiles share ~/.codex/ by default. For users who want per-profile codex isolation (separate auth, separate plugins), documents the manual CODEX_HOME=<profile-scoped-dir> approach. Explains why we DON'T auto-scope CODEX_HOME per profile: doing so would silently invalidate existing codex login state for anyone upgrading to this PR with tokens already at ~/.codex/auth.json. Opt-in is safer than surprising users. 174 codex-runtime tests (+2 from HOME guards), all green. * fix(codex-runtime): TOML control-char escapes + atomic config.toml write Two footguns caught in a final audit pass before merge. Bug 1: TOML control characters not escaped The _format_toml_value() helper escaped backslashes and double quotes but passed literal control characters (\n, \t, \r, \f, \b) through unchanged. TOML basic strings don't allow literal control characters — a path or env var containing a newline would produce invalid TOML that codex refuses to load. Realistic exposure: pathological cases like a HERMES_HOME with a trailing newline (env var concatenation accident), or a PYTHONPATH with a tab from a multi-line shell heredoc. Fix: escape all five TOML basic-string control sequences (\b \t \n \f \r) in addition to \\ and \" that we already did. Order matters — backslash must come first or the other escapes get re-escaped. Bug 2: config.toml write wasn't atomic If the python process crashed between target.mkdir() and the write_text() finishing, a half-written config.toml could be left behind. On NFS / Windows / some FUSE mounts this is a real concern; on ext4/APFS small writes are usually atomic in practice but not guaranteed. Fix: write to a tempfile.mkstemp() temp file in the same directory, then Path.replace() (atomic same-dir rename on POSIX, ReplaceFile on Windows). On rename failure, clean up the temp file so repeated failed migrations don't pile up .config.toml.* files. Tests: - test_string_with_newline_escaped — \n in value → \n in output - test_string_with_tab_escaped — \t in value → \t in output - test_string_with_other_controls_escaped — \r, \f, \b - test_windows_path_escaped_correctly — backslash doubling - test_atomic_write_no_temp_leak_on_success — no .config.toml.* left over after a successful write - test_atomic_write_cleanup_on_rename_failure — temp file removed when Path.replace raises (simulated disk full) 180 codex-runtime tests, all green (+6 from this commit). Footguns audited but NOT fixed (with rationale): - Concurrent migrations race. Two Hermes processes hitting /codex-runtime codex_app_server within seconds of each other could cause one writer to lose entries. Low probability (you'd have to enable from two surfaces simultaneously) and low impact (just re-run migration). Adding fcntl/msvcrt locking is more code than it's worth here. The atomic rename above means each individual write is consistent — only the merge step is racy. - Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and check at runtime but don't reject too-new versions. Right call — the protocol has been stable through 0.125 → 0.130. If OpenAI breaks it later we'd see the error in test_codex_app_server_runtime on CI before users hit it.	2026-05-13 17:18:15 -07:00
Teknium	9d42c2c286	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126 ) * feat(video_gen): unified video_generate tool with pluggable provider backends One core video_generate tool, every backend a plugin. Mirrors the image_gen + memory_provider + context_engine architecture: ABC, registry, plugin-context registration hook, and per-plugin model catalogs surfaced through hermes tools. Surface (one schema, every backend): - operation: generate / edit / extend - modalities: text-to-video (prompt only), image-to-video (prompt + image_url), video edit (prompt + video_url), video extend (video_url) - reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model override - Providers ignore unknown kwargs and declare what they support via VideoGenProvider.capabilities() — backend-specific quirks stay in the backend, the agent learns one tool Backends shipped: - plugins/video_gen/xai/ — Grok-Imagine, full generate/edit/extend + image-to-video + reference images (salvaged from PR #10600 by @Jaaneek, reshaped into the plugin interface) - plugins/video_gen/fal/ — Veo 3.1 (t2v + i2v), Kling O3 i2v, Pixverse v6 i2v with model-aware payload building that drops keys a model doesn't declare Wiring: - agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation, success_response / error_response, save_b64_video / save_bytes_video, $HERMES_HOME/cache/videos/ - agent/video_gen_registry.py — thread-safe register/get/list + get_active_provider() reading video_gen.provider from config.yaml - hermes_cli/plugins.py — PluginContext.register_video_gen_provider() - hermes_cli/tools_config.py — Video Generation category in hermes tools, plugin-only providers list, model picker per plugin, config write to video_gen.{provider,model} - toolsets.py — new video_gen toolset - tests: 31 new tests covering ABC, registry, tool dispatch, both plugins - docs: developer-guide/video-gen-provider-plugin.md (parallel to the image-gen guide), sidebar + toolsets-reference + plugin guides updated Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse), #10458 (provider categories), #10786 (xAI media+search bundle), #2984 (FAL duplicate), #19086 (Google Veo standalone — easy port to plugin interface). Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): dynamic schema reflects active backend's capabilities Address the 'capability variance' question — instead of one tool with a static schema that lies about what every backend supports, the video_generate tool now rebuilds its description at get_definitions() time based on the configured video_gen.provider and video_gen.model. The agent sees backend-specific guidance up-front: - 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is REQUIRED; text-only prompts will be rejected' - 'fal-ai/veo3.1' (t2v): no image_url restriction shown - xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7 reference_image_urls' - Backends without edit/extend: 'not supported on this backend — surface that they need to switch backends via hermes tools' This is the same pattern PR #22694 used for delegate_task self-capping — documented in the dynamic-tool-schemas skill. Cache invalidation is free: get_tool_definitions() already memoizes on config.yaml mtime, so a mid-session backend swap rebuilds the schema automatically. Tested: - Empirical FAL OpenAPI schema check confirms image-to-video models require image_url (FAL returns HTTP 422 otherwise) — client-side rejection in FALVideoGenProvider.generate() now prevents the wasted round-trip - Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches - 6 new tests cover the builder (no config / image-only / full-surface / text-only / unknown provider / registry wiring), all passing - 37/37 in the slice, 134/134 in the broader regression set * test(video_gen/xai): full surface integration tests + cleaner schema Verified end-to-end that the xAI plugin handles every documented mode from PR #10600's surface: text-to-video, image-to-video, reference-images-to-video, video edit, video extend (with and without prompt). All five modes route to the correct xAI endpoint (/videos/generations, /videos/edits, /videos/extensions) with the right payload shape (image / reference_images / video keys), and all five client-side rejections fire before the network: edit-without-prompt, extend-without-video_url, image+refs conflict, >7 references, and duration/aspect_ratio clamping. 15 new integration tests grouped into four classes (endpoint routing, modalities, validation, clamping). httpx is stubbed via a small fake AsyncClient that records POSTs so the tests assert the actual payload the plugin would send to xAI — not just the success/error envelope. Also cleaned up a description redundancy: when a model's operations match the backend's overall set, we no longer print the duplicate 'operations supported by this model' line. xAI's description now reads: Active backend: xAI . model: grok-imagine-video - operations supported by this backend: edit, extend, generate - modalities supported by this backend: image, reference_images, text - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16 - resolution choices: 480p, 720p - duration range: 1-15s - reference_image_urls: up to 7 images Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing Two design changes per Teknium: 1) Drop edit/extend from the tool surface entirely. Only text-to-video and image-to-video remain. The agent sees a clean tool with two modalities; backend-specific quirks like xAI's edit/extend endpoints stay out of the unified schema. 2) FAL: pick a model FAMILY once, the plugin routes between the family's text-to-video and image-to-video endpoints based on whether image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND 'fal-ai/veo3.1/image-to-video' as separate options — they pick 'veo3.1', and the plugin handles the rest. Catalog rewritten as families: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / fal-ai/pixverse/v6/image-to-video kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video / fal-ai/kling-video/o3/standard/image-to-video xAI uses a single endpoint (/videos/generations) for both modes, routed by the presence of the 'image' field in the payload — no edit/extend exposure. Schema changes: - VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params: prompt (required), image_url, reference_image_urls, duration, aspect_ratio, resolution, negative_prompt, audio, seed, model. - VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS, DEFAULT_OPERATION. capabilities() drops 'operations' key. - success_response: add 'modality' field ('text' \| 'image') so the agent and logs can see which endpoint was actually hit. Dynamic schema builder simplified — no operations bullet, no 'switch backends if you need edit/extend' guidance. When the active backend supports both modalities (the common case), description reads: Active backend: FAL . model: pixverse-v6 - supports both text-to-video (omit image_url) and image-to-video (pass image_url) - routes automatically - aspect_ratio choices: 16:9, 9:16, 1:1 - resolution choices: 360p, 540p, 720p, 1080p - duration range: 1-15s - audio: pass audio=true to enable native audio (pricing tier) - negative_prompt: supported Tests: 51 in the video_gen slice, 216 across the broader image+video sweep, all passing. New FAL routing tests prove pixverse-v6 + no image hits text-to-video endpoint, pixverse-v6 + image_url hits image-to-video endpoint, same for veo3.1 and kling-o3-standard. Docs updated: developer-guide page rewrites the 'model families' pattern as a first-class section so external plugin authors know the convention. toolsets-reference and toolsets.py descriptions match the new surface. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers Catalog now covers everything Teknium specced from FAL: Cheap tier: ltx-2.3 fal-ai/ltx-2.3-22b/text-to-video / image-to-video pixverse-v6 fal-ai/pixverse/v6/text-to-video / image-to-video Premium tier: veo3.1 fal-ai/veo3.1 / fal-ai/veo3.1/image-to-video seedance-2.0 bytedance/seedance-2.0/text-to-video / image-to-video kling-v3-4k fal-ai/kling-video/v3/4k/text-to-video / image-to-video happy-horse fal-ai/happy-horse/text-to-video / image-to-video DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane defaults, both modalities) — better first-run UX for users who haven't explicitly picked a model. New family-entry knob: image_param_key. Kling v3 4K's image-to-video endpoint expects start_image_url instead of image_url; declaring image_param_key='start_image_url' on the family lets _build_payload remap correctly. Other families default to plain image_url. Per-family capability flags reflect each model's docs: - LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution enum exposed by FAL — let endpoint apply defaults) - Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported, negative prompts NOT supported per docs - Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative - Veo 3.1: unchanged, 16:9/9:16, 4/6/8s Tests: +5 covering the new families (full catalog, Kling 4K start_image_url remap, Seedance routing, LTX payload minimality, Happy Horse minimality). 56/56 in the slice green. Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes already has a direct xAI plugin that talks to xAI's own API; routing the same model through FAL's wrapper would duplicate the surface without adding capabilities. Users on FAL who want Grok-Imagine should use the xAI plugin directly; flag if you want both routes available. * test(video_gen): tool-surface routing matrix — every model x modality End-to-end matrix test driven through _handle_video_generate() — the actual function the agent's video_generate tool call lands in. Writes config.yaml, invokes the registered handler with a raw args dict, then asserts the outbound HTTP/SDK call hit the right endpoint with the right payload shape. Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new families as they're added (add a family to FAL_FAMILIES and you get both modalities tested for free). Coverage: - All 6 FAL families x {text-only, text+image} = 12 cases - xAI x {text-only, text+image} = 2 cases - tool-level model= arg overrides config = 2 cases For each case, verifies: - result['success'] is True - result['modality'] matches input shape ('text' if no image_url, 'image' otherwise) - outbound endpoint URL matches the family's text_endpoint or image_endpoint - text-only payloads carry no image-shaped keys - text+image payloads carry the family's image key (image_url for most, start_image_url for kling-v3-4k, wrapped 'image' object for xAI) All 16 cases passing. Confirms the tool surface routes every (provider, model, modality) combination correctly with zero leakage. * feat(video_gen): keep video_gen out of first-run setup, surface in status Two changes: 1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in the first-run toolset checklist. Video gen is niche, paid, and slow — most users don't want it nagging them during initial setup. Anyone who wants it opts in via 'hermes tools' -> Video Generation, which already routes to the provider+model picker. 2. The 'hermes setup' status panel learns about video_gen — but only shows the row when a plugin reports available. Users without FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of those keys see 'Video Generation (FAL) ✓' as confirmation it's wired. Verified live: - Fresh install (no creds): zero video_gen mentions in wizard. - With FAL_KEY: status row appears with active backend name. - 160/160 in the setup + tools_config + video_gen test slice. Rationale: image_gen is on by default because it's a featured creative tool used in casual chat (telegrams, etc). Video gen is heavier — long wait, paid per-second pricing. Default-off matches user intent better. --------- Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>	2026-05-13 16:39:41 -07:00
Teknium	80c4b27437	docs(lsp): document follow-up fixes from #24630 (#24709 ) - Note that typescript-language-server pulls in the typescript SDK automatically (peer-dep relationship was previously implicit and caused initialize failures when the SDK was absent). - Add a Troubleshooting entry for the new Backend warnings section in hermes lsp status, with the shellcheck install commands across apt / brew / scoop. Reflects what shipped in PR #24630.	2026-05-12 18:44:33 -07:00
Teknium	29d7c244c5	feat(gateway): wire clarify tool with inline keyboard buttons on Telegram (#24199 ) The clarify tool returned 'not available in this execution context' for every gateway-mode agent because gateway/run.py never passed clarify_callback into the AIAgent constructor. Schema actively encouraged calling it; users never saw the question. Changes: - tools/clarify_gateway.py — new event-based primitive mirroring tools/approval.py: register/wait_for_response/resolve_gateway_clarify with per-session FIFO, threading.Event blocking with 1s heartbeat slices (so the inactivity watchdog keeps ticking), and clear_session for boundary cleanup. - gateway/platforms/base.py — abstract send_clarify with a numbered-text fallback so every adapter (Discord, Slack, WhatsApp, Signal, Matrix, etc.) gets a working clarify out of the box. Plus an active-session bypass: when the agent is blocked on a text-awaiting clarify, the next non-command message routes inline to the runner's intercept instead of being queued + triggering an interrupt. Same shape as the /approve deadlock fix from PR #4926. - gateway/platforms/telegram.py — concrete send_clarify renders one inline button per choice plus '✏️ Other (type answer)'. cl: callback handler resolves numeric choices immediately, flips to text-capture mode for Other, with the same authorization guards as exec/slash approvals. - gateway/run.py — clarify_callback wired at the cached-agent per-turn callback assignment site (only the user-facing agent path; cron and hygiene-compress agents have no human attached). Bridges sync→async via run_coroutine_threadsafe, blocks with the configured timeout, and returns a '[user did not respond within Xm]' sentinel on timeout so the agent adapts rather than pinning the running-agent guard. Text- intercept added to _handle_message before slash-confirm intercept (skipping slash commands). clear_session called in the run's finally to cancel any orphan entries. - hermes_cli/config.py — agent.clarify_timeout default 600s. - website/docs/user-guide/messaging/telegram.md — Interactive Prompts section. Tests: - tests/tools/test_clarify_gateway.py (14 tests) — full primitive coverage: button resolve, open-ended auto-await, Other flip, timeout None, unknown-id idempotency, clear_session cancellation, FIFO ordering, register/unregister notify, config default. - tests/gateway/test_telegram_clarify_buttons.py (12 tests) — render paths (multi-choice/open-ended/long-label/HTML-escape/not-connected), callback dispatch (numeric resolve/Other flip/already-resolved/ unauthorized/invalid-token), and base-adapter text fallback. Out of scope: bot-to-bot, guest mode, checklists, poll media, live photos. Closes #24191.	2026-05-12 16:33:33 -07:00
Teknium	83b93898c2	feat(lsp): semantic diagnostics from real language servers in write_file/patch (#24168 ) * feat(lsp): semantic diagnostics from real language servers in write_file/patch Wire ~26 language servers (pyright, gopls, rust-analyzer, typescript-language-server, clangd, bash-language-server, ...) into the post-write lint check used by write_file and patch. The model now sees type errors, undefined names, missing imports, and project-wide semantic issues introduced by its edits, not just syntax errors. LSP is gated on git workspace detection: when the agent's cwd or the file being edited is inside a git worktree, LSP runs against that workspace; otherwise the existing in-process syntax checks are the only tier. This keeps users on user-home cwds (Telegram/Discord gateway chats) from spawning daemons. The post-write check is layered: in-process syntax check first (microseconds), then LSP semantic diagnostics second when syntax is clean. Diagnostics are delta-filtered against a baseline captured at write start, so the agent only sees errors its edit introduced. A flaky/missing language server can never break a write -- every LSP failure path falls back silently to the syntax-only result. New module agent/lsp/ split into: - protocol.py: Content-Length JSON-RPC framer + envelope helpers - client.py: async LSPClient (spawn, initialize, didOpen/didChange, ContentModified retry, push/pull diagnostic stores) - workspace.py: git worktree walk-up + per-server NearestRoot resolver - servers.py: registry of 26 language servers (extension match, root resolver, spawn builder per language) - install.py: auto-install dispatch (npm install --prefix, go install with GOBIN, pip install --target) into HERMES_HOME/lsp/bin/ - manager.py: LSPService (per-(server_id, root) client registry, lazy spawn, broken-set, in-flight dedupe, sync facade for tools layer) - reporter.py: <diagnostics> block formatter (severity-1-only, 20-per-file) - cli.py: hermes lsp {status,list,install,install-all,restart,which} Wired into tools/file_operations.py: - write_file/patch_replace now call _snapshot_lsp_baseline before write - _check_lint_delta gains a third tier: LSP semantic diagnostics when syntax is clean - All LSP code paths swallow exceptions; write_file's contract unchanged Config: 'lsp' section in DEFAULT_CONFIG with enabled (default true), wait_mode, wait_timeout, install_strategy (default 'auto'), and per-server overrides (disabled, command, env, initialization_options). Tests: tests/agent/lsp/ -- 49 tests covering protocol framing (encode and read_message round-trip, EOF/truncation/missing Content-Length), workspace gate (git walk-up, exclude markers, fallback to file location), reporter (severity filter, max-per-file cap, truncation), service-level delta filter, and an in-process mock LSP server that exercises the full client lifecycle including didChange version bumps, dedup, crash recovery, and idempotent teardown. Live E2E verified end-to-end through ShellFileOperations: pyright auto-installed via npm into HERMES_HOME, baseline captured, type error introduced, single delta diagnostic surfaced with correct line/column/code/ source, then patch fix removes the diagnostic from the output. Docs: new website/docs/user-guide/features/lsp.md page covering supported languages, configuration knobs, performance characteristics, and troubleshooting; cli-commands.md updated with the 'hermes lsp' reference; sidebar updated. * feat(lsp): structured logging, backend gate, defensive walk caps Cherry-picks the substantive ideas from #24155 (different scope, same problem space) onto our PR. agent/lsp/eventlog.py (new): dedicated structured logger ``hermes.lint.lsp`` with steady-state silence. Module-level dedup sets keep a 1000-write session at exactly ONE INFO line ("active for <root>") at the default INFO threshold; clean writes log at DEBUG so they never reach agent.log under normal config. State transitions (server starts, no project root for a file, server unavailable) fire at INFO/WARNING once per (server_id, key); novel events (timeouts, unexpected errors) fire WARNING per call. Grep recipe: ``rg 'lsp\\['``. agent/lsp/manager.py: wire the eventlog into _get_or_spawn and get_diagnostics_sync so users can answer "did LSP fire on this edit?" with a single grep, plus surface "binary not on PATH" warnings once instead of silently retrying every write. tools/file_operations.py: backend-type gate. ``_lsp_local_only()`` returns False for non-local backends (Docker / Modal / SSH / Daytona); ``_snapshot_lsp_baseline`` and ``_maybe_lsp_diagnostics`` now skip entirely on remote envs. The host-side language server can't see files inside a sandbox, so this prevents pretending to lint a file the host process can't open. agent/lsp/protocol.py: 8 KiB cap on the header block in ``read_message``. A pathological server that streams headers without ever emitting CRLF-CRLF would have looped forever consuming bytes; now raises ``LSPProtocolError`` instead. agent/lsp/workspace.py: 64-step cap on ``find_git_worktree`` and ``nearest_root`` upward walks, plus try/except containment around ``Path(...).resolve()`` and child ``.exists()`` calls. Defensive against pathological inputs (symlink loops, encoding errors, permission failures mid-walk) — the lint hook is hot-path code and must never raise. Tests: - tests/agent/lsp/test_eventlog.py: 18 tests covering steady-state silence (clean writes stay DEBUG), state-transition INFO-once semantics (active for, no project root), action-required WARNING-once (server unavailable), per-call WARNING (timeouts, spawn failures), and the "1000 clean writes => 1 INFO" contract. - tests/agent/lsp/test_backend_gate.py: 5 tests verifying _lsp_local_only / snapshot_baseline / maybe_lsp_diagnostics skip the LSP layer for non-local backends and route correctly for LocalEnvironment. - tests/agent/lsp/test_protocol.py: new test_read_message_rejects_runaway_header exercising the 8 KiB cap. Validation: - 73/73 LSP tests pass (49 original + 18 eventlog + 5 backend-gate + 1 framer cap) - 198/198 pass when run alongside existing file_operations tests - Live E2E re-run with pyright still surfaces "ERROR [2:12] Type ... reportReturnType (Pyright)" through the full path, then patch fix removes it on the next call. * feat(lsp): atexit cleanup + separate lsp_diagnostics JSON field Two improvements salvaged from #24414's plugin-form alternative, keeping our core-integrated design: 1. atexit cleanup of spawned language servers ---------------------------------------------------------------- ``agent/lsp/__init__.get_service`` now registers an ``atexit`` handler on first creation that tears down the LSPService on Python exit. Without this, every ``hermes chat`` exit was leaking pyright/gopls/etc. processes for a few seconds while their stdout buffers drained -- they got reaped by the kernel eventually but a watchful ``ps aux`` would catch them. The handler runs once per process (gated by ``_atexit_registered``); idempotent ``shutdown_service`` ensures double-fire is a no-op. Errors during shutdown are swallowed at debug level since by the time atexit fires the user has already seen the agent's final response. 2. Separate ``lsp_diagnostics`` field on WriteResult / PatchResult ---------------------------------------------------------------- Previously the LSP layer folded its diagnostic block into the ``lint.output`` string, conflating the syntax-check tier with the semantic tier. The agent (and any downstream parsers) now read syntax errors and semantic errors as independent signals: { "bytes_written": 42, "lint": {"status": "ok", "output": ""}, "lsp_diagnostics": "<diagnostics file=...>\nERROR [2:12] ..." } ``_check_lint_delta`` returns to its original two-tier shape (syntax check + delta filter); ``write_file`` and ``patch_replace`` independently fetch LSP diagnostics via ``_maybe_lsp_diagnostics`` and pass them into the new field. ``patch_replace`` propagates the inner write_file's ``lsp_diagnostics`` so the outer PatchResult carries the patch's delta correctly. Tests: 19 new - tests/agent/lsp/test_lifecycle.py (8 tests): atexit registration fires once and only once across N get_service calls; the registered callable is our internal shutdown wrapper; shutdown_service is idempotent and safe when never started; exceptions during shutdown are swallowed; inactive service is cached so we don't rebuild on every check. - tests/agent/lsp/test_diagnostics_field.py (11 tests): WriteResult / PatchResult dataclass shape, to_dict include/omit semantics, channel separation (lint and lsp_diagnostics carry independent signals), write_file populates the field via _maybe_lsp_diagnostics only when the syntax tier is clean, patch_replace propagates the field forward from its internal write_file. Validation: - 92/92 LSP tests pass (73 prior + 8 lifecycle + 11 diagnostics field) - 217/217 pass with file_operations + LSP combined - Live E2E reverified: clean writes -> both fields empty/none; type error introduced -> lint clean (parses), lsp_diagnostics carries the pyright reportReturnType block; patch fix -> both fields clean again. * fix(lsp): broken-set short-circuit so a wedged server isn't paid every write Discovered while auditing failure paths: a language server binary that hangs (sleep forever, no LSP traffic on stdin/stdout) caused EVERY subsequent write to re-pay the 8s snapshot_baseline timeout. Five writes = ~64s of dead time. The bug: ``_get_or_spawn`` adds the (server_id, root) pair to ``_broken`` inside its inner exception handler, but when the OUTER ``_loop.run`` timeout fires, it cancels the inner task before that handler runs. The pair never makes it to broken-set, so the next write re-enters the spawn path and re-pays the timeout. Fix: - New ``_mark_broken_for_file`` helper at the service layer marks the (server_id, workspace_root) pair broken from the OUTSIDE when the outer timeout fires. Called from the except branches in ``snapshot_baseline``, ``get_diagnostics_sync`` (asyncio.TimeoutError + generic Exception). Also kills any orphan client process that survived the cancelled future, fire-and-forget with a 1s ceiling. - ``enabled_for`` now consults the broken-set BEFORE returning True. Files in already-broken (server_id, root) pairs short-circuit to False, so the file_operations layer skips the LSP path entirely with no spawn cost. Until the service is restarted (``hermes lsp restart``) or the process exits. - A single eventlog WARNING is emitted on first mark-broken so the user knows which server gave up. Subsequent edits in the same project stay silent. Tests: 7 new in tests/agent/lsp/test_broken_set.py — covers the key shape (server_id, per_server_root), enabled_for short-circuit, sibling-file skip in same project, project isolation (broken in A doesn't affect B), graceful no-op for missing-server / no-workspace, and an end-to-end test that snapshots after a failure and verifies the next ``enabled_for`` returns False. Validation: - Live retest of the wedged-binary scenario: 5 sequential writes, first 8.88s (the one snapshot timeout), subsequent four ~0.84s (no LSP cost). Down from 5x12.85s = 64s before this fix. - 99/99 LSP tests pass (92 prior + 7 broken-set) - 224/224 pass with file_operations + LSP combined - Happy path E2E reverified — clean write, type error introduced, patch fix all behave correctly with the new broken-set logic. Note: the FIRST write to a wedged binary still pays 8s (the snapshot_baseline timeout). We could shorten that, but pyright/ tsserver normally take 2-3s and slow CI rust-analyzer can need 5+ seconds, so 8s is the conservative ceiling. Subsequent writes are instant.	2026-05-12 16:31:54 -07:00

1 2 3 4 5 ...

790 commits