hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-14 14:12:44 +00:00

Author	SHA1	Message	Date
Ben	c598076b76	test(dashboard-auth): strip HERMES_DASHBOARD_OAUTH_* env vars in hermetic fixture When these vars are set in the developer's shell, every /api/status call runs load_gateway_config() -> discover_plugins(), and the bundled dashboard_auth/nous plugin auto-registers itself, leaking a provider into the registry across tests on the same xdist worker. That breaks assertions like 'auth_providers == []' (loopback) and '== ["stub"]' (gated) in test_dashboard_auth_status_endpoint.py. CI never has these set, so this only surfaced locally, which is exactly the hermeticity gap _hermetic_environment is meant to close. Add them to _HERMES_BEHAVIORAL_VARS so the autouse fixture strips them, and to the unset list in scripts/run_tests.sh as belt-and-suspenders for direct pytest invocations.	2026-05-27 02:12:27 -07:00
Ben	a498485631	feat(dashboard-auth-nous): surface token iss/aud in verification-failure error When jwt.decode raises InvalidTokenError, decode the token a second time without signature verification (safe — we never trust the values, just display them) and append the actual iss/aud claims plus our configured expected values to the error message. Lets operators see config drift between HERMES_DASHBOARD_PORTAL_URL / HERMES_DASHBOARD_OAUTH_CLIENT_ID and what Portal is actually emitting without having to hand-decode the JWT from the browser cookie.	2026-05-27 02:12:27 -07:00
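The "decode a second time without signature verification" step can be sketched with the standard library alone, assuming a compact three-part JWT. `unverified_claims` and `describe_claim_mismatch` are hypothetical names, not the plugin's code. With PyJWT, the same unverified read is `jwt.decode(token, options={"verify_signature": False})`. Values read this way belong only in the error message and must never drive an access decision.

```python
import base64
import json


def unverified_claims(token: str) -> dict:
    """Read a JWT payload WITHOUT checking the signature (display only)."""
    parts = token.split(".")
    if len(parts) != 3:
        return {}
    payload = parts[1]
    # base64url segments drop their "=" padding; restore it before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    try:
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers binascii.Error, UnicodeDecodeError, JSONDecodeError
        return {}
    return claims if isinstance(claims, dict) else {}


def describe_claim_mismatch(token: str, expected_iss: str, expected_aud: str) -> str:
    """Hypothetical suffix appended to the InvalidTokenError message."""
    claims = unverified_claims(token)
    return (
        f"token iss={claims.get('iss')!r} aud={claims.get('aud')!r}; "
        f"expected iss={expected_iss!r} aud={expected_aud!r}"
    )
```

Putting the actual and expected values side by side lets an operator spot portal-URL or client-id drift without copying the cookie into a JWT debugger.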
Ben	b3dc539304	feat(dashboard-auth): Nous plugin always-on; default portal URL; specific error messages The Nous OAuth provider plugin (plugins/dashboard_auth/nous) is bundled and auto-loaded — same as before — but previously refused to register unless BOTH HERMES_DASHBOARD_OAUTH_CLIENT_ID and HERMES_DASHBOARD_PORTAL_URL were set, then the gate's fail-closed branch told the operator 'install the default Nous provider'. That message is misleading: the provider IS installed; it's just unconfigured. And the contract only really needs the per-instance client_id — the portal URL is the same for everyone in production. Three changes: 1. plugins/dashboard_auth/nous/__init__.py: - HERMES_DASHBOARD_PORTAL_URL is now optional and defaults to 'https://portal.nousresearch.com'. Override only for staging (portal.rewbs.uk) or a custom deployment. Empty string also falls back to the default so an empty Fly secret can't point the dashboard at nowhere. - Plugin exposes a module-level LAST_SKIP_REASON: str that the gate reads when no providers register. Cleared on each register() call. Skip reasons are human-readable and actionable ('HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. The Nous Portal provisions this env var…'). 2. plugins/dashboard_auth/nous/plugin.yaml: - requires_env drops HERMES_DASHBOARD_PORTAL_URL; only the client_id is mandatory. Description updated to reflect this. 3. hermes_cli/web_server.py: - When the gate fail-closes for 'no providers', it now reads each bundled plugin's LAST_SKIP_REASON and embeds them in the SystemExit message. Operator sees the specific config fix needed: Bundled providers reported these issues: • nous: HERMES_DASHBOARD_OAUTH_CLIENT_ID is not set. … instead of the prior generic 'Install the default Nous provider'. Tests: - TestPluginRegister rewritten to assert the new defaults + LAST_SKIP_REASON contents (6 tests, +1 new for empty-string env). - New gate test test_start_server_surfaces_nous_skip_reason_when_unconfigured. 
- test_get_method_is_not_allowed widened to handle the SPA-shell 200 path explicitly — assertion now verifies no JSON ticket leaks rather than asserting a specific status code (covers all four of 401/404/405/200). Docs updated: web-dashboard.md's 'Default provider' section now shows the env-var table with required/optional columns and embeds the fail-closed error message verbatim so operators can match what they see at the prompt.	2026-05-27 02:12:27 -07:00
Ben	2fc4615fc4	feat(dashboard-auth): Phase 7 — SPA AuthWidget + /api/status auth fields Phase 7 surfaces the OAuth gate state to users. web/src/components/AuthWidget.tsx (new): Sidebar widget that fetches /api/auth/me on mount and renders a compact 'Logged in as <user_id…> via <provider>' row with a logout icon. Contract V1 (Nous Portal) emits no email/display_name claims, so user_id is the display value (truncated to 14 chars + ellipsis); display_name and email fallthroughs are forward-compat for OQ-C1. Renders nothing on 401 from /api/auth/me — that's the signal the gate isn't engaged (loopback mode), in which case the widget would be confusing. Logout POSTs /auth/logout (which clears cookies + redirects to /login) then full-page-navigates to /login itself; the SPA's fetch wrapper doesn't follow that redirect, so the navigation is explicit. web/src/App.tsx: mounts <AuthWidget /> above <SidebarFooter />. Component is self-hiding in loopback mode so there's no need for a conditional mount. web/src/lib/api.ts: - getAuthMe() + logout() helpers - AuthMeResponse type - StatusResponse gets optional auth_required + auth_providers fields so the existing StatusPage can render a gated/loopback badge. hermes_cli/web_server.py: /api/status payload now includes - auth_required: bool — whether app.state.auth_required is True - auth_providers: list[str] — registered DashboardAuthProvider names Lazy-imports list_providers so early-startup status calls don't crash if the dashboard_auth module is still being set up. tests/hermes_cli/test_dashboard_auth_status_endpoint.py: 3 new tests covering the new status fields in both gated and loopback modes plus a regression that no existing field got dropped from the payload. The hermes status CLI is unchanged in this commit — that command tracks model providers + OAuth credentials, not running-dashboard state. The /api/status endpoint is the canonical place to query dashboard auth-gate state, consumed by the React StatusPage already.	2026-05-27 02:12:27 -07:00
Ben	5e9308b5b8	feat(dashboard-auth): Phase 6 — 401 re-auth envelope + next= propagation Contract V1 of nous-account-service PR #180 ships no refresh tokens, so the original Phase 6 silent-refresh design is replaced with a thinner '401 → redirect to /login' UX. The dashboard's gated middleware now emits a structured envelope on any auth failure; the SPA's fetch wrapper sees it and full-page-navigates the user through re-auth. hermes_cli/dashboard_auth/cookies.py: set_session_cookies(refresh_token='') SKIPS writing the hermes_session_rt cookie. Forward-compat: a non-empty refresh_token still emits the cookie unchanged, so a future Portal contract that starts issuing RTs flips the persistence on with no other change. clear_session_cookies still emits a Max-Age=0 deletion for the RT cookie so stale cookies from earlier deployments get flushed on logout / session expiry. Deprecation marker + rationale in module docstring per the user's docstring-only deprecation pattern. hermes_cli/dashboard_auth/middleware.py: _unauth_response now builds a structured JSON envelope for API 401s: { error: 'session_expired' | 'unauthenticated', detail: 'Unauthorized', reason: <internal>, login_url: '/login?next=<safe-path>' } HTML redirects also carry next= so a user landing on /sessions without a cookie bounces back to /sessions after re-auth. _safe_next_target validates same-origin: drops protocol-relative paths (//evil.com), absolute URLs, and any /login or /auth/* loop. Dead cookies are cleared on the 401 path so the browser stops replaying invalid tokens. hermes_cli/dashboard_auth/routes.py: /auth/callback accepts next= query param and validates via _validate_post_login_target (same rules as the gate's _safe_next_target — defence-in-depth because next= survived a full IDP round trip and attacker-controlled state can re-enter via the callback URL). Open-redirect attempts land at '/' instead.
web/src/lib/api.ts: fetchJSON parses the 401 envelope and full-page-navigates to body.login_url ONLY on the known session-expiry error codes. Domain-level 401s (e.g. permission errors) bubble up as regular errors. credentials: 'include' added so cookie auth works for all fetches routed through this wrapper. sessionStorage.lastLocation is preserved for future use by AuthWidget / hermes_status. Test files marked with pytest.mark.xdist_group so the four files that mutate web_server.app.state.auth_required serialize onto the same xdist worker — eliminates 'works locally, fails in CI' app-state bleed. 20 new tests in test_dashboard_auth_401_reauth.py: - set_session_cookies(refresh_token='') skips RT cookie - clear_session_cookies still emits RT deletion - 401 envelope shape (unauthenticated vs session_expired) - dead cookie cleared on invalid-token 401 - login_url carries next= for deep paths - login loop avoided when path is /login/auth/api-auth - protocol-relative URL rejected - _safe_next_target unit tests (accept same-origin, reject loops/abs) - /auth/callback respects safe next= but rejects open redirects 2 pre-existing tests updated to accept the new /login?next=%2F shape. Full dashboard-auth suite: 168 passed, 1 skipped (Phase 0 pre-existing).	2026-05-27 02:12:27 -07:00
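The following sketch shows the same-origin rules that _safe_next_target applies. `safe_next_target` and `login_url` are hypothetical stand-ins. Rejecting `/\` prefixes, which some browsers normalise to `//`, is an extra hardening assumption the commit does not mention. Encoding with `safe=""` yields the `/login?next=%2F` shape the updated tests expect.

```python
from typing import Optional
from urllib.parse import quote, urlsplit


def safe_next_target(raw: Optional[str], default: str = "/") -> str:
    """Accept only same-origin paths that don't loop back into the login flow."""
    if not raw or not raw.startswith("/"):
        return default  # empty, relative, or absolute URL such as https://...
    if raw.startswith("//") or raw.startswith("/\\"):
        return default  # protocol-relative: browsers treat //evil.com as another host
    parts = urlsplit(raw)
    if parts.scheme or parts.netloc:
        return default
    path = parts.path
    if path == "/login" or path.startswith("/login/") or path == "/auth" or path.startswith("/auth/"):
        return default  # would bounce the user straight back into login
    return raw


def login_url(next_path: Optional[str]) -> str:
    # safe="" percent-encodes "/" too, giving e.g. /login?next=%2Fsessions
    return "/login?next=" + quote(safe_next_target(next_path), safe="")
```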
Ben	b2360ba44e	feat(dashboard-auth): _ws_auth_ok helper + ticket auth on all 4 WS endpoints Phase 5 task 5.2. Four WebSocket endpoints — /api/pty, /api/ws, /api/pub, /api/events — previously authed with the same constant-time check against `_SESSION_TOKEN`. Replaced with a single helper that branches on `app.state.auth_required`: Loopback / --insecure: legacy ?token=<_SESSION_TOKEN> path (unchanged). Gated: ?ticket=<single-use> consumed against the dashboard-auth ticket store. Critical security property: gated mode UNCONDITIONALLY rejects the ?token= path. A leaked _SESSION_TOKEN value from a log line is not replayable for WS access in gated deployments. `_build_sidecar_url` now branches too: loopback uses the legacy token; gated mode mints a server-internal ticket via mint_ticket() with pseudo-user 'pty-sidecar' / provider 'server-internal' so audit logs can distinguish PTY-internal sidecar tickets from browser tickets. PTY children open /api/pub exactly once at startup so single-use suffices. Ticket rejections audit-log as WS_TICKET_REJECTED with truncated reason + client IP + WS path. Operators debugging 'WS keeps closing' issues see which endpoint and why. 17 new tests: - POST /api/auth/ws-ticket: 200 with cookie, 401/302 without, distinct per call, GET-not-allowed. - _ws_auth_ok loopback: token accept/reject, missing-token reject, ticket-param-ignored. - _ws_auth_ok gated: ticket accept, single-use rejection, unknown reject, legacy-token-rejected-in-gated assertion, audit-log emission. - _build_sidecar_url: loopback uses token=, gated uses ticket=, no-bound returns None.	2026-05-27 02:12:27 -07:00
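The branching in _ws_auth_ok can be sketched as a pure function. The signature is hypothetical: the query parameters arrive as a mapping, and ticket consumption is injected as a callable that stands in for the dashboard-auth ticket store. The property that matters is that the gated branch never reads `token` at all.

```python
import hmac
from typing import Callable, Mapping


def ws_auth_ok(
    query: Mapping[str, str],
    *,
    auth_required: bool,
    session_token: str,
    consume_ticket: Callable[[str], bool],
) -> bool:
    """Decide whether a WebSocket upgrade may proceed."""
    if auth_required:
        # Gated: only a single-use ticket is accepted. ?token= is never consulted,
        # so a leaked _SESSION_TOKEN cannot be replayed here.
        ticket = query.get("ticket")
        return bool(ticket) and consume_ticket(ticket)
    # Loopback / --insecure: legacy constant-time token check; ?ticket= is ignored.
    token = query.get("token")
    if not token:
        return False
    return hmac.compare_digest(token.encode(), session_token.encode())
```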
Ben	b69fce9c86	feat(dashboard-auth): single-use WS tickets + POST /api/auth/ws-ticket Phase 5 task 5.1. Browsers cannot set Authorization on a WebSocket upgrade, so in gated mode the SPA needs an alternative way to bind the upgrade to its authenticated session. hermes_cli/dashboard_auth/ws_tickets.py — in-memory single-use ticket store with 30s TTL. Thread-safe (threading.Lock), token_urlsafe(32) values, ticket value truncated to 8 chars in error messages for log hygiene. Module-level state with _reset_for_tests() helper. hermes_cli/dashboard_auth/routes.py — adds POST /api/auth/ws-ticket. Auth-required (the gate middleware already attaches Session to request.state.session). Returns {ticket, ttl_seconds}; emits WS_TICKET_MINTED audit event with user_id + provider + ip. hermes_cli/dashboard_auth/audit.py — adds WS_TICKET_REJECTED enum value for the consume-side rejection event (wired into the WS endpoints in task 5.2). 11 new tests covering round-trip, single-use, TTL boundary, unknown ticket rejection, secret-hygiene truncation in error messages, and concurrent mint+consume from 20 threads.	2026-05-27 02:12:27 -07:00
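The following minimal sketch shows a single-use ticket store with the properties listed above: a 30s TTL, threading.Lock, token_urlsafe(32), and 8-character truncation in error messages. `TicketStore` and `TicketError` are hypothetical names. The commit describes module-level state rather than a class, and the injectable clock exists here only to make the TTL boundary testable.

```python
import secrets
import threading
import time


class TicketError(Exception):
    """Raised when a ticket is unknown, already used, or expired."""


class TicketStore:
    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._tickets = {}  # ticket -> (expires_at, context dict)

    def mint(self, **context) -> str:
        ticket = secrets.token_urlsafe(32)
        with self._lock:
            self._tickets[ticket] = (self._clock() + self._ttl, context)
        return ticket

    def consume(self, ticket: str) -> dict:
        with self._lock:
            # pop() under the lock makes consumption single-use even across threads.
            entry = self._tickets.pop(ticket, None)
        short = ticket[:8]  # never echo the full secret into logs
        if entry is None:
            raise TicketError(f"unknown or already-used ticket {short}...")
        expires_at, context = entry
        if self._clock() >= expires_at:
            raise TicketError(f"expired ticket {short}...")
        return context
```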
Ben	848baeb0a8	feat(dashboard-auth): plugins/dashboard_auth/nous — contract-compliant Nous OAuth provider Bundled, kind=backend, auto-loads. Activates ONLY when Portal-injected env vars are present: HERMES_DASHBOARD_OAUTH_CLIENT_ID — agent:{instance_id} HERMES_DASHBOARD_PORTAL_URL — Portal base URL Loopback / --insecure operators leave both unset and never see this plugin register anything. The fail-closed branch in start_server handles the 'public bind + zero providers' case independently. Implementation follows nous-account-service PR #180's published OAuth contract verbatim: - client_id is per-instance (agent:{instance_id}); the suffix is cross-checked against the token's agent_instance_id claim as defense-in-depth (contract C9). - scope is agent_dashboard:access only (contract C3). - aud is the bare client_id, no hermes-cli: prefix (contract C2). - RS256 JWT verification against /.well-known/jwks.json with 5-minute cache (contract C7). - No refresh tokens in V1: refresh_session always raises RefreshExpiredError; revoke_session is a no-op (contract C5). - oauth_contract_version claim: missing → warn + proceed; present and != 1 → refuse (contract C11, OQ-C2 tolerant treatment). - redirect_uri validated client-side as defense before bouncing to Portal; authoritative check is server-side per agent-redirect-uri.ts. 41 new tests covering construction, plugin-entry env gating, start_login shape, complete_login httpx-mocked happy path + error mapping, verify_session JWT verification (RSA keypair fixture, full claim-check matrix), refresh_session always raising, revoke_session no-op. PyJWT + cryptography are already in the venv (jose was previously suggested; switched to pyjwt[crypto] since the latter is already pulled in transitively).	2026-05-27 02:12:27 -07:00
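The post-signature claim checks (C2, C3, C9, C11) can be sketched as follows. The sketch assumes the token has already passed RS256 verification against the JWKS (for example via PyJWT) and that `scope` is a space-delimited string. `check_contract_claims` and `ClaimError` are hypothetical names, and treating the version as the integer 1 is also an assumption.

```python
import logging

log = logging.getLogger("dashboard_auth.nous")  # hypothetical logger name

REQUIRED_SCOPE = "agent_dashboard:access"
CLIENT_ID_PREFIX = "agent:"


class ClaimError(Exception):
    """Token is signature-valid but violates the OAuth contract."""


def check_contract_claims(claims: dict, client_id: str) -> None:
    # C2: aud is the bare client_id (no hermes-cli: prefix).
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        raise ClaimError(f"aud {aud!r} does not match {client_id!r}")
    # C3: the only scope that grants dashboard access.
    if REQUIRED_SCOPE not in str(claims.get("scope", "")).split():
        raise ClaimError("missing agent_dashboard:access scope")
    # C9: the client_id suffix must equal the agent_instance_id claim.
    if not client_id.startswith(CLIENT_ID_PREFIX):
        raise ClaimError("client_id is not of the form agent:{instance_id}")
    if claims.get("agent_instance_id") != client_id[len(CLIENT_ID_PREFIX):]:
        raise ClaimError("agent_instance_id does not match client_id suffix")
    # C11: missing version -> warn and proceed; any value other than 1 -> refuse.
    version = claims.get("oauth_contract_version")
    if version is None:
        log.warning("token has no oauth_contract_version claim; assuming v1")
    elif version != 1:
        raise ClaimError(f"unsupported oauth_contract_version {version!r}")
```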
Ben	53736b3922	feat(dashboard-auth): fail-closed on no providers; proxy_headers when gated; suppress _SESSION_TOKEN injection Phase 3, Task 3.5. Three changes to web_server.py: 1. start_server replaces the legacy SystemExit-refusing-to-bind guard with: if app.state.auth_required and no providers registered, exit with a clear message; otherwise log the gate-on banner. --insecure keeps its existing behaviour. 2. uvicorn proxy_headers flag is computed from app.state.auth_required. Loopback / --insecure keep it False (so _ws_client_is_allowed sees the real peer for the loopback gate); gated mode flips it True so X-Forwarded-Proto from Fly's TLS terminator is honoured for cookie Secure-flag decisions in detect_https(). 3. _serve_index no longer injects window.__HERMES_SESSION_TOKEN__ when the gate is on — the SPA reads identity from /api/auth/me using cookie auth instead. window.__HERMES_AUTH_REQUIRED__ flag lets the SPA pick between ticket-auth (gated) and token-auth (loopback) for /api/pty + /api/ws (Phase 5 will wire this in the React layer). 4 new behavioural tests; loopback regression harness still green.	2026-05-27 02:12:27 -07:00
Ben	5b17eab67a	feat(dashboard-auth): auth gate middleware + /auth/* routes + /login HTML Phase 3, Tasks 3.2 + 3.3 + 3.4. These three pieces are mutually dependent so they land together. middleware.py - gated_auth_middleware engages when app.state.auth_required is True. Allowlists /login, /auth/, /api/auth/providers, and static asset paths; everything else demands a valid session_at cookie. Verifies by trying every registered provider's verify_session in turn (multi- provider stack); attaches verified Session to request.state.session. Returns 401 JSON for /api/ and 302 -> /login for HTML. ProviderError during verify -> 503. routes.py - APIRouter with: GET /login server-rendered HTML GET /auth/login?provider=N 302 to IDP + PKCE cookie GET /auth/callback?code,state completes login, sets session cookies POST /auth/logout clears cookies + best-effort revoke GET /api/auth/providers public bootstrap endpoint (503 if zero) GET /api/auth/me verified session as JSON (auth-required) login_page.py - Inline-CSS HTML template, no React, no JavaScript. web_server.py - Mounted gated_auth_middleware between host_header and auth_middleware (FastAPI runs middlewares in registration order: host check -> cookie auth -> token auth). auth_middleware short-circuits when auth_required so cookie auth is authoritative in gated mode. Router is included before mount_spa so the catch-all doesn't swallow /login or /auth/*. 17 new behavioural tests; loopback regression harness still green.	2026-05-27 02:12:27 -07:00
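The gate's decision logic can be sketched as a pure function, separate from any ASGI plumbing. The sketch makes several assumptions. First, `verify_session` returns None for an invalid token and raises ProviderError when the provider itself is unavailable. Second, `/assets/` stands in for the static asset paths. Third, a ProviderError yields 503 only if no other provider verified the session. `decide` and `GateDecision` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ALLOW_EXACT = {"/login", "/api/auth/providers"}
ALLOW_PREFIXES = ("/auth/", "/assets/")  # "/assets/" stands in for static asset paths


class ProviderError(Exception):
    """Provider could not complete verification (e.g. JWKS fetch failed)."""


@dataclass
class GateDecision:
    status: int                     # 200 = pass through, else the response status
    session: Optional[object] = None
    location: Optional[str] = None  # set for 302 redirects


def decide(path: str, access_token: Optional[str], providers: Sequence) -> GateDecision:
    if path in ALLOW_EXACT or path.startswith(ALLOW_PREFIXES):
        return GateDecision(200)
    provider_failed = False
    if access_token:
        for provider in providers:  # multi-provider stack: first verifier wins
            try:
                session = provider.verify_session(access_token)
            except ProviderError:
                provider_failed = True
                continue
            if session is not None:
                return GateDecision(200, session=session)
    if provider_failed:
        return GateDecision(503)
    if path.startswith("/api/"):
        return GateDecision(401)  # JSON error body for API callers
    return GateDecision(302, location="/login")
```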
Ben	a30c4d8ebd	feat(dashboard-auth): cookie helpers for session_at/session_rt/pkce Phase 3, Task 3.1. Three cookies: - hermes_session_at: OAuth access token (HttpOnly, TTL = token TTL) - hermes_session_rt: OAuth refresh token (HttpOnly, 30d max-age) - hermes_session_pkce: PKCE state + verifier + provider hint (10min) All SameSite=Lax + Path=/. Secure flag is set ONLY when the request scheme is https — uvicorn proxy_headers=True (enabled in gated mode at Phase 3.5) rewrites scheme from X-Forwarded-Proto so Fly's TLS terminator works.	2026-05-27 02:12:27 -07:00
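The cookie attributes above map directly onto the standard library's `http.cookies.SimpleCookie`; the `samesite` attribute needs Python 3.8+. This sketch covers the access and refresh cookies and folds in the later Phase 6 rule that an empty refresh_token writes no RT cookie. The function name and signature are hypothetical, and the PKCE cookie is omitted.

```python
from http.cookies import SimpleCookie

AT_COOKIE = "hermes_session_at"
RT_COOKIE = "hermes_session_rt"
RT_MAX_AGE = 30 * 24 * 3600  # 30 days


def session_cookie_headers(access_token: str, access_ttl: int, refresh_token: str, is_https: bool) -> list:
    """Build Set-Cookie header values for the session cookies."""
    jar = SimpleCookie()

    def add(name: str, value: str, max_age: int) -> None:
        jar[name] = value
        morsel = jar[name]
        morsel["httponly"] = True
        morsel["samesite"] = "Lax"
        morsel["path"] = "/"
        morsel["max-age"] = max_age
        if is_https:  # scheme as seen after proxy_headers applied X-Forwarded-Proto
            morsel["secure"] = True

    add(AT_COOKIE, access_token, access_ttl)
    if refresh_token:  # Phase 6 rule: an empty refresh_token writes no RT cookie
        add(RT_COOKIE, refresh_token, RT_MAX_AGE)
    return [morsel.OutputString() for morsel in jar.values()]
```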
Ben	628a52fce2	test(dashboard-auth): stub auth provider for E2E gate testing Phase 2, Task 2.1. Self-contained fake IDP — start_login redirects straight back to {redirect_uri}?code=stub_code&state=<s> so tests can walk the OAuth round trip in-process. Tokens are HMAC-signed JSON blobs (not real JWTs) — enough structure for verify_session to detect tamper and expiry without pulling in pyjwt. Lives in tests/ only — never registered as a real plugin. Phase 3's end-to-end tests import StubAuthProvider directly. Convention: exp <= now counts as expired (TTL=0 means born-expired) — matches what Phase 6's silent-refresh test will need.	2026-05-27 02:12:27 -07:00
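An HMAC-signed JSON token of the kind described here can be sketched as follows. The token layout (`base64url(body).hexsig`), the `sub`/`exp` field names and the key are all assumptions for illustration, not the StubAuthProvider's actual format.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

STUB_SECRET = b"stub-provider-test-secret"  # hypothetical, test-only key


def mint_stub_token(user_id: str, ttl: float, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    body = json.dumps({"sub": user_id, "exp": now + ttl}, sort_keys=True).encode()
    sig = hmac.new(STUB_SECRET, body, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(body).decode() + "." + sig


def verify_stub_token(token: str, now: Optional[float] = None) -> Optional[dict]:
    now = time.time() if now is None else now
    try:
        body_b64, sig = token.rsplit(".", 1)
        body = base64.urlsafe_b64decode(body_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(STUB_SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # tampered body or signature
    claims = json.loads(body)
    if claims["exp"] <= now:  # exp <= now is expired, so TTL=0 is born-expired
        return None
    return claims
```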
Ben	865cae4f61	feat(dashboard-auth): json-lines audit log at $HERMES_HOME/logs/dashboard-auth.log Phase 1, Task 1.4. Records every auth event (login start/success/failure, logout, refresh success/failure, revoke, session verify failure, WS ticket mint) as one JSON object per line. Token-like kwargs (access_token, refresh_token, code, code_verifier, state, ticket, cookie, Authorization) are dropped before serialisation so the log never contains live secrets. Write failures log at WARNING but never raise — auth flows must not fail because the audit logger broke.	2026-05-27 02:12:27 -07:00
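A sketch of the redact-then-append pattern follows. It assumes a `ts` timestamp field and case-insensitive matching of the secret kwarg names listed above; `audit` is a hypothetical function name. Serialisation and I/O errors are downgraded to a WARNING, so the caller's auth flow never sees an exception.

```python
import json
import logging
import time
from pathlib import Path

logger = logging.getLogger(__name__)

# Kwarg names that may carry live secrets; compared case-insensitively.
_SECRET_FIELDS = frozenset({
    "access_token", "refresh_token", "code", "code_verifier",
    "state", "ticket", "cookie", "authorization",
})


def audit(log_path: Path, event: str, **fields) -> None:
    """Append one JSON object per line; never raises."""
    record = {"ts": time.time(), "event": event}
    record.update({k: v for k, v in fields.items() if k.lower() not in _SECRET_FIELDS})
    try:
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, default=str) + "\n")
    except (OSError, TypeError, ValueError) as exc:
        logger.warning("dashboard-auth audit write failed: %s", exc)
```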
Ben	c32b17f557	feat(plugins): add register_dashboard_auth_provider hook on PluginContext Phase 1, Task 1.3. Mirrors the existing register_image_gen_provider pattern (plugins.py:531) — wrong-type or duplicate-name registrations log at WARNING and silently return rather than raising, so a misbehaving auth plugin cannot crash the host. Deviation from plan: the plan's draft raised TypeError on non-provider input; switched to silent-warn to match the established image_gen convention. Test updated to match.	2026-05-27 02:12:27 -07:00
Ben	1bbfed70c4	test(dashboard-auth): cover registry register/get/list/clear semantics Phase 1, Task 1.2. Verifies registration order is preserved, duplicate names are rejected with ValueError, and non-compliant providers fail at register time (not later when the middleware tries to dispatch).	2026-05-27 02:12:27 -07:00
Ben	2dc6d03a3d	feat(dashboard-auth): define DashboardAuthProvider ABC + Session dataclass Phase 1, Task 1.1. New package hermes_cli/dashboard_auth/ contains: base.py - DashboardAuthProvider ABC with 5 abstract methods (start_login, complete_login, verify_session, refresh_session, revoke_session), Session + LoginStart frozen dataclasses, three exception types (ProviderError / InvalidCodeError / RefreshExpiredError), and assert_protocol_compliance() for plugins to call in their own tests. registry.py - Module-level register/get/list/clear with a lock. Nothing reads the registry yet — Phase 2 adds the StubAuthProvider and Phase 3 wires the gate middleware. The plugin hook lands in Task 1.3.	2026-05-27 02:12:27 -07:00
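A condensed sketch of the base module and registry follows. The five abstract method names come from the commit, and `list_providers` appears elsewhere in this log. The method parameters, the `Session` fields, and the `get`/`clear` helper names are assumptions. LoginStart and the three exception types are omitted for brevity.

```python
import threading
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: str  # field set is an assumption
    provider: str
    expires_at: float


class DashboardAuthProvider(ABC):
    name: str = ""

    @abstractmethod
    def start_login(self, redirect_uri: str): ...

    @abstractmethod
    def complete_login(self, code: str, state: str): ...

    @abstractmethod
    def verify_session(self, access_token: str) -> Session: ...

    @abstractmethod
    def refresh_session(self, refresh_token: str): ...

    @abstractmethod
    def revoke_session(self, access_token: str) -> None: ...


_lock = threading.Lock()
_registry = {}  # dicts keep insertion order, so list_providers() preserves it


def register(provider: DashboardAuthProvider) -> None:
    # Fail at register time, not later when the middleware dispatches.
    if not isinstance(provider, DashboardAuthProvider) or not provider.name:
        raise TypeError("not a named DashboardAuthProvider")
    with _lock:
        if provider.name in _registry:
            raise ValueError(f"duplicate provider name {provider.name!r}")
        _registry[provider.name] = provider


def get(name: str):
    with _lock:
        return _registry.get(name)


def list_providers() -> list:
    with _lock:
        return list(_registry.values())


def clear() -> None:
    with _lock:
        _registry.clear()
```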
Ben	949ad95e4b	feat(dashboard): stash auth_required flag on app.state Phase 0, Task 0.3. start_server now computes should_require_auth(host, allow_public) and records it on app.state.auth_required BEFORE the existing legacy SystemExit guard fires. This gives middleware, the SPA token-injection path, and WS endpoints a consistent read source for 'is the gate active'. The flag is set but no one reads it yet — Phase 3 registers the gate middleware. Note: 4 pre-existing test failures in tests/hermes_cli/test_web_server.py (PtyWebSocket) + test_update_hangup_protection.py reproduce on pristine HEAD and are unrelated to this change (starlette TestClient WS regression).	2026-05-27 02:12:27 -07:00
Ben	8773bbf186	feat(dashboard): add should_require_auth predicate for OAuth gate Phase 0, Task 0.2. Single source of truth for 'is the auth gate active?'. Reuses the existing _LOOPBACK_HOST_VALUES frozenset so this stays in sync with the DNS-rebinding host-header check. RFC1918/CGNAT/link-local are treated as public — exact threat model the gate exists for.	2026-05-27 02:12:27 -07:00
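A minimal sketch of the predicate follows. It assumes `_LOOPBACK_HOST_VALUES` holds the usual loopback spellings and that `allow_public` reflects the `--insecure` flag. Both are assumptions about code not shown here.

```python
# Assumed contents; the real frozenset is shared with the DNS-rebinding host check.
_LOOPBACK_HOST_VALUES = frozenset({"127.0.0.1", "::1", "localhost"})


def should_require_auth(host: str, allow_public: bool) -> bool:
    """True when the dashboard must sit behind the OAuth gate."""
    if allow_public:
        # Assumed to correspond to --insecure: the operator opted out of the gate.
        return False
    # Only loopback binds are exempt; RFC1918, CGNAT (100.64/10) and link-local
    # addresses are deliberately treated as public.
    return host.strip().lower() not in _LOOPBACK_HOST_VALUES
```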
Ben	f2b479e7a2	test(dashboard): pin current loopback auth behavior as regression harness Phase 0, Task 0.1 of the dashboard-oauth plan. Establishes a baseline for the loopback dashboard's auth surface so future phases can prove they didn't regress the existing _SESSION_TOKEN flow when adding the OAuth gate.	2026-05-27 02:12:27 -07:00
Teknium	249534e472	plugins: add security-guidance — pattern-matched warnings on dangerous code writes (#33131) New opt-in plugin that scans the content passed to write_file / patch / skill_manage for 25 known-dangerous code patterns — pickle.load, yaml.load, eval(, os.system, subprocess(shell=True), child_process.exec, dangerouslySetInnerHTML, innerHTML/outerHTML/document.write/insertAdjacentHTML, crypto.createCipher (no IV), AES ECB, TLS verification disabled, XXE-prone xml.etree/minidom parsers, <script src=//...> without SRI, torch.load without weights_only=True, GitHub Actions ${{ github.event.* }} injection — and appends a "Security guidance" warning block to the tool result via the transform_tool_result hook. Default behaviour is non-blocking: the file is written and the warning rides back to the model in the next turn so it can self-correct or document why the construct is safe. SECURITY_GUIDANCE_BLOCK=1 upgrades to refusing the write entirely; SECURITY_GUIDANCE_DISABLE=1 is the kill switch. Pattern data (patterns.py) is a verbatim Apache-2.0 fork of Anthropic's claude-plugins-official/plugins/security-guidance/hooks/patterns.py at commit 0bde168 (2026-05-26). LICENSE and NOTICE preserve attribution. The Hermes-side plugin glue (__init__.py, plugin.yaml, README.md, tests) is original work. Plugin is opt-in like all bundled plugins: hermes plugins enable security-guidance Inspired by https://x.com/ClaudeDevs/status/1927108527247... — Anthropic shipped this as their security-guidance plugin for Claude Code on 2026-05-26 with a measured 30-40% reduction in security-related PR comments on internal rollout. What's NOT ported (deferred): * Layer 2 (LLM diff review on turn end) — would route through main model by default on Hermes, real money on reasoning models. A follow-up can wire it to a cheap aux model with explicit opt-in. * Layer 3 (agentic commit-time review) — agent can run this on demand via delegate_task today.
* .hermes/security-guidance.md project-rules file — only used by layers 2/3 upstream.	2026-05-27 02:07:21 -07:00
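The scan-and-warn flow can be sketched with a handful of regular expressions. The five patterns below are illustrative only; the plugin's 25 live in patterns.py. `review_write` is a hypothetical helper, and treating only the literal value "1" as enabling the two environment variables is an assumption.

```python
import os
import re

# Illustrative subset only; the real plugin ships 25 patterns in patterns.py.
PATTERNS = [
    (re.compile(r"\bpickle\.loads?\("), "pickle.load on untrusted data can execute arbitrary code"),
    (re.compile(r"\byaml\.load\("), "yaml.load can construct arbitrary objects unless a safe loader is used"),
    (re.compile(r"\beval\("), "eval() executes arbitrary code"),
    (re.compile(r"shell\s*=\s*True"), "subprocess with shell=True invites command injection"),
    (re.compile(r"\btorch\.load\((?![^)]*weights_only\s*=\s*True)"),
     "torch.load without weights_only=True unpickles arbitrary objects"),
]


def scan(content: str) -> list:
    return [message for pattern, message in PATTERNS if pattern.search(content)]


def review_write(content: str, env=None) -> tuple:
    """Return (allow_write, warning_block) for content about to be written."""
    env = os.environ if env is None else env
    if env.get("SECURITY_GUIDANCE_DISABLE") == "1":  # kill switch
        return True, ""
    findings = scan(content)
    if not findings:
        return True, ""
    block = "Security guidance:\n" + "\n".join(f"- {f}" for f in findings)
    # Default is non-blocking: the write proceeds and the block rides back to the model.
    return env.get("SECURITY_GUIDANCE_BLOCK") != "1", block
```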
SuperEarn	4920f8437f	test(codex): cover null output stream terminal events	2026-05-27 02:06:21 -07:00
Teknium	96223265b9	chore(api-server): mark skills_api capability True now that /v1/skills shipped #33016 added GET /v1/skills + /v1/toolsets on the API server; the capability flag introduced in this branch was placeholder-False. Flip to True so capability probers see the truth.	2026-05-27 01:56:55 -07:00
Jonathan	464b51d455	Support media in session chat API	2026-05-27 01:56:55 -07:00
Bailey Dixon	f7527b0fdb	feat: add API server session controls	2026-05-27 01:56:55 -07:00
EvilHumphrey	4243b6dc45	fix(codex): update silent-hang workaround hint	2026-05-27 01:52:34 -07:00
Teknium	25f43d38de	feat(api-server): add GET /v1/skills and /v1/toolsets (#33016) Lets external clients enumerate the agent's skills and resolved toolsets deterministically over the OpenAI-compatible API server, without standing up the dashboard web server or sending a chat message and asking the model to list them. - GET /v1/skills — list installed skills (name, description, category) - GET /v1/toolsets — list toolsets resolved for the api_server platform, with enabled/configured state and the concrete tool names each expands to - Both gated by API_SERVER_KEY (same Bearer scheme as every other /v1/* endpoint) - /v1/capabilities advertises both new endpoints Closes the gap a community user just hit asking how to list skills over REST when only the OpenAI-compatible server is running. Test plan - python -m pytest tests/gateway/test_api_server.py -k "Skills or Toolsets or Capabilities" -o 'addopts=' -q → 9/9 pass - python -m pytest tests/gateway/test_api_server.py -o 'addopts=' -q → 156/156 pass, no regressions - E2E: started a real adapter on an isolated HERMES_HOME with a fake skill installed; curl-equivalent calls to /v1/capabilities, /v1/skills, /v1/toolsets returned the expected JSON; unauthenticated calls were rejected with 401 while API_SERVER_KEY was configured.	2026-05-27 01:27:26 -07:00
Teknium	febc4cfec0	remove Vercel AI Gateway and Vercel Sandbox (#33067) * remove Vercel AI Gateway provider and Vercel Sandbox terminal backend Both Vercel-hosted integrations are removed end-to-end. Users on the AI Gateway should switch to OpenRouter or one of the other aggregators (Nous Portal, Kilo Code). Users on the Vercel Sandbox backend should switch to Docker, Modal, Daytona, or SSH. What's removed: - `plugins/model-providers/ai-gateway/` provider plugin - `hermes_cli/vercel_auth.py` Vercel-Sandbox auth helper - `tools/environments/vercel_sandbox.py` terminal backend - `ai-gateway` provider wiring across auth, doctor, setup, models, config, status, providers, main, web_server, model_normalize, dump - `vercel_sandbox` backend wiring across terminal_tool, file_tools, code_execution_tool, file_operations, approval, skills_tool, environments/local, credential_files, lazy_deps, prompt_builder, cli, gateway/run - `AI_GATEWAY_BASE_URL` constant, `_AI_GATEWAY_HEADERS` auxiliary-client header set, run_agent base-URL header/reasoning special-cases - `[vercel]` pyproject extra and `vercel`/`vercel-workers` from uv.lock - env vars: `AI_GATEWAY_API_KEY`, `AI_GATEWAY_BASE_URL`, `VERCEL_TOKEN`, `VERCEL_PROJECT_ID`, `VERCEL_TEAM_ID`, `VERCEL_OIDC_TOKEN`, `TERMINAL_VERCEL_RUNTIME` - Tests: deletes test_ai_gateway_models.py and test_vercel_sandbox_environment.py; scrubs references across 23 surviving test files (no entire tests deleted unless they were dedicated to AI Gateway / Sandbox) - Docs: provider tables, env-var reference, setup guides, security notes, tool config, terminal-backend tables — English plus zh-Hans i18n parity - `hermes-agent` skill: provider table entry and remote-backend list What stays (intentional): - `popular-web-designs/templates/vercel.md` — CSS design reference, unrelated to Vercel-the-AI-product - `x-vercel-id` in `stream_diag.py` headers — generic Vercel CDN response header, useful diag signal on any Vercel-hosted endpoint -
`vercel-labs/agent-browser` URL in browser config — lightpanda browser project, different OSS effort - `userStories.json` historical contributor entry mentioning Vercel Sandbox — archive, not active docs Validation: - 1153 tests in the 22 targeted files pass (`scripts/run_tests.sh`) - Full repo `py_compile` clean - Live import of every touched module + invariant check (no `ai-gateway` in `PROVIDER_REGISTRY`, no `_AI_GATEWAY_HEADERS`, no `vercel_sandbox` in `_REMOTE_TERMINAL_BACKENDS`) * test: convert profile-count check from change-detector to invariant The hardcoded "== 34" assertion broke when ai-gateway was removed. Per AGENTS.md change-detector-test guidance, assert the relationship (registry count >= number of plugin dirs) instead of a literal count. Counts shift when providers are added/removed; that's expected.	2026-05-27 00:43:32 -07:00
Teknium	cb38ce28cb	refactor(codex): drop SDK responses.stream() helper; consume events directly (#33042) * refactor(codex): drop SDK responses.stream() helper; consume events directly The OpenAI Python SDK's high-level `client.responses.stream(...)` helper does post-hoc typed reconstruction from the terminal `response.completed.response.output` field. The chatgpt.com Codex backend has been observed (today, gpt-5.5) to ship `response.output = null` on terminal frames, which crashes the SDK with `TypeError: 'NoneType' object is not iterable` mid-iteration. Carlton's #32963 patched the symptom by wrapping the helper in try/except and recovering from the same per-event accumulator the SDK was supposed to populate. This PR removes the helper from the call path entirely: we now use `client.responses.create(stream=True)` (raw AsyncIterable of SSE events) and assemble the final response object ourselves from `response.output_item.done` events as they arrive. The terminal event's `output` field is never read for content. Same strategy OpenClaw uses for the same backend. This makes Hermes structurally immune to this class of bug rather than merely patched against it. The next time OpenAI ships a shape change to chatgpt.com's terminal frame, our consumer keeps working because it doesn't read that frame for content — only for usage/status/id. Changes - `agent/codex_runtime.py`: new `_consume_codex_event_stream()` shared consumer; `run_codex_stream()` uses `responses.create(stream=True)`; `run_codex_create_stream_fallback()` collapses into a thin alias since the primary path now does what the fallback used to do. - `agent/auxiliary_client.py`: `_CodexCompletionsAdapter` uses the same consumer; old null-output recovery helpers deleted as unreferenced. - Tests migrated: fixtures that mocked `responses.stream` now mock `responses.create` returning a raw iterable. New regression test asserts the auxiliary path returns streamed items even when the terminal event's `output` is literally `null`.
Validation - Live: tested against fresh OAuth on `chatgpt.com/backend-api/codex` with `gpt-5.5` — response built correctly with `response.output=null` on the terminal frame, all events consumed, usage/reasoning tokens propagated. - `tests/run_agent/test_run_agent_codex_responses.py` + `tests/agent/test_auxiliary_client.py`: 242 passed. * test+fix(codex): migrate streaming tests, raise on truncated streams CI surfaced 10 test failures across tests/run_agent/test_streaming.py and tests/run_agent/test_codex_xai_oauth_recovery.py — both files had their own `responses.stream(...)` mocks I missed in the first sweep. agent/codex_runtime.py: _consume_codex_event_stream() now raises "Codex Responses stream did not emit a terminal response" when the stream ends without any terminal frame AND no usable content. This preserves the signal callers used to get from the SDK's high-level helper, which they distinguished from "completed with empty body" in error handling. Tests migrated: - test_streaming.py: text-delta callback, activity-touch, and remote-protocol-error tests all switch from mocking responses.stream to responses.create returning an iterable of events. - test_codex_xai_oauth_recovery.py: prelude-error tests are recast as wire-error-event tests (the new path raises _StreamErrorEvent directly when the wire emits type=error, which is strictly better than the old two-phase "SDK RuntimeError → retry → fallback"). The retry-on-transport-error test moves from responses.stream side-effect to responses.create side-effect. Verified live against chatgpt.com Codex with gpt-5.5 — AIAgent.chat() through the full codex_responses path returns correctly, 319/319 targeted tests passing.	2026-05-27 00:30:06 -07:00
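The following sketch shows a consumer that builds the final response from `response.output_item.done` events and reads the terminal frame only for id/status/usage. It also raises on a stream that ends with neither a terminal frame nor content. Events are modelled as plain dicts so the sketch is self-contained; the SDK actually yields typed event objects. `consume_event_stream` is a hypothetical name, not `_consume_codex_event_stream`.

```python
class CodexStreamError(RuntimeError):
    """Raised on a wire-level error event or a stream that ends with nothing usable."""


TERMINAL_TYPES = {"response.completed", "response.incomplete", "response.failed"}


def consume_event_stream(events) -> dict:
    output_items = []
    meta = {"id": None, "status": None, "usage": None}
    saw_terminal = False
    for event in events:
        etype = event.get("type")
        if etype == "error":
            raise CodexStreamError(event.get("message") or "stream error event")
        if etype == "response.output_item.done":
            # Content is accumulated item by item as it arrives.
            output_items.append(event["item"])
        elif etype in TERMINAL_TYPES:
            saw_terminal = True
            response = event.get("response") or {}
            # Metadata only: response["output"] may be null and is never read.
            for key in meta:
                meta[key] = response.get(key)
    if not saw_terminal and not output_items:
        raise CodexStreamError("Codex Responses stream did not emit a terminal response")
    return {**meta, "output": output_items}
```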
Teknium	b6ca56f651	fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) (#33035) * fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) When an OpenAI-compatible Responses API surface accepts an initial request but later rejects the replayed `codex_reasoning_items` encrypted blob with HTTP 400 `invalid_encrypted_content`, the session previously got stuck retrying the same poisoned payload. Recovery: classify the error as a dedicated FailoverReason, and on the first hit disable encrypted reasoning replay for the rest of the session, strip cached items from message history, and retry once. Changes: * error_classifier: add FailoverReason.invalid_encrypted_content branch in _classify_400 (before context_overflow so the messages that mention 'encrypted content … could not be verified' don't trip context heuristics), in _classify_by_error_code, and extend _extract_error_code to peek inside wrapped JSON in error.message and ignore the bare '400' as a code. * agent_init: initialize `_codex_reasoning_replay_enabled = True` on every agent. * run_agent: add AIAgent._disable_codex_reasoning_replay() helper that flips the flag and pops cached items. * codex_responses_adapter: thread a `replay_encrypted_reasoning` kwarg through _chat_messages_to_responses_input so that when the flag is False we don't replay codex_reasoning_items. * transports/codex.py: read `replay_encrypted_reasoning` from params, thread it into the adapter, and gate the `include=['reasoning.encrypted_content']` request hint on it. * chat_completion_helpers: pass the agent's replay flag through to the transport.
* conversation_loop: in the retry loop, add an invalid_encrypted_content recovery branch that fires once per session, only when api_mode == codex_responses, only when replay is still enabled, and only when at least one assistant message in history actually carries cached reasoning items (otherwise the 400 has nothing to do with our cache and the normal retry path handles it). Tests: * test_error_classifier: new wrapped-JSON _extract_error_code case; new TestClassifyApiError cases proving the 400 is retryable with no fallback, that the broad message match doesn't catch a generic 'parsed' message, and that the error code match is case-insensitive. * test_run_agent_codex_responses: end-to-end test of the recovery branch firing once and disabling replay, plus a sibling test that proves the branch does not fire (and the flag stays True) when history has no cached reasoning items. Salvages PR #10144 onto the post-refactor module layout (error_classifier / codex_responses_adapter / transports/codex / conversation_loop / agent_init) since the original diff was written against the pre-refactor monolithic run_agent.py. * chore(release): map victorGPT in AUTHOR_MAP for #10144 salvage --------- Co-authored-by: victorGPT <wuxuebin1993@gmail.com>	2026-05-26 22:01:17 -07:00
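The four conditions that gate the invalid_encrypted_content recovery branch, plus the flip-the-flag-and-strip step, can be sketched in plain Python. `AgentState`, `should_recover()` and the message layout are hypothetical; only the `codex_reasoning_items` key, the `codex_responses` api_mode and the replay-flag idea come from the commit text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AgentState:
    """Hypothetical stand-in for the AIAgent attributes the recovery gate reads."""
    api_mode: str
    codex_reasoning_replay_enabled: bool = True
    messages: List[Dict[str, Any]] = field(default_factory=list)


def history_has_cached_reasoning(messages: List[Dict[str, Any]]) -> bool:
    # Only assistant turns that actually carry cached reasoning items count.
    return any(m.get("role") == "assistant" and m.get("codex_reasoning_items") for m in messages)


def should_recover(agent: AgentState, reason: str) -> bool:
    # Fires at most once per session: after recovery the replay flag is False.
    return (
        reason == "invalid_encrypted_content"
        and agent.api_mode == "codex_responses"
        and agent.codex_reasoning_replay_enabled
        and history_has_cached_reasoning(agent.messages)
    )


def disable_codex_reasoning_replay(agent: AgentState) -> None:
    # Flip the flag, then drop cached items so the retry cannot replay them.
    agent.codex_reasoning_replay_enabled = False
    for message in agent.messages:
        message.pop("codex_reasoning_items", None)
```

When history has no cached items, the gate stays closed and the ordinary retry path handles the 400, matching the sibling test described above.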
emozilla	3d9a26afad	Merge remote-tracking branch 'origin/main' into jq/hermes-update-branch-flag	2026-05-27 00:48:25 -04:00
Ben Barclay	81a4f280d2	Merge pull request #22534 from wesleysimplicio/fix/voice-mode-docker-respect-pulse-pipewire fix(voice): honor PULSE_SERVER/PIPEWIRE_REMOTE inside Docker (#21203)	2026-05-27 13:59:12 +10:00
Nick	0a83247e9f	feat: add TUI session orchestrator Add a first-class active-session orchestrator for the Ink TUI: - list, activate, close, and launch live process-local TUI sessions - hydrate committed and in-flight output when switching sessions - dispatch a new prompt session from the +new row with session-scoped model picks - expose a clickable live-session count in the status chrome - preserve stable row order while initially focusing the current session - support mouse hit-testing for floating orchestrator overlays - add backend and frontend regression coverage for the lifecycle and UI helpers	2026-05-26 20:51:59 -07:00
beardthelion	2fc77c53f0	feat(opencode-go): route qwen3.7-max via anthropic_messages qwen3.7-max on OpenCode Go rejects the OpenAI-compatible (oa-compat) format with HTTP 401 but works correctly via the Anthropic Messages endpoint (/v1/messages with x-api-key auth). Route it the same way MiniMax models are routed: anthropic_messages api_mode. Changes: - hermes_cli/models.py: add qwen3.7-max routing + curated list - hermes_cli/setup.py: add to setup wizard model list - hermes_cli/auth.py: update provider comment - tests: add assertions for qwen3.7-max api_mode routing	2026-05-26 20:44:43 -07:00
Will Falcon	bba50977bc	fix: parse Codex image generation SSE directly	2026-05-26 20:40:29 -07:00
Carlton	43a3f119fc	fix(agent): recover Codex streams with null output	2026-05-26 19:37:37 -07:00
Teknium	bb4703c761	docs(auth): replace stale 'hermes login' references with 'hermes auth add' 'hermes login' was removed (the command now just prints a deprecation message and exits). The bundled hermes-agent SKILL.md, in-code error messages, the tip rotation, the proxy adapters, and the docs site still pointed agents and users at the dead command — so models loading the skill kept running 'hermes login --provider openai-codex' and getting a dead-end print. Replacements use the canonical 'hermes auth add <provider>' surface (or bare 'hermes auth' for the interactive manager). Files: - skills/autonomous-ai-agents/hermes-agent/SKILL.md (+ regenerated docs page) - hermes_cli/tips.py (tip rotation) - agent/google_oauth.py (gemini-cli error message) - agent/conversation_loop.py (nous re-auth troubleshooting line) - agent/credential_sources.py (docstring) - hermes_cli/proxy/cli.py + hermes_cli/proxy/adapters/nous_portal.py (proxy auth hints) - tests/hermes_cli/test_proxy.py (updated assertions) - website/docs/reference/faq.md, website/docs/user-guide/features/subscription-proxy.md - zh-Hans i18n mirrors for the above 'hermes logout' is still a live command and is left untouched. The 'hermes login' stub in hermes_cli/auth.py:login_command() and the cli-commands.md 'Deprecated' rows are intentionally kept as the discoverable deprecation surface.	2026-05-26 15:41:11 -07:00
teknium1	f05a47309e	fix(gateway): refresh cached agent tools on /reload-mcp When the gateway processes /reload-mcp, it reconnects MCP servers and updates the global _servers registry, but cached AIAgent instances in _agent_cache keep the tools list they were built with. The user had to also run /new (discarding conversation history) before the agent could see the new tools — even though /reload-mcp had succeeded. This patch refreshes each cached agent's .tools and .valid_tool_names in _execute_mcp_reload after discovery returns, so existing sessions pick up new MCP tools on their next turn. The slash-confirm gate in _handle_reload_mcp_command already obtains user consent for the implied prompt-cache invalidation before this code runs. Mirrors the equivalent behaviour the CLI already does in cli.py _reload_mcp. Per-agent enabled_toolsets and disabled_toolsets are preserved so an agent that was scoped to a subset of toolsets does not silently gain disabled tools after the reload. Original diagnosis + initial implementation in #23812 from @fujinice. The auto-reload watcher half of that PR is intentionally dropped — users want /reload-mcp to remain explicit. Co-authored-by: fujinice <45688690+fujinice@users.noreply.github.com>	2026-05-26 14:28:51 -07:00
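A minimal sketch of refreshing cached agents after /reload-mcp while honoring each agent's toolset scoping. The `CachedAgent` dataclass, the `{"name", "toolset"}` tool shape and `select_tools()` are assumptions for illustration, not the gateway's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CachedAgent:
    enabled_toolsets: Optional[List[str]]  # None means "all toolsets allowed"
    disabled_toolsets: List[str]
    tools: List[dict] = field(default_factory=list)
    valid_tool_names: Set[str] = field(default_factory=set)


def select_tools(all_tools: List[dict], enabled: Optional[List[str]], disabled: List[str]) -> List[dict]:
    selected = []
    for tool in all_tools:
        toolset = tool["toolset"]
        if enabled is not None and toolset not in enabled:
            continue  # agent was scoped to a subset of toolsets
        if toolset in disabled:
            continue  # never re-enable an explicitly disabled toolset
        selected.append(tool)
    return selected


def refresh_cached_agents(agent_cache: Dict[str, CachedAgent], all_tools: List[dict]) -> None:
    # Update tools in place so existing sessions keep their history.
    for agent in agent_cache.values():
        agent.tools = select_tools(all_tools, agent.enabled_toolsets, agent.disabled_toolsets)
        agent.valid_tool_names = {tool["name"] for tool in agent.tools}
```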
teknium1	556bf7c5c1	test(cron): guard schedule-required description text on CRONJOB_SCHEMA	2026-05-26 14:09:37 -07:00
Teknium	ccd3d04fc5	chore(models): swap qwen3.6-plus → qwen3.7-max in openrouter+nous lists (#32809) Updates curated picker lists for both the OpenRouter fallback snapshot (`OPENROUTER_MODELS`) and the Nous Portal list (`_PROVIDER_MODELS['nous']`). Regenerates website/static/api/model-catalog.json via `scripts/build_model_catalog.py` to keep the docs-hosted manifest in sync (drift guard in `test_in_repo_lists_match_manifest`). tests/hermes_cli/test_models.py fixtures updated — they pinned the old model id as their live-fetch sample.	2026-05-26 14:01:47 -07:00
Teknium	8b69ec03af	feat(mcp): Nous-approved MCP catalog with interactive picker (#30870) * feat(mcp): Nous-approved MCP catalog with interactive picker Adds an optional-mcps/ directory mirroring optional-skills/: curated, Nous-approved MCP servers shipped with the repo but disabled by default. Presence in optional-mcps/ = approval. No community tier, no trust signals. Entries are added by merging a PR. New surface: hermes mcp Interactive catalog picker (default) hermes mcp catalog Plain-text list, scriptable hermes mcp install <name> Install a catalog entry Picker behavior: not installed -> install (clone/bootstrap if needed, prompt for creds) installed/off -> enable installed/on -> menu (disable / uninstall / reinstall) Manifest schema (manifest_version: 1) supports: - transport: stdio (command/args, ${INSTALL_DIR} substitution) or http (url) - install: optional git clone + bootstrap commands (for repos that need local venv setup, like the n8n bridge); omit for npx/uvx servers - auth: api_key (prompts -> ~/.hermes/.env), oauth (provider-mediated or native MCP), or none Catalog entries are never auto-updated. Users re-run `hermes mcp install` to refresh. Credentials always go to ~/.hermes/.env (the .env-is-for-secrets rule), never to per-server env blocks. Ships n8n as the reference manifest (https://github.com/CyberSamuraiX/hermes-n8n-mcp). Tests: 19 catalog tests + E2E install/uninstall round-trip via the shipped manifest. * feat(mcp): tool-selection checklist + Linear catalog entry Adds install-time tool selection so users only enable the MCP tools they actually want, and ships Linear as a second reference catalog entry to demonstrate the http+oauth path alongside n8n's stdio+api_key+git-bootstrap. Tool selection flow: install (clone/auth/credentials) -> probe server for available tools -> curses checklist with pre-checked rows -> write mcp_servers.<name>.tools.include Pre-check priority: 1. user's prior tools.include (reinstall preserves selection) 2. 
manifest's tools.default_enabled (curated subset) 3. all probed tools (default) Probe-failure fallback (server unreachable, OAuth not yet complete, backing service offline): - manifest declared default_enabled -> applied directly - no default declared -> no filter written (all-on when reachable) - both cases point user at hermes mcp configure <name> Manifest schema additions: tools: default_enabled: [list, of, tool, names] # optional Updates: - optional-mcps/linear/manifest.yaml -- new reference entry (http+oauth) - optional-mcps/n8n/manifest.yaml -- tools.default_enabled set to the 8 read-mostly tools; mutating tools (activate/deactivate, container_logs) pruned by default - docs: new 'Tool selection at install time' section in features/mcp.md Tests: 7 new tests in TestToolSelection covering probe-success / probe-fail matrix, manifest-default filtering, reinstall-preserves-selection, and invalid-default-enabled rejection. 26 catalog tests + 32 existing mcp_config tests passing. * feat(mcp): polish — picker unification, include-mode convergence, hardening Addresses review findings on PR #30870. Lands all improvements that belong in this PR before merge; defers separate cleanup (consolidating two probe implementations, change-detector tests) to follow-ups. 
Picker UX (mcp_picker.py) - Unifies catalog + custom (user-added) MCPs in one view with distinct status badges (available / enabled / installed (disabled) / custom — enabled / custom — disabled) - Adds 'Configure tools (probe server + re-pick)' action to both the catalog-installed and custom-row submenus — the existing hermes mcp configure flow was previously unreachable from the picker - Loops until ESC/q so the user can manage several entries in one session instead of having to re-launch - Uninstall message now mentions .env credentials are preserved with a pointer to clean them up manually if no longer needed - Surfaces a 'requires a newer Hermes' warning per future-manifest entry instead of silently hiding it Catalog (mcp_catalog.py) - catalog_diagnostics() exposes which manifests were skipped and why (future_manifest vs invalid) so UIs can give actionable feedback - _do_git_install detects SHA-shaped refs (regex /[0-9a-f]{7,40}/) and skips the doomed 'git clone --branch <sha>' attempt — clone --branch only accepts branches/tags, so SHAs always failed noisily before falling back to the full-clone path - Probe-success all-tools-enabled message now mentions that new tools the server adds later will be auto-enabled (no-filter mode) Convergence (tools_config.py) - _configure_mcp_tools_interactive now writes tools.include (whitelist) instead of tools.exclude (blacklist), matching the catalog flow and hermes mcp configure. 
The on-disk config shape no longer depends on which UI the user touched last - Two existing tests updated to assert the new include-mode contract Discoverability - Setup wizard final step now prints 'Browse curated MCPs: hermes mcp' - Three tip-corpus entries pointing at the new catalog - Docs updated with: trust model (manifests run code locally, gated by PR review, but read before installing), runtime ${ENV_VAR} substitution semantics, and the manifest_version forward-compat behavior Tests - 7 new tests covering future-manifest diagnostics, custom MCP picker rows, SHA-ref git-install path, branch-ref git-install path, and the tools_config include-mode write contract - 80 MCP-related tests passing across test_mcp_catalog.py, test_mcp_config.py, test_mcp_tools_config.py * fix(mcp): drop setup-wizard catalog hint to satisfy supply-chain scanner The wizard line 'Browse curated MCPs: hermes mcp' triggered the CI supply-chain scanner because it pattern-matches on edits to any file named hermes_cli/setup.py — that filename matches the Python 'install-hook file' heuristic even though this setup.py is the user-facing 'hermes setup' wizard, not a packaging install hook. The catalog is already surfaced via three tip-corpus entries in hermes_cli/tips.py (which the scanner doesn't flag), so dropping the wizard mention loses no discoverability. Worth revisiting after a scanner allowlist for this specific file lands.	2026-05-26 12:48:14 -07:00
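Two of the catalog rules above, SHA-shaped ref detection and the checklist pre-check priority, can be sketched as follows. The regex mirrors the `[0-9a-f]{7,40}` pattern quoted in the commit; the function names and the exact git command lists are hypothetical. Under this heuristic a branch literally named like a hex string (for example `deadbeef`) would also be treated as a SHA.

```python
import re
from typing import List, Optional, Sequence

_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def looks_like_sha(ref: str) -> bool:
    return _SHA_RE.fullmatch(ref) is not None


def clone_commands(repo_url: str, ref: Optional[str], dest: str) -> List[List[str]]:
    if ref is None:
        return [["git", "clone", repo_url, dest]]
    if looks_like_sha(ref):
        # `git clone --branch` only accepts branches/tags: clone fully, then check out the SHA.
        return [["git", "clone", repo_url, dest], ["git", "-C", dest, "checkout", ref]]
    return [["git", "clone", "--branch", ref, repo_url, dest]]


def initial_checked(
    probed: Sequence[str],
    prior_include: Optional[Sequence[str]],
    default_enabled: Optional[Sequence[str]],
) -> List[str]:
    # Priority: prior tools.include, then manifest default_enabled, then all probed tools.
    for preferred in (prior_include, default_enabled):
        if preferred:
            wanted = set(preferred)
            return [name for name in probed if name in wanted]
    return list(probed)
```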
dearmayo	f4953bc648	fix(subdirectory_hints): prevent loading AGENTS.md outside workspace SubdirectoryHintTracker was scanning directories outside the active working directory, allowing files like ~/.codex/AGENTS.md or ~/.claude/CLAUDE.md to be loaded and injected into the agent context. This causes cross-agent context contamination and instruction mixup. Add _is_ancestor_or_same() helper and a path boundary check in _is_valid_subdir(): only directories within the working directory tree (i.e. path.is_relative_to(working_dir)) are allowed. Also add exist_ok=True to mkdir() calls in new tests to prevent pytest-xdist race conditions when workers share the same tmp_path parent. Tests added: - test_outside_working_dir_rejected: verifies sibling dirs are blocked - test_outside_working_dir_absolute_path_rejected: verifies ~/.codex paths blocked - test_inside_workspace_subdir_allowed: verifies normal subdir access unaffected - test_sibling_repo_not_loaded_via_ancestor_walk: ancestor walk stays within workspace	2026-05-25 23:17:33 -07:00
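The workspace boundary check reduces to a `Path.is_relative_to()` test on resolved paths. This sketch assumes Python 3.9+ and resolves both sides first so `..` segments and symlinks cannot escape the tree; `is_valid_subdir()` here is a simplified stand-in for `_is_valid_subdir()`, not its actual body.

```python
from pathlib import Path


def is_ancestor_or_same(ancestor: Path, path: Path) -> bool:
    # Resolve both sides so "..", symlinks and relative paths compare canonically.
    return Path(path).resolve().is_relative_to(Path(ancestor).resolve())


def is_valid_subdir(candidate: Path, working_dir: Path) -> bool:
    # Only existing directories inside the working-directory tree are scanned for hints.
    return Path(candidate).is_dir() and is_ancestor_or_same(working_dir, candidate)
```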
Krisli Dimo	9d10c45e32	fix(telegram): tighten table row-group spacing and drop redundant first bullet The GFM → Telegram-row-group rewriter previously joined every line in every row with a blank line ("\n\n".join(rendered_rows)), which made multi-column tables explode into one-bullet-per-paragraph walls on mobile. It also emitted the row heading twice when the table had no row-label column: once as the standalone bold heading and once again as the first labeled bullet (heading == headers[0] == data_cells[0]). This commit: * Uses single newlines between the heading and its bullets within a row-group, and a blank line only BETWEEN row-groups. * Skips any bullet whose value duplicates the heading text when the table has no row-label column (the heading already carries that information). Tables WITH a row-label column are unaffected since the heading comes from the label cell and never duplicates a header. Updated existing test assertions accordingly and added two regression tests: one that reproduces the screenshot bug (wide five-column "Plays" comparison table) and one that pins the row-label-column behavior so the dedup logic doesn't accidentally swallow real data. tests/gateway/test_telegram_format.py: 101 passed	2026-05-25 23:16:00 -07:00
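The revised row-group layout (single newlines inside a group, a blank line between groups, and no bullet that repeats the heading when there is no row-label column) can be sketched as below. The `*heading*` bold marker and `• header: value` bullet format are assumptions about the output, and escaping for Telegram's parse modes is omitted.

```python
from typing import List


def render_row_groups(headers: List[str], rows: List[List[str]], has_label_column: bool) -> str:
    groups = []
    for cells in rows:
        heading = cells[0]
        if has_label_column:
            pairs = list(zip(headers[1:], cells[1:]))  # label cell is the heading, not a bullet
        else:
            pairs = list(zip(headers, cells))
        lines = [f"*{heading}*"]
        for header, value in pairs:
            if not has_label_column and value == heading:
                continue  # heading already carries this value
            lines.append(f"• {header}: {value}")
        groups.append("\n".join(lines))  # single newline within a row-group
    return "\n\n".join(groups)  # blank line only between row-groups
```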
MorAlekss	c26af46811	fix(skills): reject symlinks in skill bundles before install	2026-05-25 18:33:02 -07:00
Teknium	ccd899318e	fix(cron): split scanner into two tiers so skill prose stops false-positiving (#32339) The runtime cron prompt scanner (added in #3968 to plug the "malicious skill carrying an injection payload" gap) reuses the same critical-severity patterns as the create-time user-prompt scan against the assembled prompt — which includes loaded skill markdown. That works fine for narrow patterns like "ignore previous instructions" which never legitimately appear in prose. It catastrophically false-positives on command-shape patterns like `cat ~/.hermes/.env`, `authorized_keys`, `/etc/sudoers`, and `rm -rf /`, which routinely appear in security postmortems and runbooks as descriptive prose about attacks, not as actual commands. Concrete failure: the bundled `hermes-agent-dev` skill contains a security postmortem section saying "the attacker could just `cat ~/.hermes/.env`". Every PR-scout cron job that loaded this skill was silently blocked with `Blocked: prompt matches threat pattern 'read_secrets'`. All 11 scout jobs failed for weeks. Fix: split the scanner into two tiers and route by context: - `_scan_cron_prompt` (strict, unchanged behavior) runs against the small user-authored cron prompt at create/update and as a runtime defense-in-depth when no skills are attached. A legit user prompt has no business saying `cat .env`, so the strict patterns still apply there. - `_scan_cron_skill_assembled` (new, looser) runs against the assembled prompt when skills are attached. It only catches unambiguous prompt-injection directives ("ignore previous instructions", "disregard your rules", "system prompt override", "do not tell the user") plus invisible-unicode markers. Command-shape patterns are dropped because they false-positive on prose. This is defense-in-depth, not the only line of defense. 
Skill bodies are already scanned at install time by `skills_guard.py`; the runtime cron scan exists purely as a tripwire for an obvious injection directive surviving a malicious install. Catching prose mentions of commands was never the goal of #3968 — the test that planted a skill containing `cat ~/.hermes/.env` was the wrong shape of test for the threat model. Tests: - `_scan_cron_prompt` strict behavior preserved (56 existing tests unchanged: bare `cat .env`, `rm -rf /`, etc. still block). - New `TestScanCronSkillAssembled` class verifies the looser scanner: injection / disregard / system-override / do-not-tell-the-user / invisible-unicode still block; descriptive prose about attack commands is allowed; GitHub auth-header allowlist still works. - `test_skill_with_env_exfil_payload_raises` (planted `cat .env` in skill body) replaced with `test_skill_with_env_exfil_command_in_prose_is_allowed` documenting the new correct behavior with the real-world postmortem-style example that triggered the bug. - All 11 originally-failing PR-scout jobs validated end-to-end via `_build_job_prompt` — assembled prompts now build successfully with the `hermes-agent-dev` skill attached. Total: 75/75 tests in cron + cronjob_tools + threat scanner pass; 544/544 across the wider cron / memory / threat-pattern surface.	2026-05-25 18:20:45 -07:00
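A compact sketch of the two-tier routing. The pattern set is illustrative: only the `read_secrets` name appears in the commit, the other names and all regexes are hypothetical, and the real scanners carry many more patterns plus the GitHub auth-header allowlist.

```python
import re
from typing import Dict, Optional, Pattern

# Tier 1: unambiguous injection directives (applied in both tiers).
INJECTION_PATTERNS: Dict[str, Pattern] = {
    "prompt_injection": re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),
    "disregard_rules": re.compile(r"disregard\s+(all\s+)?(your|the)\s+rules", re.I),
    "sys_prompt_override": re.compile(r"system\s+prompt\s+override", re.I),
    "deception_hide": re.compile(r"do\s+not\s+tell\s+the\s+user", re.I),
}
# Tier 2: command-shape patterns (strict tier only; they false-positive on prose).
COMMAND_PATTERNS: Dict[str, Pattern] = {
    "read_secrets": re.compile(r"cat\s+\S*\.env\b"),
    "ssh_backdoor": re.compile(r"authorized_keys"),
    "sudoers_mod": re.compile(r"/etc/sudoers"),
}
INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff", "\u202e"}


def _first_hit(text: str, patterns: Dict[str, Pattern]) -> Optional[str]:
    for name, pattern in patterns.items():
        if pattern.search(text):
            return name
    if any(ch in text for ch in INVISIBLE_CHARS):
        return "invisible_unicode"
    return None


def scan_cron_prompt(text: str) -> Optional[str]:
    """Strict tier: small user-authored prompts."""
    return _first_hit(text, {**INJECTION_PATTERNS, **COMMAND_PATTERNS})


def scan_cron_skill_assembled(text: str) -> Optional[str]:
    """Looser tier: assembled prompts that include skill markdown."""
    return _first_hit(text, INJECTION_PATTERNS)


def runtime_scan(prompt: str, skills_attached: bool) -> Optional[str]:
    return scan_cron_skill_assembled(prompt) if skills_attached else scan_cron_prompt(prompt)
```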
Teknium	e3236e99a4	fix(anthropic): API-key path skips OAuth autodiscovery + prunes stale entries When the user picks 'Anthropic API key' at `hermes setup` (vs 'Claude Pro/Max subscription'), `save_anthropic_api_key()` writes ANTHROPIC_API_KEY to ~/.hermes/.env and zeros ANTHROPIC_TOKEN. That env-var pattern is the user's explicit choice of auth method — API key, not OAuth. But the anthropic credential pool's autodiscovery (_seed_from_singletons) unconditionally read ~/.claude/.credentials.json from the Claude Code CLI and any saved hermes_pkce creds, and added them to the SAME anthropic pool as the user's API key. Two problems: 1. Even with the API key at higher priority, a 401/429 on the API key would rotate the session onto an autodiscovered OAuth credential, silently flipping the agent into the Claude Code masquerade mid-conversation: 'You are Claude Code' system block, every tool renamed to mcp_*, claude-cli User-Agent header. 2. Switching OAuth → API key at `hermes setup` cleared the env vars but left previously-seeded OAuth entries dormant in auth.json, where rotation could revive them. The user picking the API-key path is explicitly opting OUT of the masquerade. Mixing OAuth credentials into their pool defeats that choice. Fix: in `_seed_from_singletons` for provider='anthropic', detect the API-key path (ANTHROPIC_API_KEY set in env, no OAuth env var set) and: - Skip calling read_claude_code_credentials() and read_hermes_oauth_credentials() entirely - Prune any stale hermes_pkce / claude_code entries that may already be in the on-disk pool OAuth-path users (ANTHROPIC_TOKEN set) are unaffected — autodiscovery continues to fire as before. Tests: 3 new regression tests (api-key skips autodiscovery, api-key prunes stale entries, oauth path still autodiscovers). Full file 70/70.	2026-05-25 17:41:40 -07:00
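The API-key-path detection and pruning can be sketched like this, assuming `ANTHROPIC_TOKEN` is the only OAuth env var that matters and that pool entries are dicts with a `source` field. Only the `hermes_pkce` and `claude_code` source names come from the commit; the rest is hypothetical. The `discover_oauth` callback stands in for `read_claude_code_credentials()` / `read_hermes_oauth_credentials()`.

```python
from typing import Callable, Dict, List, Mapping

OAUTH_SOURCES = frozenset({"hermes_pkce", "claude_code"})
OAUTH_ENV_VARS = ("ANTHROPIC_TOKEN",)  # assumption: the only OAuth env var considered


def is_api_key_path(env: Mapping[str, str]) -> bool:
    # API-key path: key present and no OAuth env var set (empty string counts as unset).
    return bool(env.get("ANTHROPIC_API_KEY")) and not any(env.get(v) for v in OAUTH_ENV_VARS)


def seed_anthropic_pool(
    pool: List[Dict[str, str]],
    env: Mapping[str, str],
    discover_oauth: Callable[[], List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    if is_api_key_path(env):
        # Skip autodiscovery entirely and prune stale OAuth entries already on disk.
        return [entry for entry in pool if entry.get("source") not in OAUTH_SOURCES]
    discovered = [entry for entry in discover_oauth() if entry not in pool]
    return list(pool) + discovered
```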
Teknium	2c6bbaf352	fix(gateway): coerce scalar `model:` to dict before /model --global persist (#32272) Reported via AskClaw. When config.yaml has `model: <name>` (flat string) instead of the nested `model: {default: ..., provider: ...}` form, every gateway `/model X --global` crashed silently with TypeError: 'str' object does not support item assignment The persist block did: model_cfg = cfg.setdefault("model", {}) model_cfg["default"] = result.new_model `setdefault` returns the existing scalar, and the next assignment blows up. The 'switch failed' warning was logged at WARNING level and the user never saw why their persist didn't stick. Coerce scalar/None `model:` into a dict before mutation, in both the gateway path (`gateway/run.py`) and the sister site in `hermes_cli/doctor.py --fix` (same setdefault-on-string flaw). The CLI `/model` path is unaffected because it goes through `_set_nested` which already replaces scalar leaves with dicts. Regression test `tests/gateway/test_model_command_flat_string_config.py` covers the flat-string, missing, and proper-dict cases. Without the fix, the flat-string case fails with the exact original TypeError.	2026-05-25 15:22:23 -07:00
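The fix amounts to replacing a non-dict `model:` value before mutating it. A minimal sketch, with `persist_global_model()` as a hypothetical helper name rather than the gateway's actual function:

```python
def persist_global_model(cfg: dict, new_model: str) -> dict:
    # cfg.setdefault("model", {}) would hand back an existing string, and the
    # item assignment below would then raise TypeError, so coerce first.
    if not isinstance(cfg.get("model"), dict):
        cfg["model"] = {}
    cfg["model"]["default"] = new_model
    return cfg
```

Existing keys such as `provider` survive when `model:` is already a dict; a flat string is simply replaced, since its only content was the default that is being overwritten anyway.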
Teknium	de76f4dbcf	fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271) `load_hermes_dotenv()` is called at module-import time from cli.py, hermes_cli/main.py, run_agent.py, trajectory_compressor.py, gateway/run.py, tui_gateway/server.py, acp_adapter/entry.py, and a few others. Each call triggered `_apply_external_secret_sources()`, which re-parsed config, re-fetched from Bitwarden Secrets Manager (its own 300s cache mostly absorbed this), re-ran the ASCII sanitization sweep, and reprinted Bitwarden Secrets Manager: applied N secret(s) (...) to stderr. Users saw the status line 3-5x per CLI startup. Guard the function with a process-level set of HERMES_HOME paths that have already had external secrets applied. Subsequent calls for the same home_path are no-ops. `reset_secret_source_cache()` lets tests (and any future long-running consumer that wants to refresh after a config change) force a re-pull.	2026-05-25 15:18:55 -07:00
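A once-per-home guard like the one described can be sketched with a module-level set and a lock. The `fetch` callback, the path normalization via `Path.resolve()` and the boolean return value are assumptions; only `reset_secret_source_cache()` is a name taken from the commit.

```python
import threading
from pathlib import Path
from typing import Callable, Set

_applied_homes: Set[str] = set()
_applied_lock = threading.Lock()


def apply_external_secret_sources(home_path: str, fetch: Callable[[str], None]) -> bool:
    """Run `fetch` at most once per resolved HERMES_HOME; return True if it ran."""
    key = str(Path(home_path).resolve())
    with _applied_lock:
        if key in _applied_homes:
            return False  # already applied in this process: no re-fetch, no reprint
        fetch(key)
        _applied_homes.add(key)  # only marked after a successful fetch
        return True


def reset_secret_source_cache() -> None:
    """Forget which homes were applied so the next call re-pulls secrets."""
    with _applied_lock:
        _applied_homes.clear()
```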
Teknium	6bd0be30be	feat(patch): indentation preservation, CRLF preservation, per-file failure escalation (#507) (#32273) Three granular patch-tool refinements from the Roo Code deep-dive (#507). ## Indentation preservation (fuzzy_match.py) When fuzzy_find_and_replace matches via a non-exact strategy, the file's indentation may differ from what the LLM sent in old_string/new_string (common case: model sends zero-indent old/new for a method body that lives inside an 8-space-indented class). Before this commit the replacement was spliced in verbatim, producing a file with a broken indent level that may still parse but is logically wrong. The fix computes the indent delta between old_string's first meaningful line and the matched region's first meaningful line, then re-indents every line of new_string by that delta. Exact-strategy matches are untouched (passthrough). Same approach as Roo Code's multi-search-replace.ts:466-500. ## CRLF preservation (file_operations.py) Models nearly always send tool args with bare LF endings (JSON-encoded), but the file on disk may have CRLF (Windows-line-ending configs, .bat, .cmd, .ini files). Before this commit: - write_file silently normalized CRLF to LF on every overwrite - patch produced mixed-ending files: the substituted region had LF, the surrounding context kept CRLF The fix detects the file's existing line endings (via pre_content if already read for lint/LSP, otherwise a tiny head -c 4096 probe), and normalizes the entire write to that ending. New files are written verbatim (no detection possible). ## Per-file failure escalation (file_tools.py) When the agent fails to patch the same file 3+ times in a row, the existing 'old_string not found' hint isn't strong enough — the model keeps retrying with variations against a stale view of the file. The fix tracks consecutive failures per (task_id, resolved_path) and injects an escalating hint after 3 failures: 'This is failure #N patching X. Stop retrying. 
Either re-read fresh, use longer context, or fall back to write_file.' Counter resets on a successful patch to the same path. ## Validation - 22 new tests across tests/tools/test_fuzzy_match.py (5), test_line_ending_preservation.py (12), test_patch_failure_tracking.py (5) - All existing tests pass (165/165 in the touched files) - E2E verified with real _handle_patch / _handle_write_file calls against real CRLF files and real failure loops Closes part of #507. The remaining open items in #507 (2b start_line hint, behavioral rules) were declined after audit: - 2b adds schema bloat for a problem the existing 'multiple matches' contract already handles - Behavioral rules conflict with the personality system Items 1, 2d, 2e, 3, 4 of #507 were already landed in earlier work.	2026-05-25 15:18:45 -07:00
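The indentation-delta and line-ending rules can be sketched in a few lines of Python. This assumes space-only indentation and a simple "any CRLF means CRLF" detection heuristic; `reindent()`, `detect_line_ending()` and `normalize_line_endings()` are hypothetical names, not the functions in fuzzy_match.py or file_operations.py.

```python
def _first_meaningful_indent(text: str) -> int:
    # Indent (in spaces) of the first non-blank line; 0 if all lines are blank.
    for line in text.splitlines():
        if line.strip():
            return len(line) - len(line.lstrip(" "))
    return 0


def reindent(new_string: str, old_string: str, matched_region: str) -> str:
    delta = _first_meaningful_indent(matched_region) - _first_meaningful_indent(old_string)
    if delta == 0:
        return new_string
    out = []
    for line in new_string.splitlines(keepends=True):
        if not line.strip():
            out.append(line)  # leave blank lines alone
        elif delta > 0:
            out.append(" " * delta + line)
        else:
            # Never strip more leading spaces than the line actually has.
            strip = min(-delta, len(line) - len(line.lstrip(" ")))
            out.append(line[strip:])
    return "".join(out)


def detect_line_ending(existing_content: str) -> str:
    return "\r\n" if "\r\n" in existing_content else "\n"


def normalize_line_endings(text: str, ending: str) -> str:
    lf_only = text.replace("\r\n", "\n")
    return lf_only if ending == "\n" else lf_only.replace("\n", ending)
```

Normalizing the whole write (rather than only the substituted region) is what avoids the mixed-ending files described above.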
Teknium	30928f945f	fix(dashboard): suffix-allowlist plugin assets + denylist subprocess-influencing env vars (#32277) Two posture fixes surfaced by the web-pentest skill self-test against the dashboard (issue #32267). 1. /dashboard-plugins/<name>/<path> previously returned 200 for any file inside the plugin's dashboard directory — including plugin_api.py and __pycache__/.pyc. The path is unauthenticated by architecture (SPA loads JS via <script src> and CSS via <link href>, neither of which can attach a custom auth header), so the fix is not "require token" — it's "restrict to browser-fetchable suffixes." Allowlist now: .js .mjs .css .json .html .svg .png .jpg .jpeg .gif .webp .ico .woff .woff2 .ttf .otf .map. Everything else → 404. This stops a private user-installed plugin's Python source from being readable by anyone reachable on the dashboard's loopback port (other local users on a shared box, sidecar containers sharing the host netns). 2. save_env_value() now refuses to persist env-var names that influence how the next subprocess executes: LD_PRELOAD, LD_LIBRARY_PATH, LD_AUDIT, DYLD_*, PYTHONPATH, PYTHONHOME, PYTHONSTARTUP, NODE_OPTIONS, NODE_PATH, PATH, SHELL, EDITOR, VISUAL, PAGER, BROWSER, GIT_SSH_COMMAND, GIT_EXEC_PATH; plus HERMES_HOME / HERMES_PROFILE / HERMES_CONFIG / HERMES_ENV. PUT /api/env is authed but the session token lives in the SPA HTML where any future plugin XSS or local process can read it. Without this gate, a token-holder could plant LD_PRELOAD in .env and the next hermes process start would load attacker code via the dotenv to os.environ chain. This is enforced on write only — pre-existing .env values are left alone (the gate is in save_env_value, not in load_env). PUT /api/env now returns 400 with the explanatory message instead of an opaque 500. IMPORTANT: HERMES_* overall is NOT blocked — only the four runtime location names. 
Integration credentials following the HERMES_* convention (HERMES_GEMINI_*, HERMES_LANGFUSE_*, HERMES_SPOTIFY_*, HERMES_QWEN_BASE_URL, ...) keep working. Regression tests cover both fixes (30 new test cases). No existing tests changed; 257 passing in tests/hermes_cli/. Closes #32267.	2026-05-25 15:07:19 -07:00
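Both gates are simple membership checks. In this sketch the suffix list and blocked names are copied from the commit, `DYLD_` is treated as a prefix family, and `is_servable_plugin_asset()` / `is_env_name_allowed()` are hypothetical names; the real code answers with a 404 or an HTTP 400 at the API layer rather than returning a boolean.

```python
from pathlib import PurePosixPath

ALLOWED_ASSET_SUFFIXES = frozenset({
    ".js", ".mjs", ".css", ".json", ".html", ".svg", ".png", ".jpg", ".jpeg",
    ".gif", ".webp", ".ico", ".woff", ".woff2", ".ttf", ".otf", ".map",
})

BLOCKED_ENV_NAMES = frozenset({
    "LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT", "PYTHONPATH", "PYTHONHOME",
    "PYTHONSTARTUP", "NODE_OPTIONS", "NODE_PATH", "PATH", "SHELL", "EDITOR",
    "VISUAL", "PAGER", "BROWSER", "GIT_SSH_COMMAND", "GIT_EXEC_PATH",
    # Only the runtime-location HERMES_* names; other HERMES_* keys stay allowed.
    "HERMES_HOME", "HERMES_PROFILE", "HERMES_CONFIG", "HERMES_ENV",
})
BLOCKED_ENV_PREFIXES = ("DYLD_",)


def is_servable_plugin_asset(rel_path: str) -> bool:
    # Anything that is not browser-fetchable (e.g. .py, .pyc) should 404.
    return PurePosixPath(rel_path).suffix in ALLOWED_ASSET_SUFFIXES


def is_env_name_allowed(name: str) -> bool:
    return name not in BLOCKED_ENV_NAMES and not name.startswith(BLOCKED_ENV_PREFIXES)
```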
teknium1	926da69b45	test(telegram): switch transient-flake retry test to group chat Salvage follow-up. The transient thread-not-found retry test was exercising chat_id='123' (positive, looks-like-private) which now hits the new private-DM-topic fail-closed contract. The test's intent is the transient-flake retry on real forum topics in groups, so use -100123 to make the scenario unambiguous.	2026-05-25 14:54:02 -07:00

4377 commits