hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-24 10:52:21 +00:00

Author	SHA1	Message	Date
kshitij	1f28b1a9b9	fix(gateway): redact credentials from approval prompts before sending to clients (#48456) (#50767) Tirith redacts its own findings, but the approval-request callbacks built the operator prompt from the RAW command string, so a credential-shaped value Tirith flagged was sent verbatim to clients, undoing the redaction one layer up. Two egress transports carried the leak; both are fixed via a shared module-level seam _redact_approval_command() (redact_sensitive_text force=True): 1. chat platforms — _approval_notify_sync (gateway/run.py): redact before both the button path (send_exec_approval) and the plain-text /approve fallback. 2. SSE/API stream — _approval_notify (gateway/platforms/api_server.py): redact event['command'] before it is enqueued to API/desktop clients. (whole-bug-class: sibling call path on a separate transport.) force=True so the prompt — a hard secret-egress boundary — honors redaction even when security.redact_secrets is off. Clean commands pass through unchanged. Tests bind the seam (synthetic credential-format fixtures, force-when-disabled) AND assert BOTH callbacks ASSIGN the redacted result before the send/enqueue sink, via an AST contract that rejects a discarded-result call. All mutation-checked.	2026-06-22 11:39:45 +00:00
teknium1	7726ce3040	fix(security): close hermes-0day MCP-persistence attack surface Remove the dashboard --insecure auth-bypass, add an MCP persistence guard + IOC blocklist, and raise the API-server key entropy floor. Driven by the June 2026 hermes-0day campaign (r/hermesagent, live 854.media instance): scanners find exposed Hermes dashboards/API servers, drive the root agent to plant a 'command: bash' MCP entry that appends an attacker SSH key to authorized_keys, which cron + startup then re-execute every tick. - dashboard: --insecure no longer disables the auth gate. should_require_auth returns True for every non-loopback bind; a public bind ALWAYS requires an auth provider (bundled password provider or OAuth). --insecure kept as a warned no-op for backward compat. Fail-closed error now points at the password provider, not at --insecure. - mcp_security: validate_mcp_server_entry now also rejects shell payloads that write to OS persistence surfaces (authorized_keys/.ssh/pam.d/sudoers/cron/ rc files) and hard-rejects a hermes-0day IOC blocklist (attacker SSH key + source IPs) anywhere in command/args/env. Runs at save AND spawn time. - api_server: raise network-bind API_SERVER_KEY entropy floor 8->16 chars; warn when a network-accessible API server runs an unsandboxed local backend.	2026-06-21 19:05:27 -07:00
Teknium	7a131f7f40	fix(api-server): stop silently promising async delivery on stateless HTTP path (#50319) * fix(api-server): stop silently promising async delivery on stateless HTTP path terminal(notify_on_complete=True / watch_patterns) and delegate_task(background=True) silently no-op'd on the API server / WebUI path (#10760): the watcher / detached child registered, but every API-server route (OpenAI-spec /v1/chat/completions and /v1/responses, plus the proprietary /v1/runs SSE stream) tears down its channel when the turn ends, and APIServerAdapter.send() is a no-op stub. A completion that fires after the response closed had nowhere to go — from the agent side, indistinguishable from a hang. There is no spec-compliant surface to wake the agent later on a stateless HTTP client, so make the no-op honest instead of silent: - Add a per-adapter capability flag supports_async_delivery (default True; APIServerAdapter = False), propagated into a HERMES_SESSION_ASYNC_DELIVERY contextvar via async_delivery_supported(). Toggle on the adapter, not a hardcoded platform string — a future stateless adapter is correct-by-default. - terminal: when delivery is unsupported, skip watcher registration, force notify_on_complete off, and return a notify_unsupported note telling the agent to process(action='poll'). - delegate_task: when delivery is unsupported, fall back to SYNCHRONOUS execution (work runs and returns in the same response) with a note, instead of handing out a handle that never resolves. CLI (in-process completion_queue) and the real gateway platforms are unchanged. Fixes #10760 * refactor(api-server): route session binding through a single no-delivery chokepoint Add APIServerAdapter._bind_api_server_session() and route both agent-entry paths (_run_agent for /v1/chat/completions + /v1/responses, and the /v1/runs _run_sync path) through it. 
The helper hardwires platform="api_server" and async_delivery=False with no async_delivery parameter to pass, so a future route added to the API server physically cannot reintroduce the silent no-op (#10760) by forgetting to mark the channel as non-delivering. The binding stays request-scoped (cleared per turn), so a session resumed later on a delivering interface (CLI / gateway platform) re-binds fresh and is NOT blocked — the no-delivery decision tracks the interface handling the current turn, never the session.	2026-06-21 12:15:14 -07:00
Teknium	e499d69e3e	feat(api-server): configurable concurrent-run cap to prevent DoS (#50007) The OpenAI-compatible API server only enforced a hardcoded cap of 10 concurrent runs on /v1/runs, leaving /v1/chat/completions and /v1/responses unbounded — a request flood could exhaust CPU, memory, and upstream LLM quota (#7483). - Add gateway.api_server.max_concurrent_runs (config.yaml, default 10, 0 disables). No env var. - Shared concurrency gate across all three agent-serving endpoints, counting both the chat/responses in-flight counter and the /v1/runs stream set. Returns OpenAI-style 429 + Retry-After when at the cap. - Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute. Closes #7483.	2026-06-21 07:26:03 -07:00
kshitijk4poor	b577f25100	refactor(gateway): dedupe drain-timeout resolution + share active_agents parse Follow-up cleanups on top of the busy/idle readout (PR #50103): - web_server.py /api/status reused the single drain-timeout resolver hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT env -> agent.restart_drain_timeout config -> default) instead of inlining a third hand-rolled copy of that precedence chain. Also fixes a subtle divergence: the inline copy used os.environ.get() so a set-but-empty env var was treated as a value rather than falling through to config; the shared resolver .strip()s and falls through correctly. - Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces (/api/status and /health/detailed) through it, so the exposed active_agents field is consistently clamped non-negative. Previously /api/status clamped while /health/detailed exposed the raw file value, diverging on a corrupt count. - Added TestParseActiveAgents covering the shared coercion contract.	2026-06-21 17:22:52 +05:30
Ben	0ee75469d7	feat(dashboard): surface gateway busy/drainable on /api/status Give an external consumer (NAS) a trustworthy, always-reachable busy/idle readout it can poll before a disruptive lifecycle action (restart, migrate, stop, auto-update). The dashboard /api/status is the only HTTP surface guaranteed up on a hosted agent regardless of which gateway platforms are enabled, and it already reads gateway_state.json. Add to /api/status (additive, non-breaking): - active_agents — in-flight gateway-turn count (now refreshed per-turn by the companion gateway-side commit) - gateway_busy — running AND active_agents > 0 - gateway_drainable — running and live (a valid begin-drain target) - restart_drain_timeout — resolved seconds, so the consumer can size its poll deadline without out-of-band knowledge (env HERMES_RESTART_DRAIN_TIMEOUT → config agent.restart_drain_timeout → default) The busy/drainable contract is defined once in gateway.status (derive_gateway_busy / derive_gateway_drainable) and consumed by both /api/status and /health/detailed so the two surfaces can never disagree. Liveness keys off gateway_running (a live PID/health probe), NEVER gateway_updated_at — a healthy idle gateway never advances that timestamp. All derived fields degrade to safe falsy values when the gateway is down or the status file is absent/corrupt (never a spurious "busy" that would wedge the consumer). active_sessions (the 5-min DB recency heuristic the SPA reads) is left exactly as-is — new signal, new fields. Tests (behaviour contracts, not snapshots): the pure derivation contract across every running/state/count/liveness combination; /api/status integration for busy, idle-drainable, draining, down, stale-busy-file, corrupt-count, and timeout surfacing; and /health/detailed parity.	2026-06-21 17:22:52 +05:30
teknium1	a58287afcb	Merge remote-tracking branch 'origin/main' into pr48275-rebase # Conflicts: # cron/scheduler.py	2026-06-19 07:40:29 -07:00
infinitycrew39	460b1e50e5	fix(gateway): refresh max_turns before resolving runtime budget	2026-06-19 06:31:13 -07:00
Ben	b75757d4aa	feat(cron): wire on_jobs_changed, cron.chronos config, docs + agent↔NAS contract Phase 4F (F.1 + F.2 + F.3, agent side). F.4 is the operator-run live smoke (needs a NAS deployment); recorded in the PR, not code. F.1 — on_jobs_changed wiring: - cron/scheduler.py: _notify_provider_jobs_changed() — resolve the active provider, call on_jobs_changed(), swallow errors. Lives in scheduler.py (not jobs.py) so the store stays free of provider imports (no import cycle). - Wired at the consumer surfaces AFTER a successful mutation: the cronjob model tool (tools/cronjob_tools.py, create/update/remove/pause/resume) — which the `hermes cron` CLI also routes through — and the REST handlers (gateway/platforms/api_server.py, same five). Built-in's no-op default = zero behavior change on the default path. Sleeping-agent direct jobs.json writes (no tool/CLI/REST) are covered by reconcile-on-wake in start(). F.2 — config: cron.chronos.{portal_url,callback_url,expected_audience, nas_jwks_url}. All non-secret; the agent holds no scheduler creds and the outbound provision call reuses the existing Nous token (no token key). Additive deep-merge key, no version literal. F.3 — docs: - docs/chronos-managed-cron-contract.md: authoritative agent↔NAS wire contract (the three agent-cron endpoints + inbound /api/cron/fire + the 3-hop trust model + at-most-once/re-arm semantics). This is what the NAS-side agent builds against. - cron-internals.md: "Managed cron (Chronos) for scale-to-zero" section. - cli-commands.md: cron.provider accepts chronos + the cron.chronos.* keys. - User docs name no scheduler vendor (QStash is a NAS-internal detail). INVARIANT re-verified: zero qstash/upstash hits across plugins/cron, gateway, hermes_cli, tools, website/docs (the one remaining repo hit is an unrelated Context7 MCP comment in tools/mcp_tool.py). Tests: test_jobs_changed_notify (5) — notify calls provider hook, swallows errors, built-in harmless, tool create/remove notify. 
Full cron + chronos + webhook + config + api_server_jobs suites green (504 in the cron+chronos+webhook run).	2026-06-18 15:11:32 +10:00
Ben	3fc7b624d8	feat(cron,gateway): NAS-JWT fire verifier + /api/cron/fire webhook (Chronos) Phase 4E (E.1 + E.2). The inbound side of Chronos: NAS POSTs the agent when a one-shot fires; the agent verifies a NAS-minted JWT and runs the job. E.1 — plugins/cron/chronos/verify.py: - verify_nas_fire_token(token, expected_audience, jwks_or_key, issuer): verifies signature against the NAS JWKS (RS/ES family; symmetric rejected), aud == this agent, exp/nbf, iss, and purpose == "cron_fire" (so a general agent JWT can't be replayed against the fire endpoint). Returns claims or None; never raises. Crypto delegated to PyJWT[crypto] (already a declared dep) — no hand-rolled JWT, no new dependency. No key configured → refuse (never unsigned-decode a security boundary). - get_fire_verifier(): pluggable indirection so the DQ-4 escape hatch (direct per-job cron-key) can swap in with no handler change. E.2 — gateway/platforms/api_server.py: - POST /api/cron/fire (registered only when _CRON_AVAILABLE). Authenticated by the NAS-JWT via get_fire_verifier() — NOT API_SERVER_KEY (NAS holds no API key; this is the only inbound that triggers remote job execution, so it gets its own purpose-scoped check). Verifier args come from cron.chronos.* config. 401 on bad/missing/forged token. 400 on missing job_id. On success: 202 + fire_due runs in the background (so a long agent turn never trips NAS's HTTP timeout); the store CAS claim inside fire_due de-dupes a scheduler retry. Tests: - test_chronos_verify (11): REAL RS256 signing — valid→claims, wrong-aud, missing/wrong purpose, expired, wrong-iss, tampered-signature (attacker key), no-key-refuse, empty-token, JWKS-URL key resolution, get_fire_verifier. - test_cron_fire_webhook (5): valid→202+fire, invalid→401+no-fire, missing token→401, missing job_id→400, and fire path does NOT require API_SERVER_KEY. api_server regression suites (214) green. 
E.3 (NAS endpoints) is a separate cross-repo PR; the wire contract lands next (docs/chronos-managed-cron-contract.md).	2026-06-18 14:46:33 +10:00
Teknium	5105c3651a	perf(api-server): normalize chat content linearly (#46079)	2026-06-14 03:25:49 -07:00
helix4u	b23184cad4	fix(api-server): bind request session context for tools	2026-06-08 20:52:08 -07:00
konsisumer	3714caa1b9	fix(session): follow compression continuations for transcript reads	2026-06-07 23:57:20 -07:00
Teknium	30c7913617	fix(api_server): report hermes version on /health and /health/detailed (#40620) Salvaged from #40479; re-verified on main, tightened, tested. Co-authored-by: tfournet <tfournet@users.noreply.github.com>	2026-06-07 18:38:54 -07:00
Teknium	0c48b7165d	hardening(api-server): scan cron prompts on REST create/update for parity with the agent tool The agent-facing cronjob tool scans the user prompt with _scan_cron_prompt() before creating/updating a job (tools/cronjob_tools.py); the REST cron endpoints (POST /api/jobs, PATCH /api/jobs/{id}) validated length but not content. This adds the same scan to both handlers so an exfiltration/injection prompt is rejected the same way regardless of which surface created the job. NOT a security boundary, defense-in-depth / parity only: the REST cron endpoints are authenticated (every handler runs _check_auth, and connect() refuses to start without API_SERVER_KEY), and _scan_cron_prompt is a documented in-process heuristic, not a containment boundary (SECURITY.md 3.2). Raised externally via GHSA-fr3q-rjg3-x6mf (DNS-rebinding pre-auth RCE). The report's load-bearing 'no auth by default' premise was already closed three weeks after it was filed by the API_SERVER_KEY-required guard (commit `1a9ef8314`); this lands the create/update prompt-validation parity the report also pointed at. Scanner imported defensively so a missing scanner cannot disable the cron REST API.	2026-06-07 10:04:57 -07:00
annguyenNous	f7dabd3019	fix(api-server): guard json.loads against corrupted SQLite data in response cache The ResponseStore.get() method calls json.loads(row[0]) without any error handling. If the SQLite responses table contains corrupted JSON data (e.g. from a crash mid-write or disk corruption), this raises an unhandled JSONDecodeError that propagates to the caller. Fix: wrap in try/except (json.JSONDecodeError, TypeError). On parse failure, log a warning, evict the corrupted entry from the cache, and return None (consistent with the function's Optional return type).	2026-06-04 06:15:29 -07:00
Fearvox	4b06c98fe4	fix(gateway): close ResponseStore + dispose unowned adapter on reconnect failure Three separate code paths in the gateway's platform reconnect loop leaked file descriptors every retry, exhausting the default 2560-fd ulimit in ~12 hours of continuous failure and turning the gateway into a zombie that raises OSError: [Errno 24] on every open() (#37011). Root cause: * APIServerAdapter.__init__ opens a ResponseStore SQLite connection that holds 2 fds (db file + WAL sidecar). * APIServerAdapter.disconnect() previously only stopped the aiohttp web server — the ResponseStore connection was never closed. * The reconnect watcher in _platform_reconnect_watcher constructs a fresh adapter on every retry attempt. When the connect call fails (3 paths: non-retryable error, retryable error, exception during connect) the adapter is dropped without ever being installed on self.adapters, so nothing else calls its disconnect(). Result: the 2 ResponseStore fds stay open until GC sweeps the unreachable object, which Python's cyclic GC does not do promptly for asyncio-bound native handles. 2 fds × 1 retry × (3600s / 300s backoff cap) ≈ 12 fds/hour. 2560 fds / 12 fds/hr ≈ 12h to ulimit exhaustion. Fix: * APIServerAdapter.disconnect() now also calls self._response_store.close() (with a try/except so a SQLite close failure doesn't abort the aiohttp teardown). * New module-level helper _dispose_unused_adapter(adapter) in gateway/run.py that calls adapter.disconnect() and swallows any exception (so half-constructed adapters whose __init__ crashed don't kill the watcher loop). * _platform_reconnect_watcher calls _dispose_unused_adapter() in all three failure paths: non-retryable, retryable, and the except Exception arm. adapter = None is initialized before the try so the except arm can see the partial construction. 
Tests: * New file tests/gateway/test_platform_reconnect_fd_leak.py with 7 regression tests covering all three failure paths, the _dispose_unused_adapter helper (None + raising-disconnect cases), and the APIServerAdapter ResponseStore close behavior (success + close-exception cases). The _CountingAdapter fixture tracks disconnect() invocations and an _open_fds counter that is decremented on dispose, so the assertion is the literal observable behavior of the leak. Refs: - Closes #37011 (the original fd-leak report) - Supersedes #37018, #37110, #37238, #37260, #37394 (7 competing open PRs all addressing the same root cause from different angles; none of them rebased cleanly against current main, and none covered all three failure paths in one fix with regression tests for both the watcher and the platform-level close behavior)	2026-06-02 17:27:44 -07:00
Teknium	1cb850b674	fix(api_server): emit per-turn transcript on run.completed (#34703) (#34804) * docs(code-execution): document HERMES_* env narrowing + passthrough workaround The execute_code sandbox-child env scrub (`108397726`, #27303) deliberately dropped the broad HERMES_ prefix passthrough, keeping only an operational 4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES__WEBHOOK, or a plugin-defined one) now sees it unset in the child. Document the behavior change and the two recovery routes (terminal.env_passthrough in config.yaml, or required_environment_variables in skill frontmatter), plus the debug log line that surfaces the drop for diagnosis. fix(api_server): emit per-turn transcript on run.completed (#34703) WebUI clients lost intermediate (pre-tool-call) assistant text after switching session pages mid-stream. The session-chat SSE stream delivers all assistant text as assistant.delta events under one message_id interleaved with tool.* events, then a single assistant.completed carrying only the final reply — so a client accumulating deltas into one buffer cannot reconstruct intermediate text segments that preceded tool calls, and they vanish from the live view (state.db persists them correctly). run.completed now carries the authoritative per-turn transcript (assistant + tool messages for this turn, in client-safe shape) so any SSE consumer can reconcile its live view against ground truth without a separate GET /messages round-trip. Purely additive — clients that ignore the field are unaffected.	2026-05-29 12:27:49 -07:00
Dusk1e	1a9ef83147	fix(security): require API_SERVER_KEY before dispatching API server work	2026-05-28 00:25:08 -07:00
Teknium	96223265b9	chore(api-server): mark skills_api capability True now that /v1/skills shipped #33016 added GET /v1/skills + /v1/toolsets on the API server; the capability flag introduced in this branch was placeholder-False. Flip to True so capability probers see the truth.	2026-05-27 01:56:55 -07:00
Jonathan	464b51d455	Support media in session chat API	2026-05-27 01:56:55 -07:00
Bailey Dixon	f7527b0fdb	feat: add API server session controls	2026-05-27 01:56:55 -07:00
Teknium	25f43d38de	feat(api-server): add GET /v1/skills and /v1/toolsets (#33016) Lets external clients enumerate the agent's skills and resolved toolsets deterministically over the OpenAI-compatible API server, without standing up the dashboard web server or sending a chat message and asking the model to list them. - GET /v1/skills — list installed skills (name, description, category) - GET /v1/toolsets — list toolsets resolved for the api_server platform, with enabled/configured state and the concrete tool names each expands to - Both gated by API_SERVER_KEY (same Bearer scheme as every other /v1/* endpoint) - /v1/capabilities advertises both new endpoints Closes the gap a community user just hit asking how to list skills over REST when only the OpenAI-compatible server is running. Test plan - python -m pytest tests/gateway/test_api_server.py -k "Skills or Toolsets or Capabilities" -o 'addopts=' -q → 9/9 pass - python -m pytest tests/gateway/test_api_server.py -o 'addopts=' -q → 156/156 pass, no regressions - E2E: started a real adapter on an isolated HERMES_HOME with a fake skill installed; curl-equivalent calls to /v1/capabilities, /v1/skills, /v1/toolsets returned the expected JSON; unauthenticated calls returned 401 with the configured API_SERVER_KEY.	2026-05-27 01:27:26 -07:00
Glen Workman	d952b377aa	fix: add cron API provenance logging (#24889) Co-authored-by: sgtworkman <178342791+sgtworkman@users.noreply.github.com>	2026-05-25 01:15:56 -07:00
Hinotoi-agent	3bace071bf	fix(state): restrict sensitive store file permissions response_store.db (api server) holds conversation history including tool payloads, prompts, and results. webhook_subscriptions.json holds per-route HMAC secrets. Under a permissive umask (e.g. 0o022, default on most distros) both files were created mode 0o644 — readable by other local users on shared boxes. - gateway/platforms/api_server.py: ResponseStore tightens itself + WAL/SHM sidecars to 0o600 after __init__, then trusts the inode. (Original contributor patch chmod'd after every _commit() — wasteful on a hot api_server path; chmod-on-create is sufficient since SQLite preserves mode bits across writes.) - hermes_cli/webhook.py: _save_subscriptions writes via tempfile.mkstemp (which itself creates the file with 0o600), chmods the temp before the atomic rename, and re-asserts 0o600 on the destination so an existing permissive file from before this fix gets narrowed. Tests cover (a) creation under permissive umask leaves 0o600 and (b) an existing 0o644 webhook_subscriptions.json gets narrowed on next save. Tests guarded with skipif os.name=='nt' since POSIX mode bits don't apply on Windows. Salvaged from PR #30917 by @Hinotoi-agent. Reworked the api_server.py side from chmod-on-every-commit to chmod-on-create. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>	2026-05-24 04:55:18 -07:00
bitkyc08-arch	5631345b12	[agent] fix: harden api server response headers	2026-05-16 23:11:43 -07:00
Sylw3ster	8d4766afca	fix(api_server): coerce stringified booleans in request payloads	2026-05-16 23:02:02 -07:00
CoinTheHat	814c60092b	fix: clean stale conversation mappings on response eviction/deletion ResponseStore.put() and .delete() now remove conversations rows that reference evicted or deleted response IDs, preventing 404 errors when a conversation name is reused after its backing response was purged. Adds regression tests for delete, eviction, and handler-level reuse. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-15 01:27:43 -07:00
Ahmet Oşrak	4bb0a82a2b	fix(gateway): enqueue SSE EOS sentinel on task completion	2026-05-12 15:04:54 -07:00
kshitij	2ec8d2b42f	chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937) Replace tuple literals with set literals in all literal-tuple membership tests. Set lookup is O(1) vs O(n) for tuple — consistent micro-optimization across the codebase. 608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining. 133 files, +626/-626 (net zero).	2026-05-11 11:13:25 -07:00
Teknium	2124ad72a2	fix(api-server): emit length/error finish_reason for truncation/failure (#22775) Non-streaming /v1/chat/completions wrapped any AIAgent result, including partial/failed runs, as a successful 200 with finish_reason='stop' and the internal failure string substituted into message.content. API clients had no way to distinguish 'agent answered: X' from 'agent crashed and the X you see is its error message'. After the fix: - completed: True → 200 finish_reason='stop' (unchanged) - partial + truncated text → 200 finish_reason='length' + hermes extras - partial + no text / failed → 502 OpenAI error envelope (SDKs raise) - other failures → 200 finish_reason='error' + hermes extras Adds X-Hermes-Completed / X-Hermes-Partial / X-Hermes-Error headers plus a 'hermes' extras object on partial responses for clients that want the full picture. Closes #22496.	2026-05-09 12:48:08 -07:00
kshitij	2a7047c2ed	fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043) SQLite's WAL mode requires shared-memory (mmap) coordination and fcntl byte-range locks that don't reliably work on network filesystems. Upstream documents this explicitly: https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode On NFS / SMB / some FUSE mounts / WSL1, 'PRAGMA journal_mode=WAL' raises 'sqlite3.OperationalError: locking protocol' (SQLITE_PROTOCOL). Before this change, every feature backed by state.db or kanban.db broke silently: - /resume, /title, /history, /branch returned 'Session database not available.' with no cause - gateway logged the init failure at DEBUG (invisible in errors.log) - kanban dispatcher crashed every 60s, driving the known migration race (duplicate column name: consecutive_failures, #21708 / #21374) Changes: - hermes_state.apply_wal_with_fallback(): shared helper that tries WAL and falls back to DELETE on SQLITE_PROTOCOL-style errors with one WARNING explaining why - hermes_state.get_last_init_error() + format_session_db_unavailable(): capture the init failure cause and surface it in user-facing strings (with an NFS/SMB pointer for 'locking protocol') - hermes_cli/kanban_db.connect(): use the shared helper - gateway/run.py: bump SessionDB init failure log DEBUG -> WARNING (matches cli.py's existing correct behavior) - cli.py (4 sites) + gateway/run.py (5 sites): replace bare 'Session database not available.' with format_session_db_unavailable() Tests: 12 new tests in tests/test_hermes_state_wal_fallback.py + 1 new test in tests/hermes_cli/test_kanban_db.py. Existing suites (state, kanban, gateway, cli) remain green for all tests unrelated to pre-existing failures on main. Evidence: real-world user on NFSv3 mount (172.26.224.200:d2dfac12/home, local_lock=none) reporting 'Session database not available.' 
on /resume; 'locking protocol' appears in 4 distinct log entries across backup, kanban, TUI, and CLI paths in the same session. closes #22032	2026-05-09 02:09:35 -07:00
Zhicheng Han	526c0e018a	feat(api-server): expose run approval events	2026-05-08 07:30:14 -07:00
wabrent	98ca0694d6	fix(gateway): log agent task failures instead of silently losing usage data	2026-05-07 06:25:03 -07:00
pingchesu	43a6645718	docs: clarify API server tool execution locality	2026-05-07 05:30:37 -07:00
thelumiereguy	8a96fa48c1	fix(gateway): avoid duplicated responses history	2026-05-07 05:07:59 -07:00
bogerman1	3188e63b05	fix(api_server): SSE token batching + error handling for Open WebUI performance Reduces SSE event rate ~500/turn → ~20/turn via 50ms text-delta batching in _dispatch(), which eliminates markdown re-render storms on Open WebUI. Also: - Trim tool_call.arguments in the response.completed event to 100KB (prevents silent hangs on 848KB+ single-line SSE events). - Catch-all exception handlers in _write_sse_responses() + _write_sse_chat_completion() emit a proper error chunk instead of TransferEncodingError from incomplete chunked encoding when the agent crashes mid-stream. - MAX_REQUEST_BYTES 1MB → 10MB; pass client_max_size to aiohttp Application to avoid silent 400s on truncated request bodies for long conversations. Salvage of #17552 (api_server portion only). The contrib/openwebui-filter/ payload from that PR — Open WebUI Filter Function + benchmark writeup — is a client-side user-installable add-on and doesn't need to live in the repo; dropped here. Closes #17537. Co-authored-by: bogerman1 <93757150+bogerman1@users.noreply.github.com>	2026-05-05 15:13:36 -07:00
Teknium	314361733f	test(api_server): _run_agent result now carries session_id for #16938	2026-05-05 06:01:03 -07:00
vominh1919	7f735b4db2	fix: return effective session_id after context compression (#16938) When context compression rotates the agent's session_id to a new child session, the API server was still returning the stale parent session_id in the X-Hermes-Session-Id response header. This caused external clients to keep sending the old session_id, loading uncompressed parent history instead of the compressed continuation. Fix: _run_agent() now includes the effective session_id in its result dict, and the response header uses it instead of the original provided session_id.	2026-05-05 06:01:03 -07:00
Teknium	fe8560fc12	feat(api-server): X-Hermes-Session-Key header for long-term memory scoping (#20199) * feat(api-server): X-Hermes-Session-Key header for long-term memory scoping API Server integrations (Open WebUI, custom web UIs) can now pass a stable per-channel identifier via X-Hermes-Session-Key that scopes long-term memory (Honcho, etc.) independently of the transcript-scoped X-Hermes-Session-Id. This matches the native gateway's session_key / session_id split: one stable key per assistant channel, many independent transcripts that rotate on /new. - _create_agent and _run_agent accept gateway_session_key and pass it to AIAgent(gateway_session_key=...), which is already honored by the Honcho memory provider (plugins/memory/honcho/client.py resolve_session_name). - New shared helper _parse_session_key_header applies the same API-key gate, control-character sanitization, and a 256-char length cap as the existing session-id header. - All three agent endpoints honor the header: /v1/chat/completions, /v1/responses, /v1/runs. JSON and SSE responses echo it back. - /v1/capabilities advertises session_key_header so clients can feature-detect. Closes #20060. Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com> * chore: AUTHOR_MAP entry for manateelazycat --------- Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>	2026-05-05 05:34:47 -07:00
ygd58	297eaa3533	fix(api_server): emit run.failed when run_conversation returns failed=True When run_conversation encounters a non-retryable client error (401, 400, etc.), it returns a dict with failed=True instead of raising. The gateway's _run_and_close only branched on exceptions, so it always emitted run.completed even for failed runs — clients could not distinguish success from failure. Inspect the result dict before emitting: if failed=True, emit run.failed with the error message; otherwise emit run.completed as before. The existing except Exception path is unchanged for genuine programming errors. Fixes #15561	2026-05-04 04:47:36 -07:00
Asunfly	8a364df2c8	fix: inherit reasoning config in API server runs	2026-05-04 01:44:16 -07:00
Zyproth	a5cae16496	fix(api_server): fall back to default port on malformed API_SERVER_PORT	2026-05-03 15:27:03 -07:00
hharry11	2997ef9446	fix(api-server): use session-scoped task IDs for tool isolation	2026-04-30 19:59:38 -07:00
briandevans	e0a03f3f40	fix(api-server): collapse tool start/lifecycle into a single SSE event Address Copilot review on PR #16666: 1. Duplicate event on every tool start — both ``tool_progress_callback`` and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so wiring both into chat completions emitted two ``hermes.tool.progress`` events per real tool call. Drop the legacy ``_on_tool_progress`` emit entirely; ``_on_tool_start`` now produces a single unified event that carries the legacy ``tool``/``emoji``/``label`` fields plus the new ``toolCallId``/``status`` correlation fields. Label is computed inline via ``build_tool_preview`` so callers do not need to pre-format it. 2. Weak per-event correlation in the regression test — the previous assertion checked that a ``toolCallId`` appeared somewhere in the aggregate, which would have passed even if ``running`` lacked the id. Collect ``(status, toolCallId)`` per event and assert each event carries the correct pair, plus exactly two events on the wire (no silent duplication regression). The two existing chat-completions tool-progress tests are updated to fire ``tool_start_callback`` instead of ``tool_progress_callback``, matching production reality where ``run_agent`` always pairs them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 08:08:16 -07:00
Magaav	810d98e892	feat(api_server): expose run status for external UIs (#17085) Adds two API server endpoints for external UIs and orchestrators: - GET /v1/capabilities — machine-readable feature discovery so clients can detect which Runs API / SSE / auth features this Hermes version supports before depending on them. - GET /v1/runs/{run_id} — pollable run status so dashboards can check queued/running/completed/failed/cancelled/stopping state without holding an SSE connection open. Also moves request validation ahead of run allocation so invalid payloads no longer leave orphaned entries in _run_streams waiting for the TTL sweep. task_id is intentionally kept as "default" for the Runs API to preserve the shared-sandbox model used by CLI, gateway, and the existing _run_agent_with_callbacks path. session_id is surfaced in run status for external-UI correlation only. Salvage of PR #17085 by @Magaav.	2026-04-29 06:38:10 -07:00
Teknium	01535a4732	fix(api_server): cap stop-run wait at 5s so interrupt can't hang handler task.cancel() can't preempt the run_in_executor thread running run_conversation(), so we rely on agent.interrupt() to wake the loop. Without a timeout, a slow/unresponsive interrupt blocks the HTTP response indefinitely. Wrap the await in wait_for(shield(task), 5.0) and log a warning on timeout. Also tidy one extra space in the module docstring's /stop entry.	2026-04-25 18:40:35 -07:00
ekko	0a15dbdc43	feat(api_server): add POST /v1/runs/{run_id}/stop endpoint Add ability to interrupt a running agent via the runs API. Previously /v1/runs could start a run and subscribe to events, but there was no way to cancel it. The new endpoint stores agent and task references during execution, calls agent.interrupt() to stop LLM calls, then cancels the asyncio task. Includes 15 tests covering start, events, and stop scenarios.	2026-04-25 18:40:35 -07:00
Teknium	36d68bcb82	fix(api-server): persist incomplete snapshot on asyncio.CancelledError too Extends PR #15171 to also cover the server-side cancellation path (aiohttp shutdown, request-level timeout) — previously only ConnectionResetError triggered the incomplete-snapshot write, so cancellations left the store stuck at the in_progress snapshot written on response.created. Factors the incomplete-snapshot build into a _persist_incomplete_if_needed() helper called from both the ConnectionResetError and CancelledError branches; the CancelledError handler re-raises so cooperative cancellation semantics are preserved. Adds two regression tests that drive _write_sse_responses directly (the TestClient disconnect path races the server handler, which makes the end-to-end assertion flaky).	2026-04-24 15:22:19 -07:00
UgwujaGeorge	a29bad2a3c	fix(api-server): persist response snapshot on client disconnect when store=True	2026-04-24 15:22:19 -07:00
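
Note on 01535a4732: the bounded wait described there (wait_for(shield(task), 5.0)) can be sketched roughly as follows. This is a minimal illustration built only on the standard asyncio API, not the code from the commit. The names stop_with_timeout and demo are hypothetical, and the short timeouts stand in for the 5-second cap so the demo finishes quickly.

```python
import asyncio


async def stop_with_timeout(task: asyncio.Task, timeout: float = 5.0) -> bool:
    """Wait up to `timeout` seconds for `task`; return True if it finished.

    Hypothetical helper. asyncio.shield() keeps the timeout from cancelling
    the underlying task: on timeout only the outer wait is abandoned, so the
    caller can respond while the task winds down on its own.
    """
    try:
        await asyncio.wait_for(asyncio.shield(task), timeout)
        return True
    except asyncio.TimeoutError:
        # A real handler would log a warning here instead of hanging.
        return False


async def demo() -> tuple:
    slow = asyncio.create_task(asyncio.sleep(0.3))  # stands in for a slow interrupt
    fast = asyncio.create_task(asyncio.sleep(0))    # stands in for a prompt one
    fast_done = await stop_with_timeout(fast, 0.1)
    slow_done = await stop_with_timeout(slow, 0.05)
    still_running = not slow.done()  # the shielded task was not cancelled
    await slow                       # let it finish before the loop closes
    return fast_done, slow_done, still_running


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the shield, wait_for would cancel the awaited task when the timeout fires. With it, the wait is bounded but the task is left to exit through whatever interrupt mechanism it already has.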


95 commits