hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-16 09:31:37 +00:00

Author	SHA1	Message	Date
Teknium	de491fdf0e	chore: remove unit tests from maps skill Skills are self-contained scripts — they don't need test suites in the repo.	2026-04-19 05:19:22 -07:00
Mibayy	7fa01fafa5	feat: add maps skill (OpenStreetMap + Overpass + OSRM, no API key) Adds a maps optional skill with 8 commands, 44 POI categories, and zero external dependencies. Uses free open data: Nominatim, Overpass API, OSRM, and TimeAPI.io. Commands: search, reverse, nearby, distance, directions, timezone, area, bbox. Improvements over original PR #2015: - Fixed directory structure (optional-skills/productivity/maps/) - Fixed distance argparse (--to flag instead of broken dual nargs=+) - Fixed timezone (TimeAPI.io instead of broken worldtimeapi heuristic) - Expanded POI categories from 12 to 44 - Added directions command with turn-by-turn OSRM steps - Added area command (bounding box + dimensions for a named place) - Added bbox command (POI search within a geographic rectangle) - Added 23 unit tests - Improved haversine (atan2 for numerical stability) - Comprehensive SKILL.md with workflow examples Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>	2026-04-19 05:19:22 -07:00
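The maps commit switches the haversine formula to `atan2` for numerical stability. Below is a minimal sketch of that form. It assumes a spherical Earth with a mean radius of 6371.0 km, and `haversine_km` is a hypothetical name, not the skill's actual function:

```python
import math

# Assumption: spherical Earth with a mean radius of 6371.0 km.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp floating-point noise so sqrt(1 - a) never receives a negative argument.
    a = min(1.0, max(0.0, a))
    # atan2 stays well conditioned near a = 1 (antipodal points), where asin(sqrt(a)) loses precision.
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return EARTH_RADIUS_KM * c
```

For example, `haversine_km(48.8566, 2.3522, 51.5074, -0.1278)` (Paris to London) comes out at a little over 340 km.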
Teknium	206a449b29	feat(webhook): direct delivery mode for zero-LLM push notifications (#12473) External services can now push plain-text notifications to a user's chat via the webhook adapter without invoking the agent. Set deliver_only=true on a route and the rendered prompt template becomes the literal message body — dispatched directly to the configured target (Telegram, Discord, Slack, GitHub PR comment, etc.). Reuses all existing webhook infrastructure: HMAC-SHA256 signature validation, per-route rate limiting, idempotency cache, body-size limits, template rendering with dot-notation, home-channel fallback. No new HTTP server, no new auth scheme, no new port. Use cases: Supabase/Firebase webhooks → user notifications, monitoring alert forwarding, inter-agent pings, background job completion alerts. Changes: - gateway/platforms/webhook.py: new _direct_deliver() helper + early dispatch branch in _handle_webhook when deliver_only=true. Startup validation rejects deliver_only with deliver=log. - hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on subscribe; list/show output marks direct-delivery routes. - website/docs/user-guide/messaging/webhooks.md: new Direct Delivery Mode section with config example, CLI example, response codes. - skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only with use cases (bumped to v1.1.0). - tests/gateway/test_webhook_deliver_only.py: 14 new tests covering agent bypass, template rendering, status codes, HMAC still enforced, idempotency still applies, rate limit still applies, startup validation, and direct-deliver dispatch. Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified with real aiohttp server + real urllib POST — agent not invoked, target adapter.send() called with rendered template, duplicate delivery_id suppressed. Closes the gap identified in PR #12117 (thanks to @H1an1 / Antenna team) without adding a second HTTP ingress server.	2026-04-19 05:18:19 -07:00
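Direct delivery reuses two existing steps: the webhook's HMAC-SHA256 check and its dot-notation template rendering. The sketch below shows both. The `sha256=` header prefix and the `{a.b}` placeholder syntax are assumptions for illustration, and `verify_signature` / `render_template` are hypothetical names, not the adapter's API:

```python
import hashlib
import hmac
import re

# Assumed placeholder syntax: {key} or {nested.key}, resolved by dot-separated lookups.
_PLACEHOLDER = re.compile(r"\{([A-Za-z0-9_.]+)\}")


def verify_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Constant-time comparison of the HMAC-SHA256 hex digest of the raw request body."""
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    provided = signature_header.removeprefix("sha256=")  # assumed GitHub-style prefix
    return hmac.compare_digest(expected, provided)


def render_template(template: str, payload: dict) -> str:
    """Replace {a.b.c} with payload["a"]["b"]["c"]; leave unresolved placeholders as-is."""

    def lookup(match: re.Match) -> str:
        value = payload
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)
            value = value[key]
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)
```

With `deliver_only=true`, the rendered string would go to the target as the literal message body instead of being handed to the agent as a prompt.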
Teknium	66ee081dc1	skills: move 7 niche mlops/mcp skills to optional (#12474) Built-in → optional-skills/: mlops/training/peft → optional-skills/mlops/peft mlops/training/pytorch-fsdp → optional-skills/mlops/pytorch-fsdp mlops/models/clip → optional-skills/mlops/clip mlops/models/stable-diffusion → optional-skills/mlops/stable-diffusion mlops/models/whisper → optional-skills/mlops/whisper mlops/cloud/modal → optional-skills/mlops/modal mcp/mcporter → optional-skills/mcp/mcporter Built-in mlops training kept: axolotl, trl-fine-tuning, unsloth. Built-in mlops models kept: audiocraft, segment-anything. Built-in mlops evaluation/research/huggingface-hub/inference all kept. native-mcp stays built-in (documents the native MCP tool); mcporter was a redundant alternative CLI. Also: removed now-empty skills/mlops/cloud/ dir, refreshed skills/mlops/models/DESCRIPTION.md and skills/mcp/DESCRIPTION.md to match what's left, and synchronized both catalog pages (skills-catalog.md, optional-skills-catalog.md).	2026-04-19 05:14:17 -07:00
kshitijk4poor	957ca79e8e	fix(feishu): drop dead helper and cover repeated fenced blocks	2026-04-19 03:30:36 -07:00
kshitijk4poor	a9debf10ff	fix(feishu): harden fenced post row splitting	2026-04-19 03:30:36 -07:00
sgaofen	cc59d133dc	fix(feishu): split fenced code blocks in post payload	2026-04-19 03:30:36 -07:00
kshitijk4poor	4f0e49dc7b	chore: add sgaofen to AUTHOR_MAP	2026-04-19 03:30:03 -07:00
kshitijk4poor	4b6ff0eb7f	fix: tighten gateway interrupt salvage follow-ups Follow-up on top of the helix4u #12388 cherry-picks: - make deferred post-delivery callbacks generation-aware end-to-end so stale runs cannot clear callbacks registered by a fresher run for the same session - bind callback ownership to the active session event at run start and snapshot that generation inside base adapter processing so later event mutation cannot retarget cleanup - pass run_generation through proxy mode and drop stale proxy streams / final results the same way local runs are dropped - centralize stop/new interrupt cleanup into one helper and replace the open-coded branches with shared logic - unify internal control interrupt reason strings via shared constants - remove the return from base.py's finally block so cleanup no longer swallows cancellation/exception flow - add focused regressions for generation forwarding, proxy stale suppression, and newer-callback preservation This addresses all review findings from the initial #12388 review while keeping the fix scoped to stale-output/typing-loop interrupt handling.	2026-04-19 03:03:57 -07:00
helix4u	8466268ca5	fix(gateway): keep typing loop overrides backward-compatible	2026-04-19 03:03:57 -07:00
helix4u	150382e8b7	fix(gateway): stop typing loops on session interrupt	2026-04-19 03:03:57 -07:00
helix4u	b05d30418d	docs: clarify profiles vs workspaces	2026-04-19 02:00:46 -07:00
kshitijk4poor	ff63e2e005	fix: tighten telegram docker-media salvage follow-ups Follow-up on top of the helix4u #6392 cherry-pick: - reuse one helper for actionable Docker-local file-not-found errors across document/image/video/audio local-media send paths - include /outputs/... alongside /output/... in the container-local path hint - soften the gateway startup warning so it does not imply custom host-visible mounts are broken; the warning now targets the specific risky pattern of emitting container-local MEDIA paths without an explicit export mount - add focused regressions for /outputs/... and non-document media hint coverage This keeps the salvage aligned with the actual MEDIA delivery problem on current main while reducing false-positive operator messaging.	2026-04-19 01:55:33 -07:00
helix4u	588333908c	fix(telegram): warn on docker-only media paths	2026-04-19 01:55:33 -07:00
Tranquil-Flow	b668c09ab2	fix(gateway): strip cursor from frozen message on empty fallback continuation (#7183) When _send_fallback_final() is called with nothing new to deliver (the visible partial already matches final_text), the last edit may still show the cursor character because fallback mode was entered after a failed edit. Before this fix the early-return path left _already_sent = True without attempting to strip the cursor, so the message stayed frozen with a visible ▉ permanently. Adds a best-effort edit inside the empty-continuation branch to clean the cursor off the last-sent text. Harmless when fallback mode wasn't actually armed or when the cursor isn't present. If the strip edit itself fails (flood still active), we return without crashing and without corrupting _last_sent_text. Adapted from PR #7429 onto current main — the surrounding fallback block grew the #10807 stale-prefix handling since #7429 was written, so the cursor strip lives in the new else-branch where we still return early. 3 unit tests covering: cursor stripped on empty continuation, no edit attempted when cursor is not configured, cursor-strip edit failure handled without crash. Originally proposed as PR #7429.	2026-04-19 01:51:12 -07:00
Teknium	62ce6a38ae	fix(gateway): cancel_background_tasks must drain late-arrivals (#12471) During gateway shutdown, a message arriving while cancel_background_tasks is mid-await (inside asyncio.gather) spawns a fresh _process_message_background task via handle_message and adds it to self._background_tasks. The original implementation's _background_tasks.clear() at the end of cancel_background_tasks dropped the reference; the task ran untracked against a disconnecting adapter, logged send-failures, and lingered until it completed on its own. Fix: wrap the cancel+gather in a bounded loop (MAX_DRAIN_ROUNDS=5). If new tasks appeared during the gather, cancel them in the next round. The .clear() at the end is preserved as a safety net for any task that appeared after MAX_DRAIN_ROUNDS — but in practice the drain stabilizes in 1-2 rounds. Tests: tests/gateway/test_cancel_background_drain.py — 3 cases. - test_cancel_background_tasks_drains_late_arrivals: spawn M1, start cancel, inject M2 during M1's shielded cleanup, verify M2 is cancelled. - test_cancel_background_tasks_handles_no_tasks: no-op path still terminates cleanly. - test_cancel_background_tasks_bounded_rounds: baseline — single task cancels in one round, loop terminates. Regression-guard validated: against the unpatched implementation, the late-arrival test fails with exactly the expected message ('task leaked'). With the fix it passes. Blast radius is shutdown-only; the audit classified this as MED. Shipping because the fix is small and the hygiene is worth it. While investigating the audit's other MEDs (busy-handler double-ack, Discord ExecApprovalView double-resolve, UpdatePromptView double-resolve), I verified all three were false positives — the check-and-set patterns have no await between them, so they're atomic on single-threaded asyncio. No fix needed for those.	2026-04-19 01:48:42 -07:00
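The bounded drain can be sketched as below. This is a simplified stand-in that assumes tasks are tracked in a plain set. `BackgroundTasks` and `spawn` are hypothetical names; only `MAX_DRAIN_ROUNDS=5` and the final `.clear()` safety net come from the commit:

```python
import asyncio

MAX_DRAIN_ROUNDS = 5  # value taken from the commit message


class BackgroundTasks:
    def __init__(self) -> None:
        self._background_tasks: set = set()

    def spawn(self, coro) -> asyncio.Task:
        """Start a tracked task that removes itself from the set when done."""
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
        return task

    async def cancel_background_tasks(self) -> int:
        """Cancel tracked tasks, re-checking for late arrivals; return the rounds used."""
        rounds = 0
        while self._background_tasks and rounds < MAX_DRAIN_ROUNDS:
            rounds += 1
            batch = list(self._background_tasks)
            for task in batch:
                task.cancel()
            # New tasks may be spawned (and added to the set) while we await here.
            await asyncio.gather(*batch, return_exceptions=True)
            self._background_tasks.difference_update(batch)
        # Safety net for anything that appeared after the last round.
        self._background_tasks.clear()
        return rounds
```

A task spawned during the `gather` is still in the set when the loop re-checks it, so it is cancelled in the next round instead of being dropped untracked.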
konsisumer	1d1e1277e4	fix(gateway): flush undelivered tail before segment reset to preserve streamed text (#8124) When a streaming edit fails mid-stream (flood control, transport error) and a tool boundary arrives before the fallback threshold is reached, the pre-boundary tail in `_accumulated` was silently discarded by `_reset_segment_state`. The user saw a frozen partial message and missing words on the other side of the tool call. Flush the undelivered tail as a continuation message before the reset, computed relative to the last successfully-delivered prefix so we don't duplicate content the user already saw.	2026-04-19 01:43:04 -07:00
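Computing the tail "relative to the last successfully-delivered prefix" means dropping whatever the user has already seen. Below is a minimal sketch. It assumes the delivered text may carry a trailing cursor character, and `undelivered_tail` is a hypothetical helper, not the gateway's code:

```python
import os


def undelivered_tail(accumulated: str, last_delivered: str, cursor: str = "") -> str:
    """Return the part of `accumulated` that the user has not seen yet."""
    delivered = last_delivered.removesuffix(cursor) if cursor else last_delivered
    # os.path.commonprefix compares plain strings character by character, so a
    # partially delivered word ("Hello wor") yields only the missing remainder.
    seen = os.path.commonprefix([accumulated, delivered])
    return accumulated[len(seen):]
```

Using the longest common prefix instead of a strict `startswith` check means that if the delivered text diverged from the buffer, the overlapping part is still never sent twice.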
Teknium	e017131403	feat(cron): add wakeAgent gate — scripts can skip the agent entirely Extends the existing cron script hook with a wake gate ported from nanoclaw #1232. When a cron job's pre-check Python script (already sandboxed to HERMES_HOME/scripts/) writes a JSON line like `{"wakeAgent": false}` on its last stdout line, `run_job()` returns the SILENT marker and skips the agent entirely — no LLM call, no delivery, no tokens spent. Useful for frequent polls (every 1-5 min) that only need to wake the agent when something has genuinely changed. Any other script output (non-JSON, missing key, non-dict, `wakeAgent: true`, truthy/falsy non-False values) behaves as before: stdout is injected as context and the agent runs normally. Strict `False` is required to skip — avoids accidental gating from arbitrary JSON. Refactor: - New pure helper `_parse_wake_gate(script_output)` in cron/scheduler.py - `_build_job_prompt` accepts optional `prerun_script` tuple so the script runs exactly once per job (run_job runs it for the gate check, reuses the output for prompt injection) - `run_job` short-circuits with SILENT_MARKER when gate fires Script failures (success=False) still cannot trigger the gate — the failure is reported as context to the agent as before. This replaces the approach in closed PR #3837, which inlined bash scripts via tempfile and lost the path-traversal/scripts-dir sandbox that main's impl has. The wake-gate idea (the one net-new capability) is ported on top of the existing sandboxed Python-script model. Tests: - 11 pure unit tests for _parse_wake_gate (empty, whitespace, non-JSON, non-dict JSON, missing key, truthy/falsy non-False, multi-line, trailing blanks, non-last-line JSON) - 5 integration tests for run_job wake-gate (skip returns SILENT, wake-true passes through, script-runs-only-once, script failure doesn't gate, no-script regression) - Full tests/cron/ suite: 194/194 pass	2026-04-19 01:42:35 -07:00
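The gate rule fits in a small pure function: strict `False` on the last stdout line skips the agent, and anything else wakes it. This sketch follows the rules listed in the commit but is not the actual `_parse_wake_gate`. Ignoring trailing blank lines is an assumption drawn from the test list, and `should_wake_agent` is a hypothetical name:

```python
import json


def should_wake_agent(script_output: str) -> bool:
    """Return False only if the last non-blank line is a JSON object with wakeAgent == false."""
    lines = [line for line in script_output.splitlines() if line.strip()]
    if not lines:
        return True
    try:
        data = json.loads(lines[-1])
    except json.JSONDecodeError:
        return True  # ordinary text output: wake the agent and inject it as context
    if not isinstance(data, dict):
        return True
    # Identity check: only the JSON literal false skips; 0, "", null, etc. do not.
    return data.get("wakeAgent") is not False
```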
helix4u	c94d26c69b	fix(cli): sanitize interactive command output	2026-04-19 01:16:34 -07:00
kshitijk4poor	175cf7e6bb	fix: tighten quiet-mode salvage follow-ups Follow-up for the helix4u easy-fix salvage batch: - route remaining context-engine quiet-mode output through _should_emit_quiet_tool_messages() so non-CLI/library callers stay silent consistently - drop the extra senderAliases computation from WhatsApp allowlist-drop logging and remove the now-unused import This keeps the batch scoped to the intended fixes while avoiding leaked quiet-mode output and unnecessary duplicate work in the bridge.	2026-04-19 00:28:25 -07:00
helix4u	cd59af17cc	fix(agent): silence quiet_mode in python library use	2026-04-19 00:28:25 -07:00
helix4u	361675018f	fix(setup): stop hardcoding max-iterations copy	2026-04-19 00:28:25 -07:00
helix4u	3ade655999	fix(whatsapp): log allowlist drops in bridge	2026-04-19 00:28:25 -07:00
Teknium	7c10761dd2	fix(discord): shield text-batch flush from follow-up cancel (#12444) When Discord splits a long message at 2000 chars, _enqueue_text_event buffers each chunk and schedules a _flush_text_batch task with a short delay. If another chunk lands while the prior flush task is already inside handle_message, _enqueue_text_event calls prior_task.cancel() — and without asyncio.shield, CancelledError propagates from the flush task into handle_message → the agent's streaming request, aborting the response the user was waiting on. Reproducer: user sends a 3000-char prompt (split by Discord into 2 messages). Chunk 1 lands, flush delay starts, chunk 2 lands during the brief window when chunk 1's flush has already committed to handle_message. Agent's current streaming response is cancelled with CancelledError, user sees a truncated or missing reply. Fix (gateway/platforms/discord.py): - Wrap the handle_message call in asyncio.shield so the inner dispatch is protected from the outer task's cancel. - Add an except asyncio.CancelledError clause so the outer task still exits cleanly when cancel lands during the sleep window (before the pop) — semantics for that path are unchanged. The new flush task spawned by the follow-up chunk still handles its own batch via the normal pending-message / active-session machinery in base.py, so follow-ups are not lost. Tests: tests/gateway/test_text_batching.py — test_shield_protects_handle_message_from_cancel. Tracks a distinct first_handle_cancelled event so the assertion fails cleanly when the shield is missing (verified by stashing the fix and re-running). Live E2E on the live-loaded DiscordAdapter: first_handle_cancelled: False (shield worked) first_handle_completed: True (handle_message ran to completion)	2026-04-19 00:09:38 -07:00
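The fix combines `asyncio.shield` around the dispatch with a `CancelledError` handler on the outer task. Below is a minimal, self-contained sketch of that pattern. `flush_text_batch` and the `handle_message` callable are stand-ins for the adapter's `_flush_text_batch` / `handle_message`, not their real signatures:

```python
import asyncio


async def flush_text_batch(delay: float, handle_message, event) -> None:
    try:
        await asyncio.sleep(delay)  # a cancel here means nothing has been dispatched yet
        # shield() runs handle_message in its own task; cancelling this outer task
        # interrupts only the await below, not the dispatch itself.
        await asyncio.shield(handle_message(event))
    except asyncio.CancelledError:
        # Exit quietly; the flush task spawned for the follow-up chunk handles it.
        return
```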
Teknium	dca439fe92	fix(tui): scope session.interrupt pending-prompt release to the calling session (#12441) session.interrupt on session A was blast-resolving pending clarify/sudo/secret prompts on ALL sessions sharing the same tui_gateway process. Other sessions' agent threads unblocked with empty-string answers as if the user had cancelled — silent cross-session corruption. Root cause: _pending and _answers were globals keyed by random rid with no record of the owning session. _clear_pending() iterated every entry, so the session.interrupt handler had no way to limit the release to its own sid. Fix: - tui_gateway/server.py: _pending now maps rid to (sid, Event) tuples. _clear_pending takes an optional sid argument and filters by owner_sid when provided. session.interrupt passes the calling sid so unrelated sessions are untouched. _clear_pending(None) remains the shutdown path for completeness. - _block and _respond updated to pack/unpack the new tuple format. Tests (tests/test_tui_gateway_server.py): 4 new cases. - test_interrupt_only_clears_own_session_pending: two sessions with pending prompts, interrupting one must not release the other. - test_interrupt_clears_multiple_own_pending: same-sid multi-prompt release works. - test_clear_pending_without_sid_clears_all: shutdown path preserved. - test_respond_unpacks_sid_tuple_correctly: _respond handles the tuple format. Also updated tests/tui_gateway/test_protocol.py to use the new tuple format for test_block_and_respond and test_clear_pending. Live E2E against the live Python environment confirmed cross-session isolation: interrupting sid_a released its own pending prompt without touching sid_b's. All 78 related tests pass.	2026-04-19 00:03:58 -07:00
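The ownership fix stores the session id next to each pending event and filters on it when releasing. Below is a thread-safe sketch under that assumption. The class and method names are hypothetical and do not reproduce `tui_gateway/server.py`'s actual `_block` / `_respond` / `_clear_pending`:

```python
import threading
from typing import Dict, Optional, Tuple


class PendingPrompts:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        # rid -> (owning sid, event the waiting agent thread blocks on)
        self._pending: Dict[str, Tuple[str, threading.Event]] = {}
        self.answers: Dict[str, str] = {}

    def register(self, rid: str, sid: str) -> threading.Event:
        event = threading.Event()
        with self._lock:
            self._pending[rid] = (sid, event)
        return event

    def respond(self, rid: str, answer: str) -> bool:
        with self._lock:
            entry = self._pending.pop(rid, None)
            if entry is None:
                return False
            self.answers[rid] = answer
        _owner, event = entry
        event.set()
        return True

    def clear(self, sid: Optional[str] = None) -> int:
        """Release prompts owned by `sid`; sid=None releases everything (shutdown)."""
        with self._lock:
            released = [
                (rid, ev) for rid, (owner, ev) in self._pending.items()
                if sid is None or owner == sid
            ]
            for rid, _ in released:
                del self._pending[rid]
                self.answers[rid] = ""  # empty answer, as on cancel
        for _, ev in released:
            ev.set()
        return len(released)
```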
Teknium	ce410521b3	feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369) Agents can now send arbitrary CDP commands to the browser. The tool is gated on a reachable CDP endpoint at session start — it only appears in the toolset when BROWSER_CDP_URL is set (from '/browser connect') or 'browser.cdp_url' is configured in config.yaml. Backends that don't currently expose CDP to the Python side (Camofox, default local agent-browser, cloud providers whose per-session cdp_url is not yet surfaced) do not see the tool at all. Tool schema description links to the CDP method reference at https://chromedevtools.github.io/devtools-protocol/ so the agent can web_extract specific method docs on demand. Stateless per call. Browser-level methods (Target.*, Browser.*, Storage.*) omit target_id. Page-level methods attach to the target with flatten=true and dispatch the method on the returned sessionId. Clean errors when the endpoint becomes unreachable mid-session or the URL isn't a WebSocket. Tests: 19 unit (mock CDP server + gate checks) + E2E against real headless Chrome (Target.getTargets, Browser.getVersion, Runtime.evaluate with target_id, Page.navigate + re-eval, bogus method, bogus target_id, missing endpoint) + E2E of the check_fn gate (tool hidden without CDP URL, visible with it, hidden again after unset).	2026-04-19 00:03:10 -07:00
helix4u	d66414a844	docs(custom-providers): use key_env in examples	2026-04-18 23:07:59 -07:00
helix4u	7b1a11b971	fix(memory): keep Honcho provider opt-in	2026-04-18 22:50:55 -07:00
kshitijk4poor	0a8d48809f	chore: add LeonSGP43 numeric noreply email to AUTHOR_MAP The cherry-picked commit from #11434 uses the 154585401+ prefixed noreply format. Add it alongside the existing bare entry so the contributor audit passes.	2026-04-18 22:50:55 -07:00
Erosika	21d5ef2f17	feat(honcho): wizard cadence default 2, surface reasoning level, backwards-compat fallback Setup wizard now always writes dialecticCadence=2 on new configs and surfaces the reasoning level as an explicit step with all five options (minimal / low / medium / high / max), always writing dialecticReasoningLevel. Code keeps a backwards-compat fallback of 1 when dialecticCadence is unset so existing honcho.json configs that predate the setting keep firing every turn on upgrade. New setups via the wizard get 2 explicitly; docs show 2 as the default. Also scrubs editorial lines from code and docs ("max is reserved for explicit tool-path selection", "Unset → every turn; wizard pre-fills 2", and similar process-exposing phrasing) and adds an inline link to app.honcho.dev where the server-side observation sync is mentioned in honcho.md. Recommended cadence range updated to 1-5 across docs and wizard copy.	2026-04-18 22:50:55 -07:00
LeonSGP43	5b6792f04d	fix(honcho): scope gateway sessions by runtime user id	2026-04-18 22:50:55 -07:00
Erosika	ba7da73ca9	test(honcho): drop two first-turn tests subsumed by prewarm + smoke coverage - TestDialecticDepth::test_first_turn_runs_dialectic_synchronously: covered by TestSessionStartDialecticPrewarm::test_turn1_falls_back_to_sync_when_prewarm_missing (more realistic — exercises the empty-prewarm → sync-fallback path) - TestDialecticDepth::test_first_turn_dialectic_does_not_double_fire: covered by TestDialecticLifecycleSmoke (turn 1 flow) and TestDialecticCadenceAdvancesOnSuccess::test_empty_dialectic_result_does_not_advance_cadence Both predate the prewarm refactor and test paths that are now fallback behaviors already covered elsewhere.	2026-04-18 22:50:55 -07:00
Erosika	c630dfcdac	feat(honcho): dialectic liveness — stale-thread watchdog, stale-result discard, empty-streak backoff Hardens the dialectic lifecycle against three failure modes that could leave the prefetch pipeline stuck or injecting stale content: - Stale-thread watchdog: _thread_is_live() treats any prefetch thread older than timeout × 2.0 as dead. A hung Honcho call can no longer block subsequent fires indefinitely. - Stale-result discard: pending _prefetch_result is tagged with its fire turn. prefetch() discards the result if more than cadence × 2 turns passed before a consumer read it (e.g. a run of trivial-prompt turns between fire and read). - Empty-streak backoff: consecutive empty dialectic returns widen the effective cadence (dialectic_cadence + streak, capped at cadence × 8). A healthy fire resets the streak. Prevents the plugin from hammering the backend every turn when the peer graph is cold. - liveness_snapshot() on the provider exposes current turn, last fire, pending fire-at, empty streak, effective cadence, and thread status for in-process diagnostics. - system_prompt_block: nudge the model that honcho_reasoning accepts reasoning_level minimal/low/medium/high/max per call. - hermes honcho status: surface base reasoning level, cap, and heuristic toggle so config drift is visible at a glance. Tests: 550 passed. - TestDialecticLiveness (8 tests): stale-thread recovery, stale-result discard, fresh-result retention, backoff widening, backoff ceiling, streak reset on success, streak increment on empty, snapshot shape. - Existing TestDialecticCadenceAdvancesOnSuccess::test_in_flight_thread_is_not_stacked updated to set _prefetch_thread_started_at so it tests the fresh-thread-blocks branch (stale path covered separately). - test_cli TestCmdStatus fake updated with the new config attrs surfaced in the status block.	2026-04-18 22:50:55 -07:00
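The three thresholds in this commit reduce to a few comparisons:

- a prefetch thread is stale after `timeout × 2.0`,
- a pending result is stale after `cadence × 2` turns,
- the cadence widens with the empty streak, up to `cadence × 8`.

Below is a sketch with hypothetical function names. The choice of strict versus inclusive boundaries is an assumption:

```python
def thread_is_live(alive: bool, started_at: float, now: float, timeout: float) -> bool:
    """A running prefetch thread older than timeout * 2.0 is treated as dead."""
    return alive and (now - started_at) <= timeout * 2.0


def result_is_stale(fired_turn: int, current_turn: int, cadence: int) -> bool:
    """Discard a pending result if more than cadence * 2 turns passed since it fired."""
    return current_turn - fired_turn > cadence * 2


def effective_cadence(cadence: int, empty_streak: int) -> int:
    """Widen the cadence by one turn per consecutive empty result, capped at cadence * 8."""
    return min(cadence + empty_streak, cadence * 8)
```

A healthy fire resets the streak to 0, which brings `effective_cadence` back to the configured value.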
Erosika	098efde848	docs(honcho): wizard cadence default 2, prewarm/depth + observation + multi-peer - cli: setup wizard pre-fills dialecticCadence=2 (code default stays 1 so unset → every turn) - honcho.md: fix stale dialecticCadence default in tables, add Session-Start Prewarm subsection (depth runs at init), add Query-Adaptive Reasoning Level subsection, expand Observation section with directional vs unified semantics and per-peer patterns - memory-providers.md: fix stale default, rename Multi-agent/Profiles to Multi-peer setup, add concrete walkthrough for new profiles and sync, document observation toggles + presets, link to honcho.md - SKILL.md: fix stale defaults, add Depth at session start callout	2026-04-18 22:50:55 -07:00
Erosika	5f9907c116	chore(honcho): drop docs from PR scope, scrub commentary - Revert website/docs and SKILL.md changes; docs unification handled separately - Scrub commit/PR refs and process narration from code comments and test docstrings (no behavior change)	2026-04-18 22:50:55 -07:00
Erosika	78586ce036	fix(honcho): dialectic lifecycle — defaults, retry, prewarm consumption Several correctness and cost-safety fixes to the Honcho dialectic path after a multi-turn investigation surfaced a chain of silent failures: - dialecticCadence default flipped 3 → 1. PR #10619 changed this from 1 to 3 for cost, but existing installs with no explicit config silently went from per-turn dialectic to every-3-turns on upgrade. Restores pre-#10619 behavior; 3+ remains available for cost-conscious setups. Docs + wizard + status output updated to match. - Session-start prewarm now consumed. Previously fired a .chat() on init whose result landed in HonchoSessionManager._dialectic_cache and was never read — pop_dialectic_result had zero call sites. Turn 1 paid for a duplicate synchronous dialectic. Prewarm now writes directly to the plugin's _prefetch_result via _prefetch_lock so turn 1 consumes it with no extra call. - Prewarm is now dialecticDepth-aware. A single-pass prewarm can return weak output on cold peers; the multi-pass audit/reconcile cycle is exactly the case dialecticDepth was built for. Prewarm now runs the full configured depth in the background. - Silent dialectic failure no longer burns the cadence window. _last_dialectic_turn now advances only when the result is non-empty. Empty result → next eligible turn retries immediately instead of waiting the full cadence gap. - Thread pile-up guard. queue_prefetch skips when a prior dialectic thread is still in-flight, preventing stacked races on _prefetch_result. - First-turn sync timeout is recoverable. Previously on timeout the background thread's result was stored in a dead local list. Now the thread writes into _prefetch_result under lock so the next turn picks it up. - Cadence gate applies uniformly. At cadence=1 the old "cadence > 1" guard let first-turn sync + same-turn queue_prefetch both fire. Gate now always applies. - Restored query-length reasoning-level scaling, dropped in 9a0ab34c. 
Scales dialecticReasoningLevel up on longer queries (+1 at ≥120 chars, +2 at ≥400), clamped at reasoningLevelCap. Two new config keys: `reasoningHeuristic` (bool, default true) and `reasoningLevelCap` (string, default "high"; previously parsed but never enforced). Respects dialecticDepthLevels and proportional lighter-early passes. - Restored short-prompt skip, dropped in `ef7f3156`. One-word acknowledgements ("ok", "y", "thanks") and slash commands bypass both injection and dialectic fire. - Purged dead code in session.py: prefetch_dialectic, _dialectic_cache, set_dialectic_result, pop_dialectic_result — all unused after prewarm refactor. Tests: 542 passed across honcho_plugin/, agent/test_memory_provider.py, and run_agent/test_run_agent.py. New coverage: - TestTrivialPromptHeuristic (classifier + prefetch/queue skip) - TestDialecticCadenceAdvancesOnSuccess (empty-result retry, pile-up guard) - TestSessionStartDialecticPrewarm (prewarm consumed, sync fallback) - TestReasoningHeuristic (length bumps, cap clamp, interaction with depth) - TestDialecticLifecycleSmoke (end-to-end 8-turn session walk)	2026-04-18 22:50:55 -07:00
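The query-length heuristic adds one level at 120 or more characters and two at 400 or more, clamped at `reasoningLevelCap`. It works over the five levels named in the Honcho commits and could look like the sketch below. The function name is hypothetical. Letting a base level that is already above the cap pass through unchanged is an assumption:

```python
LEVELS = ("minimal", "low", "medium", "high", "max")


def scaled_reasoning_level(base: str, query: str, cap: str = "high", heuristic: bool = True) -> str:
    base_idx = LEVELS.index(base)
    idx = base_idx
    if heuristic:
        if len(query) >= 400:
            idx += 2
        elif len(query) >= 120:
            idx += 1
    # The cap bounds the bump only; it never lowers the configured base level (assumption).
    idx = max(base_idx, min(idx, LEVELS.index(cap)))
    return LEVELS[idx]
```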
Teknium	bf5d7462ba	fix(tui): reject history-mutating commands while session is running (#12416) Fixes silent data loss in the TUI when /undo, /compress, /retry, or rollback.restore runs during an in-flight agent turn. The version-guard at prompt.submit:1449 would fail the version check and silently skip writing the agent's result — UI showed the assistant reply but DB / backend history never received it, causing UI↔backend desync that persisted across session resume. Changes (tui_gateway/server.py): - session.undo, session.compress, /retry, rollback.restore (full-history only — file-scoped rollbacks still allowed): reject with 4009 when session.running is True. Users can /interrupt first. - prompt.submit: on history_version mismatch (defensive backstop), attach a 'warning' field to message.complete and log to stderr instead of silently dropping the agent's output. The UI can surface the warning to the user; the operator can spot it in logs. Tests (tests/test_tui_gateway_server.py): 6 new cases. - test_session_undo_rejects_while_running - test_session_undo_allowed_when_idle (regression guard) - test_session_compress_rejects_while_running - test_rollback_restore_rejects_full_history_while_running - test_prompt_submit_history_version_mismatch_surfaces_warning - test_prompt_submit_history_version_match_persists_normally (regression) Validated: against unpatched server.py the three 'rejects_while_running' tests fail and the version-mismatch test fails (no 'warning' field). With the fix, all 6 pass, all 33 tests in the file pass, 74 TUI tests in total pass. Live E2E against the live Python environment confirmed all 5 patches present and guards enforce 4009 exactly as designed.	2026-04-18 22:30:10 -07:00
Teknium	3a6351454b	fix(gateway): close pending-drain and late-arrival races in base adapter (#12371) Two related race conditions in gateway/platforms/base.py that could produce duplicate agent runs or silently drop messages. Neither is specific to any one platform — all adapters inherit this logic. R5 (HIGH) — duplicate agent spawn on turn chain In _process_message_background, the pending-drain path deleted _active_sessions[session_key] before awaiting typing_task.cancel() and then recursively awaiting _process_message_background for the queued event. During the typing_task await, a fresh inbound message M3 could pass the Level-1 guard (entry now missing), set its own Event, and spawn a second _process_message_background for the same session_key — two agents running simultaneously, duplicate responses, duplicate tool calls. Fix: keep the _active_sessions entry populated and only clear() the Event. The guard stays live, so any concurrent inbound message takes the busy-handler path (queue + interrupt) as intended. R6 (MED-HIGH) — message dropped during finally cleanup The finally block has two await points (typing_task, stop_typing) before it deletes _active_sessions. A message arriving in that window passes the guard (entry still live), lands in _pending_messages via the busy-handler — and then the unconditional del removes the guard with that message still queued. Nothing drains it; the user never gets a reply. Fix: before deleting _active_sessions in finally, pop any late pending_messages entry and spawn a drain task for it. Only delete _active_sessions when no pending is waiting. Tests: tests/gateway/test_pending_drain_race.py — three regression cases. Validated: without the fix, two of the three fail exactly where the races manifest (duplicate-spawn guard loses identity, late-arrival 'LATE' message not in processed list).	2026-04-18 19:32:26 -07:00
Teknium	762f7e9796	feat: configurable approval mode for cron jobs (approvals.cron_mode) Add approvals.cron_mode config option that controls how cron jobs handle dangerous commands. Previously, cron jobs silently auto-approved all dangerous commands because there was no user present to approve them. Now the behavior is configurable: - deny (default): block dangerous commands and return a message telling the agent to find an alternative approach. The agent loop continues — it just can't use that specific command. - approve: auto-approve all dangerous commands (previous behavior). When a command is blocked, the agent receives the same response format as a user denial in the CLI — exit_code=-1, status=blocked, with a message explaining why and pointing to the config option. This keeps the agent loop running and encourages it to adapt. Implementation: - config.py: add approvals.cron_mode to DEFAULT_CONFIG - scheduler.py: set HERMES_CRON_SESSION=1 env var before agent runs - approval.py: both check_command_approval() and check_all_command_guards() now check for cron sessions and apply the configured mode - 21 new tests covering config parsing, deny/approve behavior, and interaction with other bypass mechanisms (yolo, containers)	2026-04-18 19:24:35 -07:00
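The decision reads two inputs: the `HERMES_CRON_SESSION` flag that the scheduler sets and the `approvals.cron_mode` setting. Below is a sketch. It assumes config is a plain nested dict and that an unrecognized mode falls back to deny. The function names and the exact message text are hypothetical; `exit_code=-1` and `status=blocked` come from the commit:

```python
import os
from typing import Mapping, Optional


def cron_approval_mode(config: Mapping, env: Mapping = os.environ) -> Optional[str]:
    """Return "approve" or "deny" for cron sessions, or None outside cron."""
    if env.get("HERMES_CRON_SESSION") != "1":
        return None  # not a cron run: use the normal interactive approval path
    mode = (config.get("approvals") or {}).get("cron_mode", "deny")
    return "approve" if mode == "approve" else "deny"  # fail closed on unknown values


def blocked_result(command: str) -> dict:
    """Same shape as a user denial, so the agent loop keeps running and can adapt."""
    return {
        "exit_code": -1,
        "status": "blocked",
        "message": (
            f"Command blocked in cron session: {command!r}. "
            "Find another approach, or set approvals.cron_mode to 'approve'."
        ),
    }
```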
Teknium	b02833f32d	fix(codex): Hermes owns its own Codex auth; stop touching ~/.codex/auth.json (#12360) Codex OAuth refresh tokens are single-use and rotate on every refresh. Sharing them with the Codex CLI / VS Code via ~/.codex/auth.json made concurrent use of both tools a race: whoever refreshed last invalidated the other side's refresh_token. On top of that, the silent auto-import path picked up placeholder / aborted-auth data from ~/.codex/auth.json (e.g. literal {"access_token":"access-new","refresh_token":"refresh-new"}) and seeded it into the Hermes pool as an entry the selector could eventually pick. Hermes now owns its own Codex auth state end-to-end: Removed - agent/credential_pool.py: _sync_codex_entry_from_cli() method, its pre-refresh + retry + _available_entries call sites, and the post-refresh write-back to ~/.codex/auth.json. - agent/credential_pool.py: auto-import from ~/.codex/auth.json in _seed_from_singletons() — users now run `hermes auth openai-codex` explicitly. - hermes_cli/auth.py: silent runtime migration in resolve_codex_runtime_credentials() — now surfaces `codex_auth_missing` directly (message already points to `hermes auth`). - hermes_cli/auth.py: post-refresh write-back in _refresh_codex_auth_tokens(). - hermes_cli/auth.py: dead helper _write_codex_cli_tokens() and its 4 tests in test_auth_codex_provider.py. Kept - hermes_cli/auth.py: _import_codex_cli_tokens() — still used by the interactive `hermes auth openai-codex` setup flow for a user-gated one-time import (with "a separate login is recommended" messaging). User-visible impact - On existing installs with Hermes auth already present: no change. - On a fresh install where the user has only logged in via Codex CLI: `hermes chat --provider openai-codex` now fails with "No Codex credentials stored. Run `hermes auth` to authenticate." The interactive setup flow then detects ~/.codex/auth.json and offers a one-time import. 
- On an install where Codex CLI later refreshes its token: Hermes is unaffected (we no longer read from that file at runtime). Tests - tests/hermes_cli/test_auth_codex_provider.py: 15/15 pass. - tests/hermes_cli/test_auth_commands.py: 20/20 pass. - tests/agent/test_credential_pool.py: 31/31 pass. - Live E2E on openai-codex/gpt-5.4: 1 API call, 1.7s latency, 3 log lines, no refresh events, no auth drama. The related 14:52 refresh-loop bug (hundreds of rotations/minute on a single entry) is a separate issue — that requires a refresh-attempt cap on the auth-recovery path in run_agent.py, which remains open.	2026-04-18 19:19:46 -07:00
yeyitech	bd01ec7885	fix(cli): strip all reasoning tag variants from /resume recap HermesCLI._display_resumed_history() calls the module-level _strip_reasoning_tags() to clean assistant content before rendering the recap panel. The tag list was missing <thought> (Gemma 4) and there was no pass for stray orphan </tag> closes, so those variants leaked internal reasoning into the recap display (#11316). - Add <thought> to _REASONING_TAGS. - Add a third regex pass that strips orphan close tags (e.g. 'stuff</think>answer' → 'stuffanswer'). - Apply IGNORECASE to closed-pair and unclosed-pair passes so mixed-case variants (<THINK>, <Thinking>) are handled uniformly — previously both 'THINKING' and 'thinking' had to be listed explicitly as distinct tuple entries, which missed <Thinking>. 7 new regression tests in tests/cli/test_resume_display.py covering: <think>, <thinking>, <reasoning>, <thought>, unclosed <think>, multiple interleaved blocks, and orphan </think> close. Resolves #11316. Originally proposed as PR #11366.	2026-04-18 19:19:24 -07:00
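The three regex passes described in the commit above can be sketched in Python as shown below. The tag tuple and the function name are illustrative assumptions and do not reproduce the real _REASONING_TAGS or _strip_reasoning_tags(). The backreference \1 forces each closed pair to end with the same tag name. Under re.IGNORECASE, it matches that name case-insensitively.

```python
import re

# Assumed tag list for illustration; the commit adds "thought" (Gemma 4)
# to the real _REASONING_TAGS tuple.
REASONING_TAGS = ("think", "thinking", "reasoning", "thought")
_ALT = "|".join(REASONING_TAGS)
_FLAGS = re.IGNORECASE | re.DOTALL

# Pass 1: closed pairs such as <think>...</think> or <Thinking>...</THINKING>.
_CLOSED_PAIR = re.compile(rf"<({_ALT})>.*?</\1>", _FLAGS)
# Pass 2: an open tag with no close; drop everything to the end of the text.
_UNCLOSED = re.compile(rf"<(?:{_ALT})>.*\Z", _FLAGS)
# Pass 3: stray orphan close tags, e.g. 'stuff</think>answer'.
_ORPHAN_CLOSE = re.compile(rf"</(?:{_ALT})>", re.IGNORECASE)


def strip_reasoning_tags(text: str) -> str:
    """Remove reasoning blocks before rendering a /resume recap (sketch)."""
    text = _CLOSED_PAIR.sub("", text)
    text = _UNCLOSED.sub("", text)
    return _ORPHAN_CLOSE.sub("", text)
```

For example, strip_reasoning_tags("stuff</think>answer") returns "stuffanswer", the orphan-close case named in the commit.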
Tranquil-Flow	ec48ec5530	fix(agent): strip <think> blocks from stored assistant content Inline reasoning tags in an assistant message's content field leak to every downstream consumer: messaging platforms (#8878, #9568), API replay of prior turns, session transcript, CLI recap, generated session titles, and context compression. _extract_reasoning() already captures the reasoning text into msg['reasoning'] separately, so the raw tags in content are redundant. Stripping once at the storage boundary in _build_assistant_message() cleans the content for every downstream path in one place — no per-platform or per-path stripper needed. Measured impact on a real MiniMax M2.7-highspeed session (per @luoyejiaoe-source, #9306): 55% of assistant messages started with <think> blocks, 51/100 session titles were polluted, 16% content-size reduction. 3 new regression tests in TestBuildAssistantMessage: closed-pair strip with reasoning capture, no-think-tag passthrough, and unterminated-block strip. Resolves #8878 and #9568. Originally proposed as PR #9250.	2026-04-18 19:19:24 -07:00
Teknium	9489d1577d	fix(agent): strip unterminated <think> blocks from visible content Providers served via NIM (MiniMax M2.7, some Moonshot/DeepSeek proxies) sometimes drop the closing </think> tag, leaving raw reasoning in the assistant's content field. _strip_think_blocks()'s closed-pair regex is non-greedy and requires a matching close tag, so it only matches complete blocks; any orphan <think>...EOF survived the stripper and leaked to users (#8878, #9568, #10408). Adds an unterminated-tag pass that fires when an open reasoning tag sits at a block boundary (start of text or after a newline) with no matching close. Everything from that tag to end of string is stripped. The block-boundary check mirrors gateway/stream_consumer.py's filter so models that mention <think> in prose are not over-stripped. Also makes the closed-pair regexes consistently case-insensitive so <THINK>...</THINK> and <Thinking>...</Thinking> are handled uniformly; previously the mixed-case open tag would bypass the closed-pair pass and be caught by the unterminated-tag pass, taking trailing visible content with it. 6 new regression tests in TestStripThinkBlocks covering: unterminated <think>, unterminated <thought>, multi-line unterminated, line-start orphan with preserved prefix, prose-mention non-regression, mixed-case closed pairs. The implementation is inspired by @luinbytes's PR #10408 report of the NIM/MiniMax symptom. This commit does not include the 💭/🧠 emoji regexes from that PR; those glyphs are Hermes CLI display decorations, not model content markers.	2026-04-18 19:19:24 -07:00
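Here is a rough Python sketch of the block-boundary rule from the commit above. It assumes the boundary is "start of text, or right after a newline, optionally indented". The tag alternation and the names used here are illustrative. The real filter lives in _strip_think_blocks() and gateway/stream_consumer.py and may differ.

```python
import re

# Assumed tag alternation, for illustration only.
_ALT = "think|thinking|reasoning|thought"
_FLAGS = re.IGNORECASE | re.DOTALL

# Closed pairs, case-insensitive, so <THINK>...</THINK> is removed here
# instead of falling through to the unterminated pass.
_CLOSED_PAIR = re.compile(rf"<({_ALT})>.*?</\1>", _FLAGS)

# Unterminated open tag, matched only at a block boundary: the start of the
# text or right after a newline. Everything from the tag to the end of the
# string is removed; text before the tag is preserved.
_UNTERMINATED = re.compile(rf"(?:\A|(?<=\n))[ \t]*<(?:{_ALT})>.*\Z", _FLAGS)


def strip_think_blocks(content: str) -> str:
    """Sketch: closed-pair pass, then block-boundary unterminated pass."""
    content = _CLOSED_PAIR.sub("", content)
    return _UNTERMINATED.sub("", content)
```

A mid-sentence mention such as "Use the <think> tag" is left alone, because the tag is neither at the start of the text nor preceded by a newline.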
Teknium	79c5a381c5	feat(uninstall): offer to remove named profiles when uninstalling from default When `hermes uninstall` runs from the default HERMES_HOME (~/.hermes) and other named profiles exist under ~/.hermes/profiles/, show them in the installation overview and prompt: Also stop and remove these N profile(s)? [y/N] If confirmed, for each named profile we: 1. Shell out to `python -m hermes_cli.main -p <name> gateway stop/uninstall` to stop the gateway and remove its systemd unit or launchd plist (service names + unit paths are derived from HERMES_HOME, so we can't cleanly switch in-process) 2. Remove the ~/.local/bin/<name> alias wrapper (outside HERMES_HOME) 3. Wipe the profile's HERMES_HOME dir Previously `hermes uninstall` was silently profile-scoped, leaving zombie systemd units at ~/.config/systemd/user/hermes-gateway-<profile>.service and zombie HERMES_HOMEs under ~/.hermes/profiles/ whenever a user uninstalled from default with other profiles configured. Prompt only appears when uninstalling from the default root. Uninstalling from within a named profile stays profile-scoped as before.	2026-04-18 19:18:13 -07:00
Teknium	3fe0d503b6	fix(uninstall): properly stop and destroy gateway on hermes uninstall The uninstaller's gateway cleanup was incomplete: - Linux only (ignored macOS launchd) - Only checked user systemd scope (missed system services) - Didn't kill standalone gateway processes (hermes gateway run) - Missing DBUS env setup for headless servers Now delegates to gateway.py's existing machinery: 1. Kill any standalone gateway processes (all platforms) 2. Linux: stop + disable + remove both user AND system systemd services 3. macOS: unload + remove launchd plist 4. Warns (instead of silently failing) when system service needs sudo	2026-04-18 19:18:13 -07:00
Teknium	1e5f0439d9	docs: update Anthropic console URLs to platform.claude.com Anthropic migrated their developer console from console.anthropic.com to platform.claude.com. Two user-facing display URLs were still pointing to the old domain: - hermes_cli/main.py — API key prompt in the Anthropic model flow - run_agent.py — 401 troubleshooting output The OAuth token refresh endpoint was already migrated in PR #3246 (with fallback). Spotted by @LucidPaths in PR #3237. (Salvage of #3758 — dropped the setup.py hunk since that section was refactored away and no longer contains the stale URL.)	2026-04-18 18:55:58 -07:00
Teknium	2a2e5c0fed	fix: force relogin on 401/403 Codex token refresh failures When the OAuth token endpoint returns 401/403 but the JSON body doesn't contain a known error code (invalid_grant, etc.), relogin_required stayed False. Users saw a bare error message without guidance to re-authenticate. Now any 401/403 from the token endpoint forces relogin_required=True, since these status codes always indicate invalid credentials on a refresh endpoint. 500+ errors are still treated as transient (no relogin).	2026-04-18 18:54:34 -07:00
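The status-code rule in the commit above can be restated as a small Python predicate. This is a sketch and not the real refresh code. The function name is hypothetical. The commit lists the known error codes only as "invalid_grant, etc.", so the set below is an assumption.

```python
from typing import Optional

# Error codes treated as "credentials are dead"; the commit names only
# invalid_grant "etc.", so this set is an assumption.
KNOWN_RELOGIN_ERRORS = {"invalid_grant"}


def refresh_requires_relogin(status: int, body: Optional[dict]) -> bool:
    """Decide whether a failed token refresh should force re-authentication."""
    if status in (401, 403):
        return True   # always invalid credentials on a refresh endpoint
    if status >= 500:
        return False  # server-side trouble: transient, retry later
    error = (body or {}).get("error")
    return error in KNOWN_RELOGIN_ERRORS
```

Checking the status code before the body means a 401 or 403 with an empty or unfamiliar JSON body still triggers the relogin prompt.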
Teknium	beabbd87ef	fix(gateway): close adapter resources when connect() fails or raises (#12339) Gateway startup leaks aiohttp.ClientSession (and other partial-init resources) when an adapter's connect() returns False or raises. The adapter is never added to self.adapters, so the shutdown path at gateway/run.py:2426 never calls disconnect() on it; Python GC later logs 'Unclosed client session' at process exit. Seen on 2026-04-18 18:08:16 during a double --replace takeover cycle: one of the partial-init sessions survived past shutdown and emitted the warning right before status=75/TEMPFAIL. Fix: - New GatewayRunner._safe_adapter_disconnect() helper: calls adapter.disconnect() and swallows any exception. Used on error paths. - Connect loop calls it in both failure branches: success=False and except Exception. - Adapter disconnect() implementations are already expected to be idempotent and tolerate partial-init state (they all guard on self._http_session / self._bridge_process before touching them). Tests: tests/gateway/test_safe_adapter_disconnect.py: 3 cases verify the helper forwards to disconnect, swallows exceptions, and tolerates platform=None.	2026-04-18 18:53:31 -07:00
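Below is a minimal asyncio sketch of the cleanup pattern described in the commit above. It assumes connect() and disconnect() are coroutines. The free functions safe_adapter_disconnect and connect_adapters are hypothetical stand-ins for GatewayRunner._safe_adapter_disconnect() and its connect loop.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def safe_adapter_disconnect(adapter, platform=None) -> None:
    """Best-effort cleanup on error paths: never let disconnect() raise."""
    try:
        await adapter.disconnect()
    except Exception as exc:  # swallowed on purpose; we are already failing
        logger.debug("disconnect() after failed connect (%s): %s", platform, exc)


async def connect_adapters(candidates: dict) -> dict:
    """Connect each adapter; clean up any that fail so no session leaks."""
    connected = {}
    for platform, adapter in candidates.items():
        try:
            success = await adapter.connect()
        except Exception:
            await safe_adapter_disconnect(adapter, platform)  # raised
            continue
        if not success:
            await safe_adapter_disconnect(adapter, platform)  # returned False
            continue
        connected[platform] = adapter  # only these get disconnect() at shutdown
    return connected
```

A caller would run it as asyncio.run(connect_adapters({...})). This cleanup is safe only because, as the commit notes, disconnect() implementations tolerate partial-init state.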
Teknium	632a807a3e	fix(gateway): slash commands never interrupt a running agent (#12334) Any recognized slash command now bypasses the Level-1 active-session guard instead of queueing + interrupting. A mid-run /model (or /reasoning, /voice, /insights, /title, /resume, /retry, /undo, /compress, /usage, /provider, /reload-mcp, /sethome, /reset) used to interrupt the agent AND get silently discarded by the slash-command safety net: zero-char response, dropped tool calls. Root cause: - Discord registers 41 native slash commands via tree.command(). - Only 14 were in ACTIVE_SESSION_BYPASS_COMMANDS. - The other ~15 user-facing ones fell through base.py:handle_message to the busy-session handler, which calls running_agent.interrupt() AND queues the text. - After the aborted run, gateway/run.py:9912 correctly identifies the queued text as a slash command and discards it, but the damage (interrupt + zero-char response) already happened. Fix: - should_bypass_active_session() now returns True for any resolvable slash command. ACTIVE_SESSION_BYPASS_COMMANDS stays as the subset with dedicated Level-2 handlers (documentation + tests). - gateway/run.py adds a catch-all after the dedicated handlers that returns a user-visible "agent busy — wait or /stop first" response for any other resolvable command. - Unknown text / file-path-like messages are unchanged; they still queue. Also: - gateway/platforms/discord.py logs the invoker identity on every slash command (user id + name + channel + guild) so future ghost-command reports can be triaged without guessing. Tests: - 15 new parametrized cases in test_command_bypass_active_session.py cover every previously-broken Discord slash command. - Existing tests for /stop, /new, /approve, /deny, /help, /status, /agents, /background, /steer, /update, /queue still pass. - test_steer.py's ACTIVE_SESSION_BYPASS_COMMANDS check still passes. Fixes #5057. Related: #6252, #10370, #4665.	2026-04-18 18:53:22 -07:00
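Here is a simplified Python sketch of the new routing for messages that arrive while an agent is running. The command sets are small illustrative subsets, not the gateway's real registry. The names resolve_command and route_while_busy and the "queue"/"dedicated"/"busy" labels are hypothetical. Only should_bypass_active_session and ACTIVE_SESSION_BYPASS_COMMANDS are named in the commit.

```python
from typing import Optional

# Hypothetical, trimmed-down registry; the real gateway resolves commands
# from its own command table.
KNOWN_COMMANDS = {"stop", "new", "approve", "deny", "help", "status",
                  "model", "reasoning", "voice", "title", "resume", "usage"}
# Illustrative subset of commands with dedicated Level-2 handlers.
ACTIVE_SESSION_BYPASS_COMMANDS = {"stop", "new", "approve", "deny", "help", "status"}


def resolve_command(text: str) -> Optional[str]:
    """Return the command name if text is a known slash command, else None."""
    if not text.startswith("/"):
        return None
    parts = text[1:].split(maxsplit=1)
    if not parts:
        return None
    name = parts[0].lower()
    return name if name in KNOWN_COMMANDS else None  # "/home/x.txt" -> None


def should_bypass_active_session(text: str) -> bool:
    # After the fix: any resolvable slash command bypasses the Level-1 guard.
    return resolve_command(text) is not None


def route_while_busy(text: str) -> str:
    """Decide what happens to a message that arrives mid-run (sketch)."""
    if not should_bypass_active_session(text):
        return "queue"      # plain text / file-path-like: still queues
    if resolve_command(text) in ACTIVE_SESSION_BYPASS_COMMANDS:
        return "dedicated"  # handled by its Level-2 handler
    return "busy"           # catch-all reply; the running agent is not interrupted
```

The key change is that no branch for a recognized command reaches the interrupt-and-queue path.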
Teknium	41560192c4	chore(attribution): add AUTHOR_MAP entry for nish3451 Adds the nish3451 noreply email to the AUTHOR_MAP so CI attribution checks pass for the #6100 Telegram DM fallback fix merged in `1a9a2d7f`.	2026-04-18 18:52:41 -07:00

4891 commits