hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-17 09:41:58 +00:00

Author	SHA1	Message	Date
Teknium	733472952a	fix: complete cron jobs lock salvage Route curator rollback through the same cross-process cron job lock, make save_jobs lock for legacy direct callers without deadlocking nested mutation paths, and harden the regression test so a second _jobs_lock caller really blocks across processes.	2026-06-15 06:29:00 -07:00
CiarasClaws	e5b4cf7bea	fix(cron): make jobs.json writes safe across processes `hermes cron pause`/`resume`/`remove` run in their own CLI process (CLI → cronjob tool → pause_job → update_job → save_jobs), entirely separate from the gateway process that also writes jobs.json (mark_job_run, advance_next_run, due-fast-forward in get_due_jobs). The only synchronization was a module-level `threading.Lock`, which serializes writers within a single process but does nothing across processes — and update_job/pause_job/remove_job/create_job did not even take it. The result is a classic lost update: a `cron pause` issued while the gateway is live loads jobs.json, sets enabled=False, and saves; concurrently the gateway loads the same file and saves back its run-bookkeeping, clobbering the pause. The CLI prints "Paused" (it succeeded against its own in-memory copy) but the job stays enabled and keeps firing, with no error surfaced. The scheduler's `.tick.lock` flock can't be reused for this — it is held for the entire tick, including multi-minute agent runs, so a CLI mutation would block for minutes. Add `_jobs_lock()`: a short-held cross-process advisory file lock (fcntl/msvcrt flock on `<hermes_home>/cron/.jobs.lock`) layered over the existing in-process lock, and wrap every load→modify→save critical section with it — create_job, update_job, remove_job, mark_job_run, advance_next_run, get_due_jobs, rewrite_skill_refs. The lock degrades to in-process-only if neither fcntl nor msvcrt is available, preserving prior behaviour. All critical sections are short (field edits, no agent execution), so contention resolves in milliseconds. Adds a regression test that proves the lock excludes a second process (an in-process threading.Lock cannot). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 06:29:00 -07:00
FT_IOxCS	92a456f711	fix(cli,deps): clear esbuild audit loop Upgrade the Vite/esbuild surfaces that kept web, ui-tui, and the bootstrap installer on vulnerable esbuild versions, regenerate the root lockfile, and preserve intentional package+lock dependency edits during update lockfile cleanup.	2026-06-15 06:18:27 -07:00
Teknium	0d82060c74	fix: harden WhatsApp target alias salvage Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.	2026-06-15 05:51:47 -07:00
Keiron McCammon	ea49a79633	fix(messaging): route WhatsApp group JIDs to the target, not the home DM send_message(target="whatsapp:<group-jid>") silently delivered to the configured home DM instead of the requested group. Two gaps: 1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us), user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and broadcast/newsletter JIDs matched no pattern and fell through to `return None, None, False`, so the caller treated them as unresolvable and used the home channel. The bridge's /send endpoint accepts any chatId, so only the tool-side target parsing was at fault. Add a whatsapp branch that recognizes native JIDs as explicit targets. The pre-existing '+'-prefixed E.164 path is preserved. 2. WhatsApp groups have no human-friendly name — the channel directory is regenerated from session data on a timer, so a group shows up as its raw 18-digit JID and any hand-edit to channel_directory.json is clobbered on the next rebuild. Add a user-maintained alias overlay (~/.hermes/channel_aliases.json) re-applied on every build AND every load, giving durable friendly names and letting a freshly-created group be pre-named before its first message. Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser; TestChannelAliases (7 cases) for the overlay, plus an autouse fixture isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the existing directory tests.	2026-06-15 05:51:47 -07:00
Veritas-7	febdddb41a	fix(auth): refresh xAI OAuth tokens earlier	2026-06-15 05:40:23 -07:00
Teknium	aab2e99bae	test: cover request debug dump redaction Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.	2026-06-15 05:31:21 -07:00
Teknium	a688d2a1bd	test: assert disk cleanup prunes protected walks	2026-06-15 05:25:27 -07:00
墨綠BG	40699c3292	🐛 fix(disk-cleanup): avoid brittle sweep review issues	2026-06-15 05:25:27 -07:00
墨綠BG	c1a70a5439	🐛 fix(disk-cleanup): prune protected cleanup walks	2026-06-15 05:25:27 -07:00
liuhao1024	2cddc9c895	fix(bedrock): check boto3 version >= 1.34.59 before using converse_stream converse() and converse_stream() were added in boto3 1.34.59. When Hermes is installed editable into system Python (e.g. Ubuntu 24.04 ships 1.34.46), the system boto3 takes precedence and calls to converse_stream fail with AttributeError. Add an early version check in _require_boto3() that raises a clear RuntimeError with upgrade instructions.	2026-06-15 05:25:17 -07:00
Tharushka Dinujaya	ec05d2bc3e	fix(gateway): evict scoped lock when PID+start_time match but process is not a gateway On Linux, systemd spawns core services (cron, nginx, sshd) with deterministic PIDs and jiffy start_times across reboots. A service can land on the exact same PID and start_time as a previous gateway, causing acquire_scoped_lock to mistake it for a live gateway and block startup. The existing stale-detection paths only covered: - start_times both non-None and different (clear mismatch) - start_times both None (macOS/Windows fallback to cmdline check) The boot-time collision falls through both: times are non-None and equal, so neither branch fired. Add a third check: when both start_times are known and match but the live process fails _looks_like_gateway_process, read its cmdline. If the cmdline is readable (non-None), we have positive evidence of an impostor and mark the lock stale. Requiring a readable cmdline keeps the check conservative — if cmdline is unreadable we do not evict.	2026-06-15 05:25:07 -07:00
Nicolò Boschi	a376ca0081	feat(hindsight): make observation scopes configurable on retain Adds an observation_scopes config key (and HINDSIGHT_RETAIN_OBSERVATION_SCOPES env var) so retained memories can opt into per_tag / all_combinations / custom scoping instead of Hindsight's default combined pass. Threaded through _build_retain_kwargs so all three retain paths honor it: auto-retain and flush-on-switch already use aretain_batch; the tool retain path is switched from aretain to aretain_batch (functionally equivalent, aretain just wraps a single-item batch) since aretain doesn't accept the observation_scopes parameter.	2026-06-15 04:59:17 -07:00
kshitijk4poor	497352bc4e	fix(auth): write rotated xAI OAuth tokens back to global root (#43589) The salvaged read-side fix lets a profile resolve the xAI OAuth grant from the global-root auth store when it has no own providers.xai-oauth block. But _save_xai_oauth_tokens still wrote rotated tokens only to the active profile store. Because xAI rotates the refresh_token on every refresh, a profile that reads root's grant and refreshes it left root holding a now-revoked refresh token — killing every other profile reading the stale root grant with invalid_grant once its access token expired (#43589). Detect the read-from-root case (profile lacks its own providers.xai-oauth block) and, after the profile save, write the rotated chain back to the global root too via a best-effort, TOCTOU-safe write-through that reuses _save_auth_store with an explicit target path. A profile that genuinely shadows root (has its own block) is left untouched, classic mode is a no-op, and a failed root write never breaks the profile's own save. Pairs with the read fallback in the preceding commit so the cross-profile xAI grant stays coherent in both directions.	2026-06-15 17:08:19 +05:30
Andrew Walker	f1d6f04362	fix(auth): resolve xAI OAuth credentials across profiles (cherry picked from commit `8d8b9f50e4`)	2026-06-15 17:03:35 +05:30
helix4u	dcc3216955	fix(mcp): fail fast for noninteractive oauth without tokens	2026-06-15 04:22:07 -07:00
Teknium	aca11c227e	fix(docker): skip gateway reconciliation in dashboard container (autodetect) (#46293) * fix(docker): skip per-profile gateway reconciliation in dashboard container When gateway and dashboard containers share a bind-mounted HERMES_HOME, both run the cont-init.d profile reconciliation script, which creates s6-log processes for every persisted profile. These s6-log processes in different containers race to flock() the same log-directory lock files under logs/gateways/<profile>/lock, producing repeated "s6-log: fatal: unable to lock ... Resource busy" errors and a supervision restart storm. Add HERMES_SKIP_PROFILE_RECONCILE env var support to container_boot.py and set it in the official docker-compose.yml dashboard service so the dashboard container no longer creates per-profile gateway s6 services it never uses. * chore(release): map salvaged contributor * refactor(docker): autodetect dashboard container instead of env-var gate Replace the HERMES_SKIP_PROFILE_RECONCILE env var with PID 1 argv role detection. A dashboard-only container never spawns or supervises per-profile gateways, so the reconcile boot hook now skips itself when /proc/1/cmdline is the dashboard command — no operator flag to set (or forget in a hand-written manifest, which would reintroduce the s6-log flock storm this prevents). - Extract _strip_container_argv_prefix() shared by the legacy-gateway and new dashboard detectors (DRY the init/wrapper/hermes peel). - Add _is_dashboard_container(); gate reconcile main() on it. - Drop HERMES_SKIP_PROFILE_RECONCILE from code + docker-compose.yml. - Tests: argv matrix for both roles + main()-level skip/reconcile proof and a regression that the removed env var is now inert. Co-authored-by: 895252509 <895252509@qq.com> --------- Co-authored-by: zhouxiang <895252509@qq.com> Co-authored-by: Ben <ben@nousresearch.com>	2026-06-15 20:51:48 +10:00
Gille	d6a8d9dcab	fix(tools): respect session cwd in file tools	2026-06-15 14:00:42 +05:30
Ben Barclay	95715dcb03	fix(s6): reserved default gateway must not follow sticky active_profile (#46483) The supervised `gateway-default` s6 slot runs bare `hermes gateway run` (no -p) to mean "the root HERMES_HOME profile". But `_apply_profile_override` falls through its #22502 HERMES_HOME guard for the container root (/opt/data, whose parent is not `profiles`) and reads the sticky `active_profile` file. If the user set another profile active (e.g. via the dashboard), the reserved default gateway gets redirected into that profile — producing a duplicate gateway for the active profile and no real default gateway. The profile page and `gateway status` then correctly report default as "not running" because there genuinely isn't one. Guard step 2 (the sticky active_profile fallback) with the existing HERMES_S6_SUPERVISED_CHILD sentinel that the container run-script already exports. Supervised named-profile slots pass -p explicitly (step 1, never reaches step 2); only the bare default slot was affected. Inert outside the s6 container — the sentinel is never set elsewhere. Reported in the 'Docker & Profiles & Dashboard' support thread.	2026-06-15 05:36:20 +00:00
Ben Barclay	80f8ffc74c	fix(dashboard): pin machine-dashboard reroute to the machine root, not $HOME/.hermes (#46487) The unified machine-dashboard reroute (cmd_dashboard) re-execs a named-profile dashboard launch as the machine dashboard and dropped HERMES_HOME from the child env with the comment "so the child binds the machine root". That holds for a standard install (root == ~/.hermes) but breaks the Docker layout: the published image sets `ENV HERMES_HOME=/opt/data`, so once HERMES_HOME is unset the child falls back to $HOME/.hermes = /opt/data/.hermes — an empty, auto-seeded home. Two user-visible symptoms, one root cause (reported via support): 1. Dashboard Profiles page shows only an empty `default` — the real default/oracle/saga profiles live under /opt/data/profiles, but the rerouted child resolves _get_profiles_root() to /opt/data/.hermes/profiles. 2. The "Update Hermes" button runs `hermes update` inside the container repeatedly instead of bailing with the docker-update guidance. The Docker guard keys off detect_install_method(), which reads $HERMES_HOME/.install_method; the image stamps that at /opt/data, but the misresolved home has no stamp, no HERMES_MANAGED, and no .git → falls through to "pip", so the guard never fires. The reporter's workaround was to bind-mount the host dir at both /opt/data and /opt/data/.hermes so the two paths converge (at the cost of a self-referential recursion). Fix: resolve the machine root explicitly with get_default_hermes_root() and set it on the child env instead of popping HERMES_HOME. That helper returns the root for both layouts — ~/.hermes for a standard install, and /opt/data for Docker (it strips a trailing profiles/<name>). Falls back to the old pop behaviour only if root resolution raises, so the reroute is never blocked. Regression tests in test_dashboard_unified_launch.py: the existing standard-install test now asserts the child carries HERMES_HOME == get_default_hermes_root() (not absent), and a new test_reexec_pins_docker_machine_root covers the Docker layout (HERMES_HOME=/opt/data/profiles/oracle → child gets /opt/data). Both fail against the pre-fix pop behaviour (mutation-verified).	2026-06-15 15:33:15 +10:00
Teknium	b770967263	fix(s6): persist profile gateway desired state (#46292) * fix: persist s6 gateway desired state * chore(release): map salvaged contributor --------- Co-authored-by: Alfred Smith <alfred@my-cloud.me> Co-authored-by: Ben <ben@nousresearch.com>	2026-06-15 14:02:10 +10:00
Teknium	61ee2dbfdb	fix(s6): make profile gateway log parent writable (#46291) * fix(gateway): chown logs/gateways parent so late-added profiles can log The per-profile log service script created $HERMES_HOME/logs/gateways/ via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When the first log service boots in root context, the gateways/ parent stays root:root; every profile registered later runs its log service as the dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a sub-second fatal crash-loop flooding the container log. The stage2 recursive heal does not catch it either: it is gated on needs_chown, which is false when the top-level $HERMES_HOME is already hermes-owned. Two complementary fixes: - service_manager._render_log_run: chown the gateways/ parent (non-recursively) before the leaf chown. Runs on every root-context boot, so it also heals volumes already poisoned by older images. - docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p block; cont-init runs before any service starts, so the parent already exists hermes-owned when the first log/run does 'mkdir -p'. The needs_chown repair loop needs no twin entry: it already chowns logs/ recursively, which covers logs/gateways. Fixes #45258 * chore(release): map salvaged contributor --------- Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>	2026-06-15 13:47:05 +10:00
Teknium	40d7c264f0	fix(s6): register profile gateways without auto-starting (#46266) * fix(s6): prevent profile create from auto-starting gateway service When hermes profile create runs inside an s6 container, _maybe_register_gateway_service() calls register_profile_gateway() which creates the service directory and triggers s6-svscanctl -a. Previously the service always started immediately, causing profiles that share the main gateway's bot token (e.g. Kanban worker profiles) to fail with a token-lock conflict and persist gateway_state: running — becoming zombies that resurrect on every container restart. Wire the existing start_now parameter through the S6 implementation: when start_now=False, write a marker file (same pattern as container_boot.py _register_gateway_slot) so s6-supervise leaves the service stopped until the user explicitly runs hermes -p <profile> gateway start. 4 files, +61/-6, 4 new tests (all passing). * test(docker): wait for gateway running state before restart --------- Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>	2026-06-15 11:43:23 +10:00
Teknium	4eb0ff639b	Remove is_container check when restarting over dashboard (#46290) Co-authored-by: IAvecilla <ignacio.avecilla@lambdaclass.com>	2026-06-15 11:09:23 +10:00
Teknium	f3fe99863d	revert(web): remove keyless Parallel search fallback (#46350) Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced. Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.	2026-06-14 16:47:57 -07:00
Teknium	a829e04d62	fix: migrate cloned profile configs (#46345)	2026-06-14 16:30:23 -07:00
Teknium	2a14e8957d	fix(kimi): surface K2.7 Code in native picker (#46309)	2026-06-14 14:01:03 -07:00
mr-r0b0t	bff78a34dc	feat(zai): add GLM-5.2 with verified 1M context window GLM-5.2 ships with a 1M (1,048,576) token context window. Without this entry, Hermes falls through to the generic 'glm' key (202,752 tokens), under-reporting the context bar and prematurely compressing conversations. The 1M limit was verified empirically via needle-in-a-haystack retrieval at 789,240 prompt tokens on api.z.ai/api/coding/paas/v4 — zero errors, zero truncation, correct retrieval at every tested size (25K through 789K). Changes: - agent/model_metadata.py: add 'glm-5.2': 1_048_576 before 'glm' fallback - hermes_cli/models.py: add glm-5.2 to zai curated models - hermes_cli/setup.py: add glm-5.2 to setup wizard zai list - hermes_cli/auth.py: add glm-5.2 to coding plan endpoint probes - plugins/model-providers/zai/__init__.py: add glm-5.2 to fallback_models - tests/agent/test_model_metadata.py: context resolution + vendor-prefix tests	2026-06-14 13:50:36 -07:00
Teknium	4e6d05c6a5	perf(skills): share raw config cache in skill utils (#46149)	2026-06-14 11:14:58 -07:00
Teknium	a1f51feb72	fix(telegram): avoid rich final duplicate previews (#46206)	2026-06-14 11:13:38 -07:00
kshitij	6c34088a17	Merge pull request #46237 from kshitijk4poor/salvage/46095-cross-process-cache fix(gateway): cross-process agent-cache coherence (#45966) + preserve prompt caching	2026-06-14 23:05:17 +05:30
kshitijk4poor	3bc4a2ff78	fix(gateway): re-baseline agent-cache message_count after each turn The #45966 cross-process coherence guard snapshots a session's on-disk message_count next to the cached agent and rebuilds the agent when the count changes. But the snapshot is taken at agent-BUILD time — before the turn writes its own user + assistant (+ tool) rows — and the cache entry is never rewritten on a reuse. So this process's OWN turn grows message_count, and the very next turn sees a mismatch and rebuilds the agent. That happens every turn, for every conversation, silently destroying the per-conversation prompt caching the cache exists to protect (AGENTS.md: prompt caching is sacred). Add _refresh_agent_cache_message_count(): after a turn completes and the agent has flushed its rows to the SessionDB, re-baseline the stored count to the now-current value. The guard then fires ONLY when a DIFFERENT process changes the transcript — preserving the #45966 fix while keeping the cache warm for normal single-process operation. Tests drive the real SessionDB + the real guard condition: 5 consecutive same-process turns now all REUSE the cached agent (0 before the fix); a cross-process append still invalidates; and the re-baseline is fail-safe (no DB, falsy session_id, raising probe, legacy 2-tuple, pending sentinel all no-op).	2026-06-14 22:58:55 +05:30
kshitijk4poor	ce19fdb7ce	fix(skills): apply global|platform disabled union to all resolution sites The platform-disabled fix landed only in agent.skill_utils.get_disabled_skill_names (the system-prompt path). Two sibling resolvers still used the old replace-not-union semantics, so the same skill could be hidden from the <available_skills> prompt yet reported enabled elsewhere: - hermes_cli/skills_config.get_disabled_skills (the 'hermes skills config' UI) returned only the platform list, so a globally-disabled skill showed as enabled (unchecked) on any platform with a platform_disabled entry. - tools/skills_tool._is_skill_disabled (gates whether skill_view loads a skill) ignored the global list when a platform list existed, so a globally-disabled skill could still be loaded on such a platform. Both now union the global list with the platform list, matching get_disabled_skill_names. An explicit empty platform list no longer re-enables a globally-disabled skill — global disables hold on every platform (#46201). Also: fix the now-stale get_disabled_skill_names docstring and drop a stray blank line. Regression tests added for both sites (proven to fail on the old replace semantics).	2026-06-14 22:54:54 +05:30
ibrahim özsaraç	7bbe7024c2	fix: filter platform-disabled skills from <available_skills> prompt (#46201) build_skills_system_prompt() already resolved _platform_hint but called get_disabled_skill_names() with no argument, so the resolved platform never reached the filter and the prompt cache_key varied by platform while the disabled set did not. Pass _platform_hint or None. get_disabled_skill_names() also fully ignored the global 'disabled' list once a platform-specific list was found. Return the union (global | platform) so a globally-disabled skill stays disabled on every platform. Salvaged from #46203 by @iborazzi; the unrelated apps/shared/tsconfig.json ES2023 bump is intentionally dropped (one concern per PR).	2026-06-14 22:52:57 +05:30
Teknium	7433d5f0eb	fix(gateway): scope early duplicate guard to pid file	2026-06-14 08:42:06 -07:00
konsisumer	1436793051	fix(gateway): block shell gateway run when a service supervises the profile	2026-06-14 08:42:06 -07:00
Teknium	2c174bce24	fix(gateway): preserve new input on interrupted replay cleanup	2026-06-14 05:10:39 -07:00
Diyon18	288f7026e3	fix(messaging): correct Weixin personal account labeling	2026-06-14 04:52:54 -07:00
Teknium	efbe1635dd	fix(gateway): include replied-to media attachments (#46107)	2026-06-14 04:51:50 -07:00
Teknium	a27d7e68cc	fix(mcp): block suspicious stdio configs before probe (#46112)	2026-06-14 04:46:54 -07:00
Teknium	13a1bd0f83	perf(model-metadata): persist OpenRouter metadata cache (#46114)	2026-06-14 04:45:46 -07:00
Teknium	972a9885ee	fix(mcp): block exfil-shaped stdio server configs (#46083)	2026-06-14 04:24:14 -07:00
Teknium	9459057d7f	fix(telegram): guard rich details math crash (#46102)	2026-06-14 04:22:22 -07:00
Teknium	cf7d5932f8	fix(email): make IPv4 SMTP fallback use supported sockets	2026-06-14 04:16:26 -07:00
liuhao1024	04d4471d79	fix(email): use SMTP_SSL for port 465 and fall back to IPv4 on timeout Port 465 expects implicit TLS (SMTP_SSL) from the first byte. The email adapter always used SMTP() + starttls(), which is correct for port 587 but hangs/fails on port 465 providers (e.g., Swiss ISPs). Additionally, when the SMTP host has AAAA DNS records but IPv6 is unreachable, socket.create_connection() tries IPv6 first and hangs until timeout. Add an IPv4 fallback via AF_INET socket. Extract _connect_smtp() helper to consolidate the 4 duplicate SMTP connection sites into a single method with correct protocol selection and IPv6 fallback logic.	2026-06-14 04:16:26 -07:00
Teknium	5105c3651a	perf(api-server): normalize chat content linearly (#46079)	2026-06-14 03:25:49 -07:00
Aldo	293c04fef6	fix(gateway): suppress exact silence tokens without mutating history	2026-06-14 03:25:08 -07:00
Teknium	10bad2faf1	fix(gateway): serialize startup auto-resume before inbound (#46074) Gateway startup now queues real inbound messages until restart-interrupted auto-resume turns have completed, preventing duplicate agents for the same session after a restart.	2026-06-14 03:21:06 -07:00
Teknium	2b4873f7fb	fix(agent): persist repaired-turn responses (#46071)	2026-06-14 03:20:25 -07:00
Teknium	723c2331bd	fix: make profile subprocess HOME policy explicit	2026-06-14 03:20:21 -07:00
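
Commit e5b4cf7bea describes wrapping every load→modify→save of jobs.json in a short-held cross-process advisory file lock layered over the in-process lock. The sketch below illustrates that pattern only; it is not the project's code. The names `jobs_lock` and `update_job`, the jobs.json layout (a list of dicts with an `"id"` key), and the atomic-replace write are assumptions made for illustration, and the msvcrt branch mentioned in the commit is omitted.

```python
import contextlib
import json
import os
import threading

try:
    import fcntl  # POSIX only
except ImportError:  # no fcntl: degrade to the in-process lock alone
    fcntl = None

_in_process_lock = threading.Lock()


@contextlib.contextmanager
def jobs_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    with _in_process_lock:
        if fcntl is None:
            yield
            return
        os.makedirs(os.path.dirname(lock_path) or ".", exist_ok=True)
        with open(lock_path, "a+") as handle:
            # Blocks until no other process (or open file) holds the lock.
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
            try:
                yield
            finally:
                fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


def update_job(jobs_path, job_id, **fields):
    """Load, modify, and save jobs.json inside one critical section.

    Hypothetical schema: jobs.json holds a list of dicts keyed by "id".
    """
    lock_path = os.path.join(os.path.dirname(jobs_path) or ".", ".jobs.lock")
    with jobs_lock(lock_path):
        with open(jobs_path) as f:
            jobs = json.load(f)
        for job in jobs:
            if job["id"] == job_id:
                job.update(fields)
        tmp_path = jobs_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(jobs, f)
        os.replace(tmp_path, jobs_path)  # atomic swap on the same filesystem
```

Because the lock is held only for the field edit and the file write, a second writer waits for the first to finish instead of saving a stale copy over it, which is the lost update the commit message describes.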

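Commit 04d4471d79 notes that port 465 expects implicit TLS from the first byte, while port 587 uses a plain connection upgraded with STARTTLS. A minimal sketch of that protocol selection with the standard-library smtplib follows. The helper names `smtp_mode` and `connect_smtp` are hypothetical, and the IPv4 fallback from the same commit is deliberately left out.

```python
import smtplib
import ssl

IMPLICIT_TLS_PORT = 465


def smtp_mode(port):
    """Return which TLS style a port expects (assumed: only 465 is implicit)."""
    return "implicit_tls" if port == IMPLICIT_TLS_PORT else "starttls"


def connect_smtp(host, port, timeout=30.0):
    """Open an SMTP connection using the TLS style matching the port."""
    context = ssl.create_default_context()
    if smtp_mode(port) == "implicit_tls":
        # TLS is negotiated before any SMTP command is sent.
        return smtplib.SMTP_SSL(host, port, timeout=timeout, context=context)
    client = smtplib.SMTP(host, port, timeout=timeout)
    try:
        client.starttls(context=context)  # upgrade the plaintext session
    except Exception:
        client.close()
        raise
    return client
```

Calling `connect_smtp("smtp.example.com", 465)` would take the `SMTP_SSL` path; the host name is a placeholder.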

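Commits 7bbe7024c2 and ce19fdb7ce change disabled-skill resolution from "the platform list replaces the global list" to "global | platform". The sketch below shows only that union semantics. The function name comes from the commit messages, but the signature and the config shape (a `disabled` list plus a `platform_disabled` mapping) are assumptions, not the project's actual structure.

```python
def get_disabled_skill_names(config, platform=None):
    """Return globally disabled skills plus those disabled for one platform."""
    disabled = set(config.get("disabled") or [])
    if platform:
        per_platform = config.get("platform_disabled") or {}
        # Union, not replacement: a platform entry can only add names.
        disabled |= set(per_platform.get(platform) or [])
    return disabled


config = {
    "disabled": ["web_search"],
    "platform_disabled": {"telegram": ["image_gen"], "cli": []},
}
telegram_disabled = get_disabled_skill_names(config, "telegram")
cli_disabled = get_disabled_skill_names(config, "cli")
```

With this shape, an explicit empty list for `cli` leaves `web_search` disabled there, matching the behaviour the commit describes.
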
5510 commits