Commit graph

12339 commits

Author SHA1 Message Date
kshitijk4poor
d6cb69a7a9 chore: add sweetcornna to AUTHOR_MAP
Salvage co-author of the cron ticker-liveness fix.
2026-06-21 13:00:50 +05:30
annguyenNous
07424da76f fix(cron): keep ticker alive on BaseException + heartbeat-aware status
The in-process cron ticker (cron/scheduler_provider.py) caught only
`Exception` and logged at DEBUG, so a `SystemExit`/`KeyboardInterrupt`
raised from a misbehaving provider SDK or agent retry path killed the
ticker thread silently. The gateway PROCESS stayed up, so `hermes cron
status` — which only checks `find_gateway_pids()` — kept reporting
"✓ jobs will fire automatically" while no jobs ever fired (#32612,
#32895).

This makes ticker death survivable and detectable:

- The ticker loop now catches `BaseException` and logs at ERROR with a
  traceback, so a single bad tick no longer tears the thread down and
  the failure is visible in the gateway log.
- The loop records a heartbeat (`cron/ticker_heartbeat`, epoch seconds)
  on startup and after every tick — best-effort, never raised into the
  loop. Both ticker entry points (the gateway and the desktop fallback
  in web_server.py) funnel through `InProcessCronScheduler.start`, so one
  heartbeat site covers both.
- `hermes cron status` now reads the heartbeat age: if the gateway is
  running but the heartbeat is stale (> 200s, i.e. several missed ~60s
  ticks), it reports the ticker as STALLED and suggests a restart instead
  of falsely claiming jobs will fire. A missing heartbeat (older build /
  never ran) is treated as "unknown", not "dead".
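
The three changes above can be sketched together. This is a minimal illustration, not the code in cron/scheduler_provider.py: the function names, the `max_ticks` bound (the real loop presumably runs until stopped), and the state strings are all assumptions. Only the `BaseException` catch, the epoch-seconds heartbeat, the 200s threshold, and the "missing means unknown" rule come from the description.

```python
import logging
import time
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("cron.ticker")

STALE_AFTER_SECONDS = 200  # threshold quoted in the commit message


def write_heartbeat(path: Path) -> None:
    """Best-effort heartbeat write: failures are logged, never raised."""
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(str(time.time()))
    except OSError:
        logger.debug("heartbeat write failed", exc_info=True)


def heartbeat_age(path: Path, now: Optional[float] = None) -> Optional[float]:
    """Seconds since the last heartbeat, or None if missing or unreadable."""
    try:
        stamp = float(path.read_text().strip())
    except (OSError, ValueError):
        return None
    return (time.time() if now is None else now) - stamp


def ticker_state(gateway_running: bool, age: Optional[float]) -> str:
    """Hypothetical status mapping; the real CLI wording differs."""
    if not gateway_running:
        return "stopped"
    if age is None:
        return "unknown"  # older build or never ran: not treated as dead
    return "stalled" if age > STALE_AFTER_SECONDS else "ok"


def run_ticker(
    tick: Callable[[], None],
    heartbeat: Path,
    max_ticks: int,
    sleep: Callable[[float], None] = time.sleep,
    interval: float = 60.0,
) -> None:
    """Run `tick` repeatedly; one bad tick must not end the loop."""
    write_heartbeat(heartbeat)  # recorded on startup
    for _ in range(max_ticks):
        try:
            tick()
        except BaseException:  # includes SystemExit and KeyboardInterrupt
            logger.error("cron tick failed", exc_info=True)
        write_heartbeat(heartbeat)  # recorded after every tick
        sleep(interval)
```

Catching `BaseException` is normally discouraged; it is defensible here only because the loop runs in a background thread whose sole job is to keep ticking.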

Adds tests for BaseException survival, per-iteration heartbeat recording,
heartbeat round-trip/age, staleness detection, and silent-write-failure.

Salvaged from #49660 (BaseException survival on current structure),
extended with the heartbeat + honest-status reporting that the earlier
(pre-refactor) watchdog PRs #35616 and #33849 proposed.

Fixes #32612
Fixes #32895

Co-authored-by: banditburai <promptsiren@gmail.com>
Co-authored-by: sweetcornna <96944678+sweetcornna@users.noreply.github.com>
2026-06-21 13:00:50 +05:30
kshitijk4poor
35752fc3a5 chore: add szzhoujiarui-sketch and rayjun to AUTHOR_MAP
Salvage co-authors of the cron model.default fix.
2026-06-21 12:37:56 +05:30
konsisumer
73b92264ee fix(cron): resolve model.default + fail fast on missing model
Cron jobs created without an explicit `model` are stored as `model: null`.
At fire time `run_job` resolved `model = job.get("model") or os.getenv(
"HERMES_MODEL") or ""` and then `_model_cfg.get("default", model)`, so when
config.yaml had no `model.default` (or `model: {default: null}`) an empty
string flowed straight to the provider and surfaced as an opaque HTTP 400
("Model parameter is required" / "model: String should have at least 1
character"). The operator had to inspect jobs.json to discover the job was
stored with a null model.

This change makes cron model resolution robust and symmetric with the CLI:

- Coerce `model: null`/missing config to `{}` so a falsy default never
  overwrites an already-resolved env value with `None`.
- Only overwrite `model` from `model.default` when the resolved value is
  truthy; accept a `model.model` alias key, mirroring the sibling resolvers
  in hermes_cli/oneshot.py, fallback_cmd.py and prompt_size.py.
- Resolve AFTER the managed-scope overlay so an administrator-pinned model
  still wins.
- Fail fast with an actionable error (caught by run_job's outer handler and
  recorded as the job's last_error — the cron ticker is unaffected) instead
  of letting an empty model reach the API.
- The per-job model is re-read every tick, so a `cronjob action=update
  model=...` after a failed run takes effect on the next tick (no cache).
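
A minimal sketch of the resolution order described above. The function name, the `MissingModelError` type, and the exact precedence (explicit job model, then a truthy config default or alias, then `HERMES_MODEL`) are assumptions for illustration. The managed-scope overlay is omitted.

```python
import os
from typing import Any, Mapping, Optional


class MissingModelError(ValueError):
    """Hypothetical error type; the commit does not name the real one."""


def resolve_cron_model(
    job: Mapping[str, Any],
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    env = os.environ if env is None else env
    job_model = job.get("model")
    model = job_model or env.get("HERMES_MODEL") or ""

    model_cfg = config.get("model")
    if isinstance(model_cfg, str):
        configured = model_cfg  # string-form config
    elif isinstance(model_cfg, Mapping):
        # `model.model` is accepted as an alias for `model.default`
        configured = model_cfg.get("default") or model_cfg.get("model")
    else:
        configured = None  # null, missing, or corrupt config acts like {}

    # Only a truthy configured value may replace the env fallback,
    # so a null default never clobbers an already-resolved model.
    if not job_model and configured:
        model = configured

    if not model:
        raise MissingModelError(
            "cron job has no model: set the job's model, model.default "
            "in config.yaml, or HERMES_MODEL"
        )
    return model
```

Because the function takes the job mapping on every call and keeps no state, an updated job model is picked up on the next invocation, matching the no-cache property noted above.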

Adds tests/cron/conftest.py pinning a default HERMES_MODEL so existing
run_job tests don't trip the new guard, plus regression tests covering env
fallback, config.default fallback, string-form config, the model alias key,
null-default-no-clobber, corrupt-config graceful degradation, fail-fast,
and the no-cache re-read property.

Salvaged from #24005, rebased onto current main, with additional test
coverage folded in from #45550 and the alias-key behavior from #43952.

Fixes #43899
Fixes #23979
Fixes #22761

Co-authored-by: szzhoujiarui-sketch <szzhoujiarui@gmail.com>
Co-authored-by: rayjun <rayjun0412@gmail.com>
2026-06-21 12:37:56 +05:30
teknium1
14ef6312b5 fix(compression): decay protect_first_n so early turns don't fossilize (#11996)
protect_first_n keeps the first N non-system messages verbatim through
compaction so the original task framing survives. But it was applied on
EVERY compression pass: the same early user turns were re-copied into each
child session and never summarized away, so across a long, repeatedly-
compressed session those old messages became immortal and grew the
protected head unboundedly (#11996, P1).

Decay it: protect_first_n applies on the FIRST compaction only. Once the
session has been compressed at least once (compression_count >= 1, or a
handoff summary already exists), the early turns are captured in the
summary, so _effective_protect_first_n() returns 0 and only the system
prompt stays protected. The decay is read at compress_start computation
time, before compression_count/_previous_summary are mutated at the end of
compress(), so the first pass still protects correctly.
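
The decay rule can be sketched as follows. The `CompressorState` class and the `compress_start` helper are hypothetical stand-ins for the real compressor. Only the rule itself (protect the first N on the first pass, 0 afterwards) comes from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CompressorState:
    protect_first_n: int = 3
    compression_count: int = 0
    previous_summary: Optional[str] = None

    def effective_protect_first_n(self) -> int:
        # After one compaction the early turns already live in the summary.
        if self.compression_count >= 1 or self.previous_summary:
            return 0
        return self.protect_first_n


def compress_start(messages: List[Dict[str, str]], state: CompressorState) -> int:
    """Index of the first message that may be summarized away."""
    head = 1 if messages and messages[0].get("role") == "system" else 0
    return min(len(messages), head + state.effective_protect_first_n())
```

On a first pass with `protect_first_n=3`, the system prompt plus three turns stay verbatim. On any later pass only the system prompt does.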

Co-authored-by: truenorth-lj <liliangjya@gmail.com>
Co-authored-by: davidvv <david.vv@icloud.com>
2026-06-21 00:06:58 -07:00
Teknium
c6bf6bda90 fix(memory): recover from missing old_text on single-op replace/remove (#49997)
Single-op replace/remove failed with a dead-end 'old_text is required'
error when a structured-output client omitted the optional old_text field
(it can't be schema-required without a top-level if/then combinator that
OpenAI's Codex backend 400s on). The model couldn't recover.

Now a missing old_text returns the current entry inventory plus a retry
instruction (mirroring the batch path's _batch_error), so the model can
reissue the call with old_text set. Also sharpens the old_text schema
description to state it's required for replace/remove.

Fixes #49466, #43412.
2026-06-20 23:46:52 -07:00
Teknium
d5f0e737d9 chore(release): add AUTHOR_MAP entry for #49544 salvage 2026-06-20 23:42:47 -07:00
Teknium
c1f11f8c69 fix(telegram): index streamed rich finals via editMessageText too
The native echo recovery handles replies to most rich messages, but
messages sent before the bot's first rich send have no echo to read.
record() was only called on the fresh-send path (_try_send_rich); a
streamed final finalized via _try_edit_rich/editMessageText was never
indexed, so a reply to it had neither a native echo nor an index entry.
Mirror the fresh-send record() into the edit success path to close
that gap.
2026-06-20 23:42:47 -07:00
izumi0uu
29e5e127c6 fix(telegram): recover reply text from native rich echo
Telegram DOES echo a rich message's content back in
reply_to_message.api_kwargs['rich_message']['blocks'] when a user
replies to it. Read that native field first in _build_message_event,
keeping the local send-time index only as a fallback. Duck-type
api_kwargs via .get() since it is a mappingproxy, not a dict.
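
A sketch of the duck-typed lookup, assuming only what the message states: `api_kwargs` is a `mappingproxy` and the echo lives under `rich_message` and then `blocks`. The helper name is hypothetical, and the structure of each block is not specified here, so the blocks are returned untouched.

```python
from types import MappingProxyType
from typing import Any, List, Optional


def rich_echo_blocks(reply_to_message: Any) -> Optional[List[Any]]:
    """Return the echoed rich blocks, or None so the caller can fall back."""
    api_kwargs = getattr(reply_to_message, "api_kwargs", None)
    get = getattr(api_kwargs, "get", None)
    if not callable(get):
        return None
    # isinstance(api_kwargs, dict) would be False for a mappingproxy,
    # so rely on .get() rather than on the concrete type.
    rich = get("rich_message")
    rich_get = getattr(rich, "get", None)
    if not callable(rich_get):
        return None
    blocks = rich_get("blocks")
    return list(blocks) if isinstance(blocks, (list, tuple)) else None
```

A `None` result is the signal to consult the local send-time index instead.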

Fixes #49534
2026-06-20 23:42:47 -07:00
teknium
fcdefb4181 chore(release): add AUTHOR_MAP entries for docs PR salvage cluster 2 2026-06-20 23:23:47 -07:00
Tony Simons
2008a96b20 docs: align contributor test checklist with wrapper 2026-06-20 23:23:47 -07:00
BBCrypto-web
72e4cca00e docs(config): correct MCP docs path in cli-config.yaml.example
The MCP section pointed to docs/mcp.md, which does not exist. Point it
to website/docs/user-guide/features/mcp.md, matching the existing
hooks.md reference convention in the same file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 23:23:47 -07:00
namredips
b1ab5a8ae1 docs(antigravity-cli): add delegation patterns + output/bounding caveats
Brings the antigravity-cli skill to parity with the codex / claude-code
delegation playbooks. Additive only — auth/sandbox/plugin/settings content
is unchanged.

- New 'Delegation patterns' section: one-shot, background bounded runs,
  interactive PTY+tmux, parallel worktree fan-out, and an orchestration
  boundary note (agy is a worker backend / reviewer, not a coordination
  primitive).
- Documents the two ways agy -p differs from claude-code: plain-text
  output (no --output-format json / result envelope) and bounding via
  --print-timeout rather than a nonexistent --max-turns. Mirrored into
  Pitfalls.
- Bumps version 0.1.0 -> 0.2.0.
2026-06-20 23:23:47 -07:00
Sworntech-dev
9f507a0aa3 docs: remove file tools TBD placeholder 2026-06-20 23:23:47 -07:00
BBCrypto-web
225dcf855c docs(.env.example): add HF_BASE_URL placeholder
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 23:23:47 -07:00
loes5050
85f108ef03 test(cron): document consent-first self-learning suggestions 2026-06-20 23:23:47 -07:00
allo
bc85f6150e docs: document per-event extra keys in shell-hook wire protocol
The shell-hook stdin payload's extra object contains event-specific
kwargs, but the docstring only mentioned the field without listing
what each event actually puts inside it.

Add a reference table covering post_tool_call, pre_tool_call,
on_session_start, on_session_end, and subagent_stop — the five
hook sites that emit extra keys beyond the top-level payload.

Closes #49370
2026-06-20 23:23:47 -07:00
Tortugasaur
c02648c5dd fix(docs): align slash-command and docker docs 2026-06-20 23:23:47 -07:00
teknium1
98ecd0beeb docs(mcp): fix stale ~0.75s discovery-wait reference in late-refresh docstring
The MCP discovery wait is now bounded by the config-driven mcp_discovery_timeout
(default 1.5s), not the old 0.75s flat value. Updates the _schedule_mcp_late_refresh
docstring that still cited ~0.75s after #49208 made the bound configurable.
2026-06-20 23:23:47 -07:00
Kevin Anderson
b337afdf6e docs(cli): fix broken terminal-backend guide link in setup wizard
The terminal backend onboarding step pointed at
/docs/developer-guide/environments, which no longer exists. Point it at
the live docs page /docs/user-guide/configuration#terminal-backend-configuration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 23:23:47 -07:00
virtuadex
defeda8c55 docs: sync documentation with current implementation 2026-06-20 23:23:47 -07:00
miha
95d970a752 docs: sharpen software-development skills 2026-06-20 23:23:47 -07:00
aieng-abdullah
74b5cc7ca4 docs(spotify): document 6-month re-auth cycle and add client-level invalid_grant test
- Remove the 'you only log in once per machine' claim from spotify.md
  and document the ~6-month refresh token expiry with re-auth instructions
- Add test_client_wraps_invalid_grant_as_spotify_auth_required_error to
  confirm SpotifyClient wraps AuthError(code=spotify_refresh_invalid_grant)
  into SpotifyAuthRequiredError with a user-facing message

Refs: #28155
2026-06-20 23:23:47 -07:00
EloquentBrush0x
9bd5003d4f fix(spotify): quarantine dead tokens on terminal refresh failure
resolve_spotify_runtime_credentials() called _refresh_spotify_oauth_state()
without a try/except, so a terminal failure (HTTP 400/401, invalid_grant,
refresh_token_reused) raised AuthError but left the dead refresh_token in
auth.json. Every subsequent session re-read and retried the same token over
the network, failing identically each time.

Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_at, expires_in, obtained_at) and write a
last_auth_error quarantine marker to auth.json before re-raising. The next
call sees no access_token and fails fast with spotify_access_token_missing —
no network retry — and the user is prompted to re-authenticate.
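
The quarantine flow might look like the sketch below. It assumes a flat auth.json layout and a simplified resolver that always refreshes. `AuthError` here is a stand-in class, and `resolve_credentials` and its `refresh` callback are hypothetical names. The cleared field list, the `last_auth_error` marker, and the error codes are taken from the description.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict

DEAD_OAUTH_FIELDS = (
    "access_token", "refresh_token", "expires_at", "expires_in", "obtained_at",
)


class AuthError(Exception):
    """Stand-in for the project's AuthError (shape assumed)."""

    def __init__(self, message: str, code: str, relogin_required: bool = False):
        super().__init__(message)
        self.code = code
        self.relogin_required = relogin_required


def resolve_credentials(
    auth_path: Path, refresh: Callable[[Dict[str, Any]], Dict[str, Any]]
) -> Dict[str, Any]:
    state = json.loads(auth_path.read_text())
    if not state.get("access_token"):
        # Fail fast: no network call once the token has been quarantined.
        raise AuthError(
            "Spotify login required",
            code="spotify_access_token_missing",
            relogin_required=True,
        )
    try:
        return refresh(state)
    except AuthError as exc:
        if exc.relogin_required and state.get("refresh_token"):
            for field in DEAD_OAUTH_FIELDS:
                state.pop(field, None)
            state["last_auth_error"] = {"code": exc.code, "message": str(exc)}
            auth_path.write_text(json.dumps(state))
        raise
```

The first terminal failure costs one network round trip. Every later call stops at the missing `access_token` check.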

Mirrors the quarantine pattern already in place for Nous, xAI-OAuth,
Codex-OAuth (#28116, #28118), and MiniMax-OAuth (#28119).
2026-06-20 23:23:47 -07:00
HwangJohn
242962e1f5 docs(providers): clarify vllm qwen reasoning output
Signed-off-by: HwangJohn <angelic805@gmail.com>

Co-authored-by: OpenAI Codex <codex@openai.com>
2026-06-20 23:23:47 -07:00
X7
fe5c8d2316 fix(docs): document curl, xz-utils, and g++ as Linux prerequisites 2026-06-20 23:23:47 -07:00
Sworntech-dev
fa53e36438 docs(hooks): document manual shell hook allowlisting 2026-06-20 23:23:47 -07:00
DrZM007
f80088f035 docs: add missing Prerequisites/How to Run sections to SKILL.md template
The SKILL.md template in CONTRIBUTING.md was missing the Prerequisites
and How to Run sections, even though the "modern section order"
guidance immediately below it lists both as required.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 23:23:47 -07:00
brett-bonner_infodesk
eec9c1d84e docs(agents): clarify background delegation durability 2026-06-20 23:23:47 -07:00
michael.chen
063155e234 docs(hooks): document subagent_start plugin hook 2026-06-20 23:23:47 -07:00
x7peeps
df4015bbc1 docs: session lifecycle documentation 2026-06-20 23:23:47 -07:00
e10552
2609bcccca feat(i18n): add complete Spanish translation
- Complete README.es.md (full Spanish translation of README)
- Add CONTRIBUTING.es.md (Spanish contributing guide)
- Add SECURITY.es.md (Spanish security policy)
- Fix remaining English strings in locales/es.yaml (resume Matrix section)
- Add Spanish badge to README.md

All 47 i18n tests pass, including catalog key parity and placeholder parity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-20 23:23:47 -07:00
Sworntech-dev
38756f2d55 docs(docker): document gateway tool-loop hard stops 2026-06-20 23:23:47 -07:00
GauravPatil2515
cc30e0b659 docs(config): document auxiliary task fallback_chain 2026-06-20 23:23:47 -07:00
Greg DeYoung
5eb158e317 docs(hermes-agent skill): document project context files and their discovery rules
Adds a new 'Project Context Files' section to the hermes-agent skill
explaining the priority order and discovery rules for .hermes.md,
AGENTS.md, CLAUDE.md, and .cursorrules. Specifically clarifies:

- .hermes.md walks parents up to the git root (good for monorepos)
- AGENTS.md / agents.md is cwd-only (portable to other agents)
- The 20K cap and head+tail truncation strategy
- The threat-pattern scanner behavior (blocks content, not file)
- What --ignore-rules actually skips (everything)

Also fixes an inaccurate docstring in agent/agent_init.py for
skip_context_files — the previous text only mentioned SOUL.md,
AGENTS.md, and .cursorrules, but the actual behavior (per
build_context_files_prompt and the --ignore-rules CLI flag) skips
all of them plus .hermes.md and CLAUDE.md.

Refs: https://github.com/NousResearch/hermes-agent/issues/46775
2026-06-20 23:23:47 -07:00
Andres Sommerhoff
97563ab821 fix: warn on line-oriented newline search patterns 2026-06-20 23:23:47 -07:00
Andres Sommerhoff
eb9a002284 docs: clarify search_files newline regex behavior 2026-06-20 23:23:47 -07:00
lkz-de
6403ed06b3 docs(session-search): document source-first retrieval limits
Clarify that session_search is secondary context and direct source identifiers must be inspected first when accessible. Add regression coverage for the tool description.
2026-06-20 23:23:47 -07:00
BBCrypto-web
1eb2959309 docs(.env.example): add missing ELEVENLABS_API_KEY placeholder 2026-06-20 23:23:47 -07:00
skyc1e
46cc0345ae docs(skills): add hermes-agent verification rule 2026-06-20 23:23:47 -07:00
teknium1
8ac5e90ec2 fix(gateway): dedup image_generate media across the compression boundary
After context compression, the agent re-sent an already-delivered
generated image on every subsequent turn (#46627). The auto-append
fallback rescans full history when the message list shrinks (compression-
safe path), deduping against _history_media_paths — but that set was built
by scanning ONLY MEDIA: text tags in tool results. image_generate returns
its path in a JSON payload field (host_image/image/agent_visible_image),
never a MEDIA: tag, so generated-image paths never entered the dedup set
and were re-emitted after the boundary.

Extract the history-path collection into _collect_history_media_paths(),
which now covers BOTH delivery shapes: MEDIA: text tags AND image_generate
JSON-payload paths (mirroring what _collect_auto_append_media_tags
extracts). The inline block in _handle_message is replaced with a call to
the helper.
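
A sketch of a collector covering both delivery shapes. The `MEDIA:<path>` tag syntax and the message dict layout are assumptions. The three JSON keys are the ones named above. This is not the body of `_collect_history_media_paths()` itself.

```python
import json
import re
from typing import Any, Dict, Iterable, Set

MEDIA_TAG = re.compile(r"MEDIA:(\S+)")  # assumed tag syntax
IMAGE_KEYS = ("host_image", "image", "agent_visible_image")


def collect_history_media_paths(messages: Iterable[Dict[str, Any]]) -> Set[str]:
    """Gather every media path already delivered via tool results."""
    paths: Set[str] = set()
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") != "tool" or not isinstance(content, str):
            continue
        # Shape 1: MEDIA: text tags embedded in the tool result.
        paths.update(MEDIA_TAG.findall(content))
        # Shape 2: image_generate style JSON payload fields.
        try:
            payload = json.loads(content)
        except ValueError:
            continue
        if isinstance(payload, dict):
            for key in IMAGE_KEYS:
                value = payload.get(key)
                if isinstance(value, str) and value:
                    paths.add(value)
    return paths
```

With both shapes in the set, a post-compression rescan can skip a generated image that was already sent.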

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-06-20 23:20:16 -07:00
teknium1
1f874dfe44 fix(compression): stop fallback summary triplicating the latest user ask
When LLM summarization fails, the deterministic fallback summary rendered
the latest user ask (active_task = "User asked: '<ask>'") verbatim under
THREE headings — Historical Task Snapshot, Historical In-Progress State,
and Historical Pending User Asks. Re-presenting an already-handled ask as
unresolved in-progress/pending work made the model re-answer it AND treat
the resurrected ask as the active turn, burying the genuinely-new
post-compaction user message (#49307: answer repetition + new-instruction
loss, P1).

Keep the latest ask once, under Task Snapshot, as historical context only.
The In-Progress and Pending-Asks sections now say 'Unknown / None
recoverable from deterministic fallback' (consistent with the Active
State / Key Decisions / Resolved Questions sections) and explicitly note
the ask is historical, not outstanding. The raw turn text still appears in
the verbatim 'Last Dropped Turns' transcript — that's the dropped-turn
record, not a re-labeled instruction.

Note: the separate role=assistant standalone-summary regurgitation
(#33256) is left as-is — that role choice is constrained by strict message
alternation (user collides with a user-ending head) and is already
mitigated by the summary end-marker; forcing the role would risk the
alternation invariant.

Co-authored-by: r266-tech <r2668940489@gmail.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-06-20 23:19:27 -07:00
teknium1
2f3177adf4 fix(compression): protect the summary call from mid-flight interrupts
Context compression is meant to be atomic, but a gateway interrupt (an
incoming user message while the agent is busy) could abort the in-flight
summary call. The Codex Responses aux stream polled the thread interrupt
flag and raised InterruptedError unconditionally, so compression fell back
to a degraded static 'summary unavailable' marker, losing the real handoff
(#23975).

Add a thread-local interrupt-protection flag (aux_interrupt_protection
context manager) in auxiliary_client; the Codex stream's cancellation
check honors it. The compressor wraps its summary call_llm in the context
manager. Timeouts still fire (a hung call must die) and all other aux
tasks (vision, web_extract, title_generation, …) stay interruptible.
Re-entrant, so the main-model retry recursion is safe.
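
A re-entrant, thread-local protection flag can be sketched with a depth counter. Only the `aux_interrupt_protection` name comes from the message. `interrupt_protected` and `check_cancelled` are hypothetical helpers standing in for the stream's cancellation check.

```python
import threading
from contextlib import contextmanager
from typing import Iterator

_local = threading.local()


@contextmanager
def aux_interrupt_protection() -> Iterator[None]:
    """Shield the current thread's aux call from interrupt cancellation."""
    depth = getattr(_local, "depth", 0)
    _local.depth = depth + 1  # a counter, so nested use is safe
    try:
        yield
    finally:
        _local.depth = depth


def interrupt_protected() -> bool:
    return getattr(_local, "depth", 0) > 0


def check_cancelled(interrupt_flag: threading.Event) -> None:
    """Cancellation check that honors the protection flag."""
    if interrupt_flag.is_set() and not interrupt_protected():
        raise InterruptedError("aux call interrupted")
```

A depth counter rather than a boolean keeps an inner exit from unprotecting an outer call. Thread-local storage keeps other threads' aux tasks interruptible. Timeouts are a separate mechanism and are unaffected.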

Co-authored-by: konsisumer <der@konsi.org>
2026-06-20 21:32:30 -07:00
Teknium
4b7f9a4d30 test(matrix): make voice-detection tests hermetic against mention gating (#49946)
test_matrix_voice flaked in CI (6/7 failing on some shards, passing on
others and on main) depending on leaked MATRIX_REQUIRE_MENTION env state.

Root cause: the adapter defaults require_mention=True (falling back to the
MATRIX_REQUIRE_MENTION env var). These tests fire a group-room audio event
with no @mention, so _resolve_message_context drops it before dispatch
('No event was captured') whenever require_mention resolves True — which
happens in a clean shard, but an earlier test in another shard can leave
MATRIX_REQUIRE_MENTION=false in os.environ and mask it. The plugin
migration (commit 5600105478, adapter→bundled plugin) shifted shard composition
and exposed it.

Pin require_mention: False in the test adapter config so these media-TYPE
detection tests are no longer gated by the mention requirement, regardless
of ambient env. Verified: 7/7 pass with MATRIX_REQUIRE_MENTION=true (the
failing condition) AND with the env unset.
2026-06-20 21:22:11 -07:00
teknium1
4c349e85f8 fix(gateway): preserve transcript when hygiene auto-compress can't rotate
Gateway Session Hygiene auto-compression destroyed the original transcript
when the throwaway hygiene agent couldn't rotate the session (#21301, P1).

The _hyg_agent is built WITHOUT a session_db, so _compress_context cannot
end-and-fork the session (its rotate block is gated on agent._session_db).
The session_id stays unchanged, and the rewrite_transcript() call ran
UNCONDITIONALLY — replacing the full original transcript with just the
head+summary list. Permanent data loss on every hygiene compaction.

Guard the rewrite behind 'rotated OR in-place' exactly like the /compress
path already does (#44794/#39704): only overwrite when a new session id
was minted or in-place compaction succeeded; otherwise preserve the
original transcript and log a warning. The token/count bookkeeping that
followed the rewrite is moved inside the guard, with no-change values in
the preserve branch.
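
The guard reduces to a single condition. The sketch below uses an in-memory dict as a stand-in for the transcript store. The function name and signature are hypothetical, not the gateway's API.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("gateway.hygiene")


def finish_hygiene_compaction(
    transcripts: Dict[str, List[str]],
    old_session_id: str,
    new_session_id: str,
    compacted: List[str],
    in_place_ok: bool,
) -> bool:
    """Write the compacted transcript only when it is safe to do so."""
    rotated = new_session_id != old_session_id
    if rotated or in_place_ok:
        transcripts[new_session_id] = list(compacted)
        return True
    # Neither rotated nor compacted in place: keep the original untouched.
    logger.warning(
        "hygiene compaction could not rotate session %s; transcript preserved",
        old_session_id,
    )
    return False
```

The boolean result lets the caller choose between updated and no-change bookkeeping values.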

Co-authored-by: SandroHub013 <sandrohub013@gmail.com>
Co-authored-by: WuTianyi123 <wtyopenclaw@gmail.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
2026-06-20 21:07:11 -07:00
teknium1
79f297834a fix(gateway): widen cron namespace-collision fix to all migrated adapters
#49431 corrected parents[2]->parents[3] for discord + raft only. The same
bug existed in slack, whatsapp, and telegram adapters (migrated from
gateway/platforms/ in 5600105478): each inserts parents[2] = plugins/ onto
sys.path[0], shadowing the real cron/ package with plugins/cron/ so
'import cron.scheduler_provider' raises ModuleNotFoundError on gateway start.
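
The off-by-one is easy to see with `pathlib`. The adapter path below is a hypothetical layout chosen so that `parents[2]` is `plugins/`, as stated above. `resolve_package` is a deliberately simplified model of first-match-wins lookup along `sys.path`, not the real import machinery.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set

# Hypothetical location of a migrated adapter inside the repo.
adapter = PurePosixPath("/repo/plugins/platforms/slack/adapter.py")

plugins_dir = adapter.parents[2]  # /repo/plugins (the buggy insertion)
repo_root = adapter.parents[3]    # /repo         (the corrected insertion)

# Both directories contain a `cron` package.
existing: Set[PurePosixPath] = {
    PurePosixPath("/repo/cron"),
    PurePosixPath("/repo/plugins/cron"),
}


def resolve_package(
    name: str, search_path: Iterable[PurePosixPath]
) -> Optional[PurePosixPath]:
    """First directory on the search path holding `name` wins."""
    for entry in search_path:
        candidate = entry / name
        if candidate in existing:
            return candidate
    return None


shadowed = resolve_package("cron", [plugins_dir, repo_root])
correct = resolve_package("cron", [repo_root])
```

With `plugins/` at the front of the search path, `cron` resolves to `plugins/cron`, which has no `scheduler_provider` module, hence the `ModuleNotFoundError`.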

Fixes #49410, #49824.
2026-06-20 20:45:12 -07:00
kyssta-exe
4c206b972d fix(gateway): correct sys.path insertion in plugins to prevent cron namespace collision (#49410) 2026-06-20 20:45:12 -07:00
teknium
e5e173eefd chore(release): add AUTHOR_MAP entries for docs PR salvage cluster 2026-06-20 20:42:49 -07:00
mintybasil
5d05415292 Expand .gitignore example 2026-06-20 20:42:49 -07:00
mintybasil
094d9cba6c Update docs to clarify requirement for gitignore 2026-06-20 20:42:49 -07:00