The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics
in favour of letting npm workspaces resolve the bundled package
naturally. Two contract tests still asserted the older flow:
`test_dockerfile_installs_tui_dependencies` required:
'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text
…but the lockfile is no longer COPIED individually; the entire
`ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace
reference from `ui-tui/package.json` is `file:`, so npm needs the
real source, not just a manifest stub).
`test_dockerfile_materializes_local_tui_ink_package` required a 7-clause
conjunction matching specific `rm -rf` / `npm install --omit=dev`
`--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations
that were stripped out when the workspace resolution was simplified.
Update the assertions to pin the *contract* the image actually has to
carry rather than the *exact shell incantations* the old flow used:
* TUI deps install: ui-tui/package.json + ui-tui/package-lock.json +
ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm
install/ci step runs in ui-tui.
* Bundled hermes-ink: the workspace package source is COPIED (so
`await import('@hermes/ink')` resolves at runtime).
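A minimal sketch of the contract-style assertions, assuming `dockerfile_text` holds the rendered Dockerfile contents; the fixture wiring is illustrative:
```python
def test_dockerfile_installs_tui_dependencies(dockerfile_text):
    # Contract: manifests + bundled workspace tree are COPIED...
    assert "ui-tui/package.json" in dockerfile_text
    assert "ui-tui/package-lock.json" in dockerfile_text
    assert "ui-tui/packages/hermes-ink/" in dockerfile_text
    # ...and an npm install/ci step runs in ui-tui, whatever its exact flags.
    assert "npm ci" in dockerfile_text or "npm install" in dockerfile_text
```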
This keeps the spirit of #15012 / #16690 (zombie reaping + bundled
workspace materialisation must continue to work) without locking the
Dockerfile into one specific implementation flavour.
Validation:
$ pytest tests/tools/test_dockerfile_pid1_reaping.py -q
6 passed in 1.43s
No production code change. Fixes the two failures observed on `main`
(run 25250051126):
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies`
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`
Adds SearXNG as a free, self-hosted web search provider. SearXNG is a
privacy-respecting metasearch engine that requires no API key — just a
running instance and SEARXNG_URL pointing at it.
## What this adds
- `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing
`WebSearchProvider` (search only; no extract capability)
- `_is_backend_available("searxng")` — gates on SEARXNG_URL
- `_get_backend()` — accepts "searxng" as a configured value; adds it to
auto-detect candidates (lower priority than paid services)
- `web_search_tool` — dispatches to SearXNG when it is the active backend
- `check_web_api_key()` — includes SearXNG in availability check
- `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"]
- `tools_config.py` — SearXNG appears in the `hermes tools` provider picker
- `nous_subscription.py` — `direct_searxng` detection, web_active / web_available
- `setup.py` — SEARXNG_URL listed in the missing-credential hint
- 23 tests covering: is_configured, happy-path search, score sorting, limit,
HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key
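A minimal sketch of the provider shape (the real class subclasses `WebSearchProvider` from `tools/web_providers/base.py`; method names and result fields here are illustrative):
```python
import os
import httpx

class SearXNGSearchProvider:  # subclasses WebSearchProvider in the real module
    @staticmethod
    def is_configured() -> bool:
        return bool(os.environ.get("SEARXNG_URL"))

    async def search(self, query: str, limit: int = 5) -> list[dict]:
        base = os.environ["SEARXNG_URL"].rstrip("/")
        async with httpx.AsyncClient() as client:
            resp = await client.get(f"{base}/search",
                                    params={"q": query, "format": "json"})
            resp.raise_for_status()
        results = resp.json().get("results", [])
        results.sort(key=lambda r: r.get("score", 0), reverse=True)  # score sorting
        return [{"title": r.get("title"), "url": r.get("url"),
                 "snippet": r.get("content")} for r in results[:limit]]
```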
## Config
```yaml
# Use SearXNG for search, any paid provider for extract
web:
  search_backend: "searxng"
  extract_backend: "firecrawl"

# Or: SearXNG as the sole backend (web_extract will use the next available)
web:
  backend: "searxng"
```
SearXNG is search-only — it does not implement WebExtractProvider. Users
who only configure SEARXNG_URL get web_search available; web_extract falls
back to the next available extract provider (or is unavailable if none).
Closes #19198 (Phase 2 Task 4 — SearXNG provider)
Ref: #11562 (original SearXNG PR)
Introduce the foundation for independently selecting web search and
extract backends — enabling future combinations like SearXNG for
search + Firecrawl for extract.
Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider
ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend()
read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in
DEFAULT_CONFIG (empty = inherit from web.backend)
Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical
to before — _get_search_backend() falls through to _get_backend()
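A minimal sketch of the fall-through, with `_get_backend` reduced to a stand-in for the existing shared resolver:
```python
def _get_backend(config: dict) -> str:
    # simplified stand-in for the existing shared resolver
    return config.get("web", {}).get("backend") or "auto"

def _get_search_backend(config: dict) -> str:
    # empty per-capability key (the default) inherits web.backend
    return config.get("web", {}).get("search_backend") or _get_backend(config)

def _get_extract_backend(config: dict) -> str:
    return config.get("web", {}).get("extract_backend") or _get_backend(config)
```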
This is purely structural — no new backends are added. SearXNG and
other search-only/extract-only providers can now be added as simple
drop-in modules in follow-up PRs.
12 new tests, 49 existing tests pass with zero regressions.
Ref: #19198
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.
Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:
1. Each working directory got its own full shadow git repo — no object
dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
/rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
without ever asking for /rollback.
Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.
Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
per-project metadata (store/projects/<hash>.json with workdir +
created_at + last_touch). On first v2 init, any pre-v2 per-directory
shadow repos are auto-migrated into legacy-<timestamp>/ so the new
store starts clean. _prune() now actually rewrites the per-project ref
to the last max_snapshots commits and runs git gc --prune=now. New
_enforce_size_cap() drops oldest commits round-robin across projects
when the store exceeds max_total_size_mb. _drop_oversize_from_index()
filters any single file larger than max_file_size_mb out of the snapshot
(addressing scheme sketched after this list).
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
(status / list / prune / clear / clear-legacy) for managing the store
outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
*.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
coverage for the pre-v2 prune path (per-directory shadow repos under
base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
shared store, new defaults, migration, and the new CLI;
reference/cli-commands.md documents 'hermes checkpoints'.
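A minimal sketch of the per-project addressing described above; the hash width and helper names are illustrative:
```python
import hashlib
import os

STORE = os.path.expanduser("~/.hermes/checkpoints/store")

def _project_hash(workdir: str) -> str:
    return hashlib.sha256(os.path.abspath(workdir).encode()).hexdigest()[:16]

def _project_ref(workdir: str) -> str:
    return f"refs/hermes/{_project_hash(workdir)}"        # per-project ref

def _index_path(workdir: str) -> str:
    return os.path.join(STORE, "indexes", _project_hash(workdir))  # per-project index

def _metadata_path(workdir: str) -> str:
    return os.path.join(STORE, "projects", f"{_project_hash(workdir)}.json")
```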
E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
--retention-days 0' removes its ref, index, and metadata; GC reclaims
the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
CLI without an agent running.
Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.
One config line to enable:
browser:
  engine: lightpanda
New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda
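A minimal sketch of the engine resolution, assuming the config/env names in the Config list below; caching is omitted for brevity:
```python
import os

_VALID_ENGINES = ("auto", "lightpanda", "chrome")

def _get_browser_engine(config: dict) -> str:
    engine = (os.environ.get("AGENT_BROWSER_ENGINE")
              or config.get("browser", {}).get("engine")
              or "auto").lower()
    if engine not in _VALID_ENGINES:
        raise ValueError(f"browser.engine must be one of {_VALID_ENGINES}, got {engine!r}")
    return engine
```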
Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode
Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).
25 new tests + 81 total browser tests pass (0 failures).
Previously, pinning a skill blocked every skill_manage write action
(edit, patch, delete, write_file, remove_file). The 'hard fence'
design conflated two concerns:
1. Pin as deletion protection — don't let the curator archive
or the agent delete a stable skill.
2. Pin as content freeze — don't let the agent rewrite it mid-conversation.
In practice (1) is what users pin for: they want a skill to survive
curator passes. (2) created friction — agents finding a new pitfall
in a pinned skill had to ask the user to unpin, then the agent
patches, then the user re-pins. The dance discouraged skill
maintenance and pinned skills went stale.
This narrows the _pinned_guard to skill_manage(action='delete') only.
Patches, edits, and supporting-file writes go through on pinned
skills so the agent can keep improving them. The curator's own
pinned-skip behavior (agent/curator.py:271 for auto-archive,
line 349 for the LLM review prompt) is unchanged — curator still
never touches pinned skills.
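A minimal sketch of the narrowed guard; the dict shapes and error wording are illustrative:
```python
def _pinned_guard(skill: dict) -> dict | None:
    # Pin now means deletion protection only, not a content freeze.
    if skill.get("pinned"):
        return {"error": f"Skill '{skill['name']}' is pinned and cannot be deleted. "
                         "Ask the user to unpin it first."}
    return None

def _delete_skill(skill: dict) -> dict:
    blocked = _pinned_guard(skill)   # only the delete path consults the guard
    if blocked:
        return blocked
    ...                              # edits/patches/file writes proceed unguarded
    return {"ok": True}
```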
Changes:
- tools/skill_manager_tool.py: remove _pinned_guard calls from
_edit_skill, _patch_skill, _write_file, _remove_file; keep on
_delete_skill. Updated _pinned_guard docstring and error message.
- tools/skill_manager_tool.py: updated skill_manage model-facing tool
description to reflect the new semantic.
- website/docs/user-guide/features/curator.md: updated pinning
section.
- tests/tools/test_skill_manager_tool.py: flipped refuses-pinned
tests for edit/patch/write_file/remove_file into allowed-when-pinned;
kept test_delete_refuses_pinned (strengthened assertion to check the
'cannot be deleted' wording).
Closes #18354
The cherry-picked test predates #19618/#19621 which rewrote
list_agent_created_skill_names() to require an explicit
created_by: 'agent' provenance marker. Without mark_agent_created(),
my-skill is excluded from the list and the positive assertion fails.
Closes the gap where write_file skipped the post-edit syntax check that
patch already ran, so silent file corruption (bad quote escaping,
truncated writes, etc.) would persist on disk until a later read.
## Changes
tools/file_operations.py:
- Add in-process linters for .py, .json, .yaml, .toml (LINTERS_INPROC).
Python uses ast.parse, JSON/YAML/TOML use stdlib/PyYAML parsers.
Zero subprocess overhead; preferred over shell linters when both apply.
- _check_lint() now accepts optional content and routes to in-process
linter first. Shell linter (py_compile, node --check, tsc, go vet,
rustfmt) remains the fallback for languages without an in-process
equivalent.
- New _check_lint_delta() implements the post-first/pre-lazy pattern
borrowed from Cline and OpenCode (a sketch follows at the end of this
Changes section): lint the post-write state first; only if errors are
found AND pre-content was captured does it lint the pre-state and diff.
If the pre-existing file had the SAME errors, the edit introduced
nothing new, and the file is reported as 'still broken, pre-existing':
success=False plus a message explaining the errors were pre-existing.
If the edit introduced genuinely new errors, those are surfaced and
pre-existing ones are filtered out.
- WriteResult gains a lint field.
- write_file() captures pre-content for in-process-lintable extensions
and calls _check_lint_delta after a successful write.
- patch_replace() switches from _check_lint to _check_lint_delta,
reusing the pre-edit content it already has in scope.
tools/file_tools.py:
- Update write_file schema description to mention the post-write lint.
tests/tools/test_file_operations_edge_cases.py:
- Update existing brace-path tests to use .js (shell linter) now that
.py is in-process.
- Add TestCheckLintInproc (9 tests) covering Python/JSON/YAML/TOML
in-process linters.
- Add TestCheckLintDelta (5 tests) covering the post-first/pre-lazy
short-circuit, new-file path, and the single-error-parser caveat.
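A minimal sketch of the delta logic, assuming `_check_lint` returns a list of error strings when given content; the result shape is illustrative:
```python
def _check_lint_delta(path: str, pre_content: str | None, post_content: str) -> dict:
    post_errors = _check_lint(path, content=post_content)
    if not post_errors:
        return {"success": True, "errors": []}            # clean write: exactly one lint
    if pre_content is None:
        return {"success": False, "errors": post_errors}  # new file: everything is new
    pre_errors = set(_check_lint(path, content=pre_content))   # lazy pre-state lint
    new_errors = [e for e in post_errors if e not in pre_errors]
    if not new_errors:
        return {"success": False, "errors": post_errors,
                "message": "File is still broken, but the errors are pre-existing."}
    return {"success": False, "errors": new_errors}       # pre-existing filtered out
```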
## Performance
In-process linters are microseconds per call (ast.parse, json.loads).
The hot path (clean write) runs exactly one lint — matches main's cost
for patch. Pre-state capture is skipped when the file has no applicable
linter. Measured 4.89ms/write average over 100 .py writes including lint.
## Inspiration
- Cline's DiffViewProvider.getNewDiagnosticProblems() — filters pre-write
diagnostics from post-write diagnostics (src/integrations/editor/DiffViewProvider.ts).
- OpenCode's WriteTool — runs lsp.diagnostics() after write and appends
errors to tool output (packages/opencode/src/tool/write.ts).
- Claude Code's DiagnosticTrackingService — captures baseline via
beforeFileEdited() and returns new-diagnostics-only from
getNewDiagnostics() (src/services/diagnosticTracking.ts).
## Validation
- tests/tools/test_file_operations.py + test_file_operations_edge_cases.py
+ test_file_tools.py + test_file_tools_live.py + test_file_write_safety.py
+ test_write_deny.py + test_patch_parser.py + test_file_ops_cwd_tracking.py:
228 passed locally.
- Live E2E reproduction of the tips.py corruption incident: broken
content written; lint field surfaces 'SyntaxError: invalid syntax.
Perhaps you forgot a comma? (line 6, column 5)' — the exact error
that would have self-corrected the bug on the next turn.
MCP servers commonly emit JSON Schema `pattern` (e.g. `\\d{4}-\\d{2}-\\d{2}`
for date-time params) and `format` keywords. llama.cpp's
`json-schema-to-grammar` converter rejects regex escape classes
(\\d/\\w/\\s) and most format values, returning HTTP 400
"parse: error parsing grammar: unknown escape at \\d" — the whole request
fails.
Cloud providers (OpenAI, Anthropic, OpenRouter, Gemini) accept these
keywords fine and use them as prompting hints. Stripping unconditionally
loses useful hints for every cloud user to fix a llama.cpp-only bug.
Approach: classify the llama.cpp grammar-parse 400 in the error
classifier, and on match do a one-shot in-place strip of pattern/format
from `self.tools`, then retry. Follows the existing
`thinking_signature` recovery pattern. Cloud users hit zero overhead;
llama.cpp users pay one failed request per session.
Changes
- agent/error_classifier.py: new `FailoverReason.llama_cpp_grammar_pattern`
+ narrow HTTP-400 branch matching "error parsing grammar",
"json-schema-to-grammar", or "unable to generate parser ... template".
- tools/schema_sanitizer.py: new `strip_pattern_and_format()` helper —
reactive, walks schema nodes, skips property names (search_files.pattern
survives). Returns strip count for logging.
- run_agent.py: new one-shot recovery block in the retry loop. Strips,
logs, continues. Falls through to normal retry if nothing to strip.
- tests: 4 classifier tests (3 variants + 1 non-400 negative), 7 strip
tests including the property-name preservation and idempotency checks.
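A minimal sketch of the reactive strip; the traversal below illustrates how property names survive:
```python
def strip_pattern_and_format(node) -> int:
    """Remove `pattern`/`format` schema keywords in place; return strip count."""
    stripped = 0
    if isinstance(node, dict):
        for kw in ("pattern", "format"):
            if isinstance(node.get(kw), str):   # keyword values are strings
                del node[kw]
                stripped += 1
        for key, value in node.items():
            if key == "properties" and isinstance(value, dict):
                # keys here are property NAMES (search_files.pattern survives);
                # recurse into the sub-schemas only
                for sub in value.values():
                    stripped += strip_pattern_and_format(sub)
            else:
                stripped += strip_pattern_and_format(value)
    elif isinstance(node, list):
        for item in node:
            stripped += strip_pattern_and_format(item)
    return stripped
```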
Co-authored-by: Chris Danis <cdanis@gmail.com>
Address Copilot review on PR #17569:
1. _resolve_safe_cwd never tested the filesystem root because the loop
exited when `os.path.dirname(parent) == parent`, which is true once
`parent == '/'`. Restructure so the root is checked before the
self-equal exit. Adds `test_returns_root_when_only_root_exists` —
regression-guarded by reverting the loop and watching it fail.
2. The fake `Popen.stdout` was a `MagicMock`; `BaseEnvironment._wait_for_process`
calls `proc.stdout.fileno()` then `select.select`/`os.read` against it,
which raised `TypeError: fileno() returned a non-integer` (visible as a
thread exception in test output) and could in theory read from an
unrelated real fd. Hand `fake_popen` a real `os.pipe()` with the write
end pre-closed so the drain loop sees EOF immediately. Helper records
each fd so the test cleans up after itself.
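A minimal sketch of the fixture shape; the fd bookkeeping names are illustrative:
```python
import os
from unittest.mock import MagicMock

def make_fake_popen(opened_fds: list[int]) -> MagicMock:
    proc = MagicMock()
    read_fd, write_fd = os.pipe()        # a real pipe, not a MagicMock stdout
    os.close(write_fd)                   # pre-close write end: drain loop sees EOF
    opened_fds.append(read_fd)           # recorded so the test can close it in teardown
    proc.stdout = os.fdopen(read_fd, "rb")   # real fileno() for select/os.read
    proc.returncode = 0
    proc.poll.return_value = 0
    return proc
```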
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a tool call deletes its own working directory (`cd /tmp/foo &&
rm -rf /tmp/foo`), the next `subprocess.Popen(args, cwd=self.cwd)` raised
`FileNotFoundError: [Errno 2]` before bash even started — every subsequent
terminal/file-tool call hit the same wedge until the gateway restarted.
Fix in `LocalEnvironment._run_bash`: before handing `self.cwd` to Popen,
resolve a safe alternative when the path is gone (walk up to the nearest
existing ancestor, falling back to `tempfile.gettempdir()` only as a last
resort). Log a warning so the recovery is visible — not silent — and
update `self.cwd` so the next call doesn't repeat the message.
Defense in depth in `LocalEnvironment._update_cwd`: only adopt the new
cwd when it still exists as a directory. `pwd -P` from a deleted cwd can
leave a stale value in the marker file; refusing to store a missing path
keeps `self.cwd` valid by construction.
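A minimal sketch of the walk-up, assuming this helper factoring; the real logic lives in `LocalEnvironment._run_bash`:
```python
import os
import tempfile

def _resolve_safe_cwd(path: str) -> str:
    current = os.path.abspath(path)
    while not os.path.isdir(current):
        parent = os.path.dirname(current)
        if parent == current:             # hit the filesystem root
            return tempfile.gettempdir()  # last-resort fallback
        current = parent                  # walk up to the nearest existing ancestor
    return current
```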
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MiniMax deprecated the old v1/t2a_v2 endpoint (api.minimax.io) and
moved to v1/text_to_speech (api.minimax.chat). The new API:
- Uses a flat payload: {model, text, voice_id} instead of nested
voice_setting / audio_setting objects
- Returns raw audio bytes (Content-Type: audio/mpeg) instead of
JSON with hex-encoded audio
- Uses model 'speech-01' instead of 'speech-2.8-hd'
- Updated default voice_id to 'female-shaonv' for Chinese TTS
The implementation detects Content-Type to handle both old and new
API responses, maintaining backward compatibility for any users who
manually configured the legacy base_url.
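A minimal sketch of the Content-Type detection; the legacy JSON field path is an assumption about the old response shape:
```python
import requests

def minimax_tts(base_url: str, api_key: str, text: str) -> bytes:
    resp = requests.post(
        f"{base_url}/v1/text_to_speech",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "speech-01", "text": text, "voice_id": "female-shaonv"},
        timeout=60,
    )
    resp.raise_for_status()
    if resp.headers.get("Content-Type", "").startswith("audio/"):
        return resp.content                        # new API: raw audio bytes
    payload = resp.json()                          # legacy API: JSON with hex audio
    return bytes.fromhex(payload["data"]["audio"])  # assumed legacy field path
```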
Clear inherited provider preference filters when delegation.provider is set so delegated children do not route back to the parent provider. Add a regression test for cross-provider delegation with parent OpenRouter filters.
Closes #10653
When a delegation child session (e.g. source='telegram') contains the
FTS5 hit but _resolve_to_parent() maps it to a different root session
(source='api_server'), the result entry was still reporting the child's
source because the loop discarded session_meta as `_` and fell back to
match_info.get('source'), which carries the child session's value.
Use the resolved parent's session_meta for source, model, and started_at
with match_info as a fallback, so the output accurately reflects the
session the user actually interacted with.
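A minimal sketch of the corrected field resolution, assuming `_resolve_to_parent` yields the root session id plus its session_meta; field names are illustrative:
```python
def _result_entry(match_info: dict, root_id: str, session_meta: dict | None) -> dict:
    meta = session_meta or {}        # no longer discarded as `_`
    return {
        "session_id": root_id,
        # prefer the resolved parent's metadata; match_info is only a fallback
        "source": meta.get("source") or match_info.get("source"),
        "model": meta.get("model") or match_info.get("model"),
        "started_at": meta.get("started_at") or match_info.get("started_at"),
    }
```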
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reverts ff3d2773e2. Teknium reviewed the merged PR and decided this
behavior isn't wanted — tool-driven kanban_create should not mirror
the slash-command path's auto-subscribe. Orchestrators that want
their originating chat notified can call kanban_notify-subscribe
explicitly; we're not going to make it implicit.
Closes #19479.
When an orchestrator agent calls kanban_create from a gateway session
(e.g. a Telegram user delegating to an orchestrator profile), auto-
subscribe the originating (platform, chat, thread, user) to the new
task's terminal events. Mirrors the behavior of the /kanban create
slash command in gateway/run.py so tool-driven creation is at parity
with human-driven creation.
Without this, a user who interacts with an orchestrator exclusively
via the gateway never receives blocked / completed / gave_up
notifications for tasks the orchestrator created on their behalf —
silently breaking the gateway-first multi-agent flow the reporter
describes.
Reads the context-local HERMES_SESSION_* vars via get_session_env()
(not os.environ — those are contextvars for asyncio concurrency
safety). Falls through cleanly in CLI / cron contexts with no
session active (subscribed=False in the response). Best-effort: if
the gateway module isn't importable (test rigs stubbing gateway.*),
the task still creates, we just skip the subscription.
Response gains a 'subscribed' bool so the orchestrator knows whether
terminal events will land back in the originating chat or whether it
needs to poll / unblock manually.
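A minimal sketch of the best-effort path; `get_session_env()` is the context-local accessor named above, while the HERMES_SESSION_* key names, the gateway import, and the subscribe call are hypothetical stand-ins:
```python
def _maybe_subscribe(task_id: str) -> bool:
    env = get_session_env()                    # contextvars, not os.environ
    platform = env.get("HERMES_SESSION_PLATFORM")
    chat_id = env.get("HERMES_SESSION_CHAT_ID")
    if not platform or not chat_id:
        return False                           # CLI / cron: no session active
    try:
        from gateway import subscriptions      # hypothetical module path
    except ImportError:
        return False                           # test rigs stubbing gateway.*
    subscriptions.add(task_id, platform, chat_id,
                      env.get("HERMES_SESSION_THREAD_ID"),
                      env.get("HERMES_SESSION_USER_ID"))
    return True                                # reported as 'subscribed' in the response
```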
Tests: 4 new in tests/tools/test_kanban_tools.py covering
CLI/no-subscribe, telegram/gateway-auto-subscribe, discord-DM/no-
thread subscribe, and partial-ctx/no-chat_id no-subscribe. 40/40
kanban tool tests pass.
Closes #19534 (security).
A worker spawned by the kanban dispatcher has HERMES_KANBAN_TASK set
to its own task id. The destructive tools (kanban_complete,
kanban_block, kanban_heartbeat) resolved task_id via
_default_task_id() which preferred an explicit arg over the env var,
with no ownership check — so a buggy or prompt-injected worker could
complete / block / heartbeat any OTHER task (sibling, cross-tenant,
anything) by supplying its id. Reporter's repro: worker for t_A
passed task_id=t_B to kanban_complete and got {"ok": true}.
Fix: add _enforce_worker_task_ownership(tid). If HERMES_KANBAN_TASK
is set and tid doesn't match, return a structured tool error with
guidance to use kanban_comment (for information handoff across tasks)
or kanban_create (for follow-up work). Orchestrator profiles (no env
var, but kanban toolset enabled per #18968) are exempt — their job
is routing and sometimes includes closing out child tasks.
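A minimal sketch of the ownership fence; the error wording is illustrative:
```python
import os

def _enforce_worker_task_ownership(tid: str) -> dict | None:
    own = os.environ.get("HERMES_KANBAN_TASK")
    if own and tid and tid != own:
        return {"ok": False,
                "error": (f"Task {tid} belongs to another worker (yours: {own}). "
                          "Use kanban_comment for cross-task handoff or "
                          "kanban_create for follow-up work.")}
    return None   # own task, or orchestrator profile with no env var: allowed
```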
Kept unrestricted (deliberately):
- kanban_show — workers legitimately read parent/sibling handoff context
- kanban_comment — cross-task comments are the handoff mechanism
- kanban_create — orchestrator fan-out, worker follow-up spawning
- kanban_link — parent/child linking
Tests: 5 new regression tests in tests/tools/test_kanban_tools.py
covering the grid (worker-attacks-foreign ×3 tools, worker-own-task
preserved, orchestrator-unrestricted). 36/36 pass.
AnyUrl was imported inside the same try block as mcp.client.auth, so
when the mcp package was not installed, AnyUrl was undefined and
_build_client_metadata raised NameError at runtime.
Moved the AnyUrl import to its own try/except block so it's available
whenever pydantic is installed (which is a core dependency), regardless
of whether the mcp SDK is present.
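A minimal sketch of the split imports; the mcp-side names are illustrative:
```python
try:
    from pydantic import AnyUrl          # pydantic is a core dependency
except ImportError:                      # defensive; should not happen in practice
    AnyUrl = None

try:
    from mcp.client import auth as mcp_auth   # optional mcp SDK
except ImportError:
    mcp_auth = None                      # AnyUrl stays usable regardless
```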
Also added pytest.importorskip('mcp') to the three
test_build_client_metadata tests that exercise _build_client_metadata,
since that function depends on OAuthClientMetadata from the mcp package.
TestTranscribeLocalExtended patches faster_whisper.WhisperModel, which
triggers an ImportError when the faster_whisper package is not installed.
Added a pytest.mark.skipif marker using importlib.util.find_spec so
these tests are gracefully skipped instead of failing with
ModuleNotFoundError.
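A minimal sketch of the marker (the standard pytest pattern, with the class body elided):
```python
import importlib.util

import pytest

@pytest.mark.skipif(importlib.util.find_spec("faster_whisper") is None,
                    reason="faster_whisper is not installed")
class TestTranscribeLocalExtended:
    ...
```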
Set max_result_size_chars=100_000 on the read_file registry entry (was
float('inf')), closing the Layer 2 defense-in-depth gap in
tool_result_storage.py. The existing Layer 1 guard inside
_handle_read_file already returns a JSON error for oversized reads;
this aligns the registry cap with every other tool.
Update test_read_file_never_persisted → test_read_file_result_size_cap
to assert 100_000, and add test_read_file_registry_cap_is_100k as an
explicit regression guard against re-introducing float('inf').
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Treat explicit CDP override mode as a valid browser backend even when agent-browser is absent, and add a regression test to prevent false-negative availability gating.
Tighten the provenance semantics added in #19618: skills a user asks a
foreground agent to write via skill_manage(create) now stay invisible to
the curator. Only skills the background self-improvement review fork
sediments through skill_manage get the created_by=agent marker.
- tools/skill_provenance.py — new ContextVar module mirroring the
_approval_session_key pattern: set_current_write_origin / reset /
get / is_background_review. Default origin is 'foreground'; the
review fork sets 'background_review'.
- run_agent.py — run_conversation() binds the ContextVar from
self._memory_write_origin at the top of each call. The review fork
runs on its own thread (fresh context), so foreground and review
contexts never cross-contaminate.
- tools/skill_manager_tool.py — skill_manage(action='create') now
only calls mark_agent_created() when is_background_review(). All
other cases (foreground create, patch, edit, write_file, delete)
continue as before.
- tests: test_skill_provenance.py (6 tests covering the ContextVar
surface), split test_full_create_via_dispatcher into foreground
vs. review-fork variants, curator status tests now mark-first.
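A minimal sketch of the ContextVar surface; names approximate the ones listed above:
```python
from contextvars import ContextVar, Token

_write_origin: ContextVar[str] = ContextVar("skill_write_origin",
                                            default="foreground")

def set_current_write_origin(origin: str) -> Token:
    return _write_origin.set(origin)       # review fork passes 'background_review'

def reset_current_write_origin(token: Token) -> None:
    _write_origin.reset(token)

def get_current_write_origin() -> str:
    return _write_origin.get()

def is_background_review() -> bool:
    return _write_origin.get() == "background_review"
```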
Why: the agent routinely edits existing user skills on the user's
behalf; those writes must never flip provenance. And when a user
explicitly asks the foreground agent to create a skill, that skill
belongs to the user. The curator should only be cleaning up after
its own autonomous sediment from the review nudge loop.
PR #19427 dropped the 'You are a Kanban worker' identity line from
KANBAN_GUIDANCE so SOUL.md stays authoritative for profile identity.
This test assertion was stale against that change; update it to the
new protocol-only header.
_build_child_agent constructed child AIAgents without passing
fallback_model, leaving _fallback_chain=[] for every subagent.
When a subagent hit a rate-limit or credential exhaustion the
runtime fallback check (run_agent.py:7486 / 12267) found an empty
chain and failed immediately — even though the parent agent was
configured with fallback_providers and would have recovered.
The cron scheduler already propagates fallback_model correctly
(scheduler.py:1038). Fix closes the parity gap by reading the
parent's _fallback_chain (the normalised list form accepted by
AIAgent's fallback_model parameter) and threading it through.
Empty chains coerce to None so AIAgent initialises _fallback_chain=[]
as usual rather than iterating an empty list.
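A minimal sketch of the fix, assuming the attribute names above; the remaining AIAgent kwargs are unchanged:
```python
def _build_child_agent(parent, child_model: str, **kwargs):
    chain = list(getattr(parent, "_fallback_chain", []) or [])
    return AIAgent(
        model=child_model,
        fallback_model=chain or None,   # empty chain coerces to None
        **kwargs,
    )
```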
Follow-up on #9925 cherry-pick adding two additional tests:
- bytes content hashes identically to its str-decoded form
- mixed bytes+str bundle hash equals the on-disk content_hash from
skills_guard (the production invariant used to detect drift)
Also map dodofun@126.com and 1615063567@qq.com in AUTHOR_MAP so the
CI contributor check passes for the cherry-picked commit.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: zhao0112 <1615063567@qq.com>
* feat: add video_analyze tool for native video understanding
Adds a video_analyze tool that sends video files to multimodal LLMs
(e.g. Gemini) for analysis via the OpenRouter-compatible video_url
content type. Mirrors vision_analyze in structure, error handling,
and registration pattern.
Key design:
- Base64 encodes entire video (no frame extraction, no ffmpeg dep)
- Uses 'video_url' content block type (OpenRouter standard)
- Supports mp4, webm, mov, avi, mkv, mpeg formats
- 50 MB hard cap, 20 MB warning threshold
- 180s minimum timeout (videos take longer than images)
- AUXILIARY_VIDEO_MODEL env override, falls back to AUXILIARY_VISION_MODEL
- Same SSRF protection, retry logic, and cleanup as vision_analyze
Default disabled: registered in 'video' toolset (not in _HERMES_CORE_TOOLS).
Users opt in via: hermes tools enable video, or enabled_toolsets=['video'].
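A minimal sketch of the content block construction; the data-URL mime handling is simplified to mp4:
```python
import base64

def build_video_message(video_path: str, prompt: str) -> dict:
    with open(video_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()   # whole file, no frame extraction
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "video_url",                    # OpenRouter-compatible block type
             "video_url": {"url": f"data:video/mp4;base64,{b64}"}},
        ],
    }
```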
* feat(video): add models.dev capability pre-check + CONFIGURABLE_TOOLSETS entry
- Pre-checks model video capability via models.dev modalities.input
before expensive base64 encoding. Fails early with helpful message
suggesting video-capable alternatives (gemini, mimo-v2.5-pro).
- Passes optimistically if model unknown or lookup fails.
- Adds ModelInfo.supports_video_input() helper.
- Adds 'video' to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS
so 'hermes tools enable video' works from CLI.
- 8 new tests for the capability check (37 total).
* refactor(video): remove models.dev capability pre-check
Removes _check_video_model_capability and ModelInfo.supports_video_input.
The vision_analyze tool doesn't pre-check image capability either — both
tools rely on the same pattern: send request, handle API errors gracefully
with categorized user-facing messages. The pre-check was inconsistent
(only worked for some providers/models) so drop it for parity.
* cleanup: compress comments, fix fragile timeout coupling
- Replace _VISION_DOWNLOAD_TIMEOUT * 2 with hardcoded 60s (no silent
breakage if vision timeout changes independently)
- Strip verbose comments and redundant log lines throughout
- No behavioral changes
Under context pressure, frontier models sometimes emit tool calls with
required fields dropped. Previously _handle_write_file() used
args.get('content', '') which substituted an empty string for the missing
key, returned success with bytes_written=0, and created a zero-byte file
on disk. The model had no way to detect the failure.
Changes:
- Reject calls where 'path' is absent or not a non-empty string
- Reject calls where 'content' key is entirely absent (key-presence check,
not truthiness) — distinguishing a legitimately empty file from a dropped arg
- Reject calls where 'content' is a non-string type
- All error messages include guidance to re-emit the tool call or switch
to execute_code with hermes_tools.write_file() for large payloads
- Explicit empty string content (file truncation) continues to work
Regression tests added for all four cases: missing path, missing content,
explicit-empty content, and wrong content type.
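A minimal sketch of the validation order; the error strings are illustrative:
```python
def _handle_write_file(args: dict) -> dict:
    path = args.get("path")
    if not isinstance(path, str) or not path:
        return {"error": "Missing or invalid 'path'. Re-emit the tool call."}
    if "content" not in args:                  # key presence, not truthiness
        return {"error": "Missing 'content'. Re-emit the tool call, or use "
                         "execute_code with hermes_tools.write_file() for large payloads."}
    content = args["content"]
    if not isinstance(content, str):
        return {"error": f"'content' must be a string, got {type(content).__name__}."}
    # explicit empty string (deliberate truncation) falls through and is written
    ...
```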
Fixes #19096
* fix(curator): authoritative absorbed_into declarations on skill delete
Closes#18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.
The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
- absorbed_into='<umbrella>' -> consolidated, target must exist on disk
- absorbed_into='' -> explicit prune, no forwarding target
- missing -> legacy path, falls through to heuristic/YAML
The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.
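A minimal sketch of the delete-time contract; `skill_exists` and `classify` are hypothetical helpers standing in for the real plumbing:
```python
def _delete_skill(slug: str, absorbed_into: str | None = None) -> dict:
    if absorbed_into is not None:
        if absorbed_into.strip() == slug:
            return {"error": "absorbed_into cannot reference the skill being deleted"}
        if absorbed_into == "":
            classify(slug, "pruned")                 # explicit prune, no forwarding
        elif not skill_exists(absorbed_into):
            return {"error": f"absorbed_into target '{absorbed_into}' not found on disk"}
        else:
            classify(slug, "consolidated", target=absorbed_into)
    # absorbed_into missing: legacy path; heuristic/YAML reconciliation downstream
    ...
```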
Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
+ _delete_skill. Validate target exists when non-empty. Reject
absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
tool calls for skill_manage(delete) with the arg. _reconcile_classification
accepts absorbed_declarations= and treats them as authoritative. Curator
prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
target, empty string, nonexistent target, self-reference, whitespace,
backward compat, dispatcher plumbing). 11 new curator tests covering
the extractor + authoritative reconciler path + mixed-legacy-and-
declared runs.
Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
all 3. Model emits NO YAML block. Heuristic misses (patch prose
doesn't name old slugs). Delete calls carry absorbed_into. Result:
both PR skills correctly classified 'consolidated' + cron rewritten
['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
-> routed via existing 'model' source, cron still rewritten correctly.
* feat(curator): capture + restore cron skill links across snapshot/rollback
Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.
Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.
Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.
Reconciliation rules:
- Job in backup AND live, skills differ → skills restored.
- Job in backup AND live, skills match → no-op.
- Job in backup, NOT in live → skipped (user deleted it after snapshot;
  their choice is later than the snapshot).
- Job in live, NOT in backup → untouched (user created it after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds and
  reports 'not captured' (older pre-feature snapshots keep working).
Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().
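A minimal sketch of the surgical reconcile; the report shape is condensed to a count:
```python
def _restore_cron_skill_links(backup_jobs: list[dict], live_jobs: list[dict]) -> int:
    restored = 0
    live_by_id = {job["id"]: job for job in live_jobs}
    for old in backup_jobs:
        live = live_by_id.get(old["id"])
        if live is None:
            continue                         # user deleted the job after the snapshot
        for field in ("skills", "skill"):    # only skill-link fields are touched
            if field in old and live.get(field) != old[field]:
                live[field] = old[field]
                restored += 1
    return restored    # jobs created after the snapshot are never visited
```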
Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
'cron jobs: N (will be restored for skill-link fields only)' when the
snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
summarizing the reconciliation outcome.
Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
captured-as-raw, full rollback-restores-skills-and-cron, rollback
touches only skill fields, rollback skips user-deleted jobs, rollback
leaves user-created jobs untouched, rollback still works with
pre-feature snapshot that has no cron-jobs.json, standalone unit test
on _restore_cron_skill_links exercising the full report shape.
Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
name, prompt) preserved across the round trip.
Widens #16528 to two sibling sites that had the same quoted-boolean
bug: a YAML string "false" (or "0", "no", "off") silently evaluated
truthy under bool() / if-check.
- gateway/run.py _load_show_reasoning: is_truthy_value wrap
- tools/skill_manager_tool.py _guard_agent_created_enabled: is_truthy_value wrap
- regression tests for both
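A minimal sketch of the wrap, with `is_truthy_value` reduced to a stand-in consistent with the #16528 semantics:
```python
def is_truthy_value(value) -> bool:
    # stand-in: quoted YAML booleans must not evaluate truthy via bool()
    if isinstance(value, str):
        return value.strip().lower() not in ("", "false", "0", "no", "off")
    return bool(value)

assert is_truthy_value("false") is False   # previously bool("false") was True
assert is_truthy_value("off") is False
assert is_truthy_value(True) is True
```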
When running on a host with sudoers NOPASSWD configured for the current
user, interactive Hermes sessions were unnecessarily entering the
password prompt path before executing sudo commands. Outside Hermes,
`sudo -n true` exits 0 for that user.
Add `_sudo_nopasswd_works()` that probes `sudo -n true` and, when it
succeeds, lets `_transform_sudo_command()` return the command unchanged
with no stdin password. The probe:
- Is scoped to the `local` terminal backend only, so Docker/SSH/Modal
and other remote backends do not inherit host sudo state.
- Re-probes every call (no process-lifetime cache) so an expired sudo
timestamp cannot silently make a later command block waiting for a
password that Hermes never prompts for.
- Is bypassed entirely when `SUDO_PASSWORD` is configured or a cached
password already exists, preserving existing explicit-password flows.
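A minimal sketch of the probe:
```python
import subprocess

def _sudo_nopasswd_works() -> bool:
    # Re-probed on every call: no cache, so an expired sudo timestamp cannot
    # silently make a later command block on a prompt Hermes never shows.
    try:
        result = subprocess.run(["sudo", "-n", "true"],
                                capture_output=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```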
Co-authored-by: Junting Wu <juntingpublic@gmail.com>
Proves token A's detected capabilities do not leak to token B after the
fix in the preceding commit. Before the fix this test would have seen
both tokens return token A's cached value.
- order session_search recent-mode results by last activity instead of session start time
- add an opt-in `order_by_last_active` path to `SessionDB.list_sessions_rich`
- add regression coverage for both the database ordering and recent-mode call path