hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-06-18 09:51:59 +00:00

Author	SHA1	Message	Date
Wysie	be99feff1f	fix(image-gen): force-refresh plugin providers in long-lived sessions	2026-04-23 03:01:18 -07:00
Julien Talbot	d8cc85dcdc	review(stt-xai): address cetej's nits - Replace hardcoded 'fr' default with DEFAULT_LOCAL_STT_LANGUAGE ('en') — removes locale leak, matches other providers - Drop redundant default=True on is_truthy_value (dict .get already defaults) - Update auto-detect comment to include 'xai' in the chain - Fix docstring: 21 languages (match PR body + actual xAI API) - Update test_sends_language_and_format to set HERMES_LOCAL_STT_LANGUAGE=fr explicitly, since default is no longer 'fr' All 18 xAI STT tests pass locally.	2026-04-23 01:57:33 -07:00
Julien Talbot	18b29b124a	test(stt): add unit tests for xAI Grok STT provider Covers: - _transcribe_xai: no key, successful transcription, whitespace stripping, API error (HTTP 400), empty transcript, permission error, network error, language/format params sent, custom base_url, diarize config - _get_provider xAI: key set, no key, auto-detect after mistral, mistral preferred over xai, no key returns none - transcribe_audio xAI dispatch: dispatch, default model (grok-stt), model override	2026-04-23 01:57:33 -07:00
Ubuntu	a3014a4481	fix(docker): add SETUID/SETGID caps so gosu drop in entrypoint succeeds The Docker terminal backend runs containers with `--cap-drop ALL` and re-adds only DAC_OVERRIDE, CHOWN, FOWNER. Since commit `fee0e0d3` ("run as non-root user, use virtualenv") the image entrypoint drops from root to the `hermes` user via `gosu`, which requires CAP_SETUID and CAP_SETGID. Without them every sandbox container exits immediately with: Dropping root privileges error: failed switching to 'hermes': operation not permitted Breaking every terminal/file tool invocation in `terminal.backend: docker` mode. Fix: add SETUID and SETGID to the cap-add list. The `no-new-privileges` security-opt is kept, so gosu still cannot escalate back to root after the one-way drop — the hardening posture is preserved. Reproduction ------------ With any image whose ENTRYPOINT calls `gosu <user>`, the container exits immediately under the pre-fix cap set. Post-fix, the drop succeeds and the container proceeds normally. docker run --rm \ --cap-drop ALL \ --cap-add DAC_OVERRIDE --cap-add CHOWN --cap-add FOWNER \ --security-opt no-new-privileges \ --entrypoint /usr/local/bin/gosu \ hermes-claude:latest hermes id # -> error: failed switching to 'hermes': operation not permitted # Same command with SETUID+SETGID added: # -> uid=10000(hermes) gid=10000(hermes) groups=10000(hermes) Tests ----- Added `test_security_args_include_setuid_setgid_for_gosu_drop` that asserts both caps are present and the overall hardening posture (cap-drop ALL + no-new-privileges) is preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:13:14 -07:00
Teknium	7d8b2eee63	fix(delegate): default inherit_mcp_toolsets=true, drop version bump Follow-up on helix4u's PR #14211: - Flip default to true: narrowing toolsets=['web','browser'] expresses 'I want these extras', not 'silently strip MCP'. Parent MCP tools (registered at runtime) should survive narrowing by default. - Drop _config_version bump (22->23); additive nested key under delegation.* is handled by _deep_merge, no migration needed. - Update tests to reflect new default behavior.	2026-04-22 17:45:48 -07:00
helix4u	3e96c87f37	fix(delegate): make MCP toolset inheritance configurable	2026-04-22 17:45:48 -07:00
Yukipukii1	44a16c5d9d	guard terminal_tool import-time env parsing	2026-04-22 14:45:50 -07:00
kshitijk4poor	d6ed35d047	feat(security): add global toggle to allow private/internal URL resolution Adds security.allow_private_urls / HERMES_ALLOW_PRIVATE_URLS toggle so users on OpenWrt routers, TUN-mode proxies (Clash/Mihomo/Sing-box), corporate split-tunnel VPNs, and Tailscale networks — where DNS resolves public domains to 198.18.0.0/15 or 100.64.0.0/10 — can use web_extract, browser, vision URL fetching, and gateway media downloads. Single toggle in tools/url_safety.py; all 23 is_safe_url() call sites inherit automatically. Cached for process lifetime. Cloud metadata endpoints stay ALWAYS blocked regardless of the toggle: 169.254.169.254 (AWS/GCP/Azure/DO/Oracle), 169.254.170.2 (AWS ECS task IAM creds), 169.254.169.253 (Azure IMDS wire server), 100.100.100.200 (Alibaba), fd00:ec2::254 (AWS IPv6), the entire 169.254.0.0/16 link-local range, and the metadata.google.internal / metadata.goog hostnames (checked pre-DNS so they can't be bypassed on networks where those names resolve to local IPs). Supersedes #3779 (narrower HERMES_ALLOW_RFC2544 for the same class of users). Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-04-22 14:38:59 -07:00
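A rough Python sketch of the policy described in commit d6ed35d047, under stated assumptions: the function names `is_always_blocked_host` and `is_ip_allowed` are hypothetical and are not the real `tools/url_safety.py` API, and "private" is approximated here with the standard library's `is_global` property. The blocked addresses and hostnames are the ones listed in the commit message.

```python
import ipaddress

# Hostnames the commit says are checked before DNS resolution.
ALWAYS_BLOCKED_HOSTNAMES = {"metadata.google.internal", "metadata.goog"}

# Networks that stay blocked regardless of the toggle (from the commit message).
# 169.254.0.0/16 already covers 169.254.169.254, 169.254.170.2 and 169.254.169.253.
ALWAYS_BLOCKED_NETWORKS = [
    ipaddress.ip_network("169.254.0.0/16"),
    ipaddress.ip_network("100.100.100.200/32"),
    ipaddress.ip_network("fd00:ec2::254/128"),
]


def is_always_blocked_host(hostname: str) -> bool:
    """Pre-DNS check: True for cloud metadata hostnames (hypothetical helper)."""
    return hostname.strip().rstrip(".").lower() in ALWAYS_BLOCKED_HOSTNAMES


def is_ip_allowed(ip_text: str, allow_private_urls: bool) -> bool:
    """Post-DNS check on a resolved address (hypothetical helper)."""
    ip = ipaddress.ip_address(ip_text)
    # Metadata endpoints are rejected first, so the toggle cannot re-enable them.
    if any(ip in net for net in ALWAYS_BLOCKED_NETWORKS):
        return False
    if allow_private_urls:
        return True
    # Assumption: with the toggle off, only globally routable addresses pass.
    return ip.is_global
```

Checking the always-blocked list before the toggle is what keeps 100.100.100.200 rejected even though it sits inside 100.64.0.0/10, one of the ranges the toggle is meant to open up.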
Yukipukii1	40619b393f	tools: normalize file tool pagination bounds	2026-04-22 06:11:41 -07:00
Teknium	8f167e8791	fix(tts): use per-provider input-character caps instead of global 4000 (#13743) A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at 4000 chars, causing long inputs to be silently chopped even though the underlying APIs allow much more: - OpenAI: 4096 - xAI: 15000 - MiniMax: 10000 - ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware) - Gemini: ~5000 - Edge: ~5000 The schema description also told the model 'Keep under 4000 characters', which encouraged the agent to self-chunk long briefs into multiple TTS calls (producing 3 separate audio files instead of one). New behavior: - PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH encode the documented per-provider limits. - _resolve_max_text_length(provider, cfg) resolves: 1. tts.<provider>.max_text_length user override 2. ElevenLabs model_id lookup 3. provider default 4. 4000 fallback - text_to_speech_tool() and stream_tts_to_speaker() both call the resolver; old MAX_TEXT_LENGTH alias kept for back-compat. - Schema description no longer hardcodes 4000. Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253 voice-command/voice-cli tests still pass.	2026-04-21 17:49:39 -07:00
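The four-step resolution order in commit 8f167e8791 can be sketched as follows. This is an illustration, not the project's `_resolve_max_text_length`: the config shape (a dict keyed by provider name) and the ElevenLabs model ids are assumptions, since the log only lists the limit tiers, and no ElevenLabs provider default is assumed.

```python
from typing import Any, Mapping

FALLBACK_MAX_TEXT_LENGTH = 4000  # step 4 in the commit message

# Limits quoted in the commit message (Gemini and Edge are approximate there).
PROVIDER_MAX_TEXT_LENGTH = {
    "openai": 4096,
    "xai": 15000,
    "minimax": 10000,
    "gemini": 5000,
    "edge": 5000,
}

# Hypothetical model ids; the commit only names the tiers 5000/10000/30000/40000.
ELEVENLABS_MODEL_MAX_TEXT_LENGTH = {
    "example_small_model": 5000,
    "example_large_model": 40000,
}


def resolve_max_text_length(provider: str, tts_cfg: Mapping[str, Any]) -> int:
    """Resolve the input-character cap for one TTS provider."""
    provider_cfg = tts_cfg.get(provider) or {}

    # 1. Explicit user override: tts.<provider>.max_text_length
    override = provider_cfg.get("max_text_length")
    if isinstance(override, int) and not isinstance(override, bool) and override > 0:
        return override

    # 2. ElevenLabs limits depend on the configured model.
    if provider == "elevenlabs":
        model_limit = ELEVENLABS_MODEL_MAX_TEXT_LENGTH.get(provider_cfg.get("model_id"))
        if model_limit is not None:
            return model_limit

    # 3. Provider default, else 4. the global fallback.
    return PROVIDER_MAX_TEXT_LENGTH.get(provider, FALLBACK_MAX_TEXT_LENGTH)
```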
Teknium	9c9d9b7ddf	feat(delegate): cross-agent file state coordination for concurrent subagents (#13718 ) * feat(models): hide OpenRouter models that don't advertise tool support Port from Kilo-Org/kilocode#9068. hermes-agent is tool-calling-first — every provider path assumes the model can invoke tools. Models whose OpenRouter supported_parameters doesn't include 'tools' (e.g. image-only or completion-only models) cannot be driven by the agent loop and fail at the first tool call. Filter them out of fetch_openrouter_models() so they never appear in the model picker (`hermes model`, setup wizard, /model slash command). Permissive when the field is missing — OpenRouter-compatible gateways (Nous Portal, private mirrors, older snapshots) don't always populate supported_parameters. Treat missing as 'unknown → allow' rather than silently emptying the picker on those gateways. Only hide models whose supported_parameters is an explicit list that omits tools. Tests cover: tools present → kept, tools absent → dropped, field missing → kept, malformed non-list → kept, non-dict item → kept, empty list → dropped. * feat(delegate): cross-agent file state coordination for concurrent subagents Prevents mangled edits when concurrent subagents touch the same file (same process, same filesystem — the mangle scenario from #11215). Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1: 1. FileStateRegistry (tools/file_state.py) — process-wide singleton tracking per-agent read stamps and the last writer globally. check_stale() names the sibling subagent in the warning when a non-owning agent wrote after this agent's last read. 2. Per-path threading.Lock wrapped around the read-modify-write region in write_file_tool and patch_tool. Concurrent siblings on the same path serialize; different paths stay fully parallel. V4A multi-file patches lock in sorted path order (deadlock-free). 3. 
Delegate-completion reminder in tools/delegate_tool.py: after a subagent returns, writes_since(parent, child_start, parent_reads) appends '[NOTE: subagent modified files the parent previously read — re-read before editing: ...]' to entry.summary when the child touched anything the parent had already seen. Complements (does not replace) the existing path-overlap check in run_agent._should_parallelize_tool_batch — batch check prevents same-file parallel dispatch within one agent's turn (cheap prevention, zero API cost), registry catches cross-subagent and cross-turn staleness at write time (detection). Behavior is warning-only, not hard-failing — matches existing project style. Errors surface naturally: sibling writes often invalidate the old_string in patch operations, which already errors cleanly. Tests: tests/tools/test_file_state_registry.py — 16 tests covering registry state transitions, per-path locking, per-path-not-global locking, writes_since filtering, kill switch, and end-to-end integration through the real read_file/write_file/patch handlers.	2026-04-21 16:41:26 -07:00
pefontana	48ecb98f8a	feat(delegate): orchestrator role and configurable spawn depth (default flat) Adds role='leaf'|'orchestrator' to delegate_task. With max_spawn_depth>=2, an orchestrator child retains the 'delegation' toolset and can spawn its own workers; leaf children cannot delegate further (identical to today). Default posture is flat — max_spawn_depth=1 means a depth-0 parent's children land at the depth-1 floor and orchestrator role silently degrades to leaf. Users opt into nested delegation by raising max_spawn_depth to 2 or 3 in config.yaml. Also threads acp_command/acp_args through the main agent loop's delegate dispatch (previously silently dropped in the schema) via a new _dispatch_delegate_task helper, and adds a DelegateEvent enum with legacy-string back-compat for gateway/ACP/CLI progress consumers. Config (hermes_cli/config.py defaults): delegation.max_concurrent_children: 3 # floor-only, no upper cap delegation.max_spawn_depth: 1 # 1=flat (default), 2-3 unlock nested delegation.orchestrator_enabled: true # global kill switch Salvaged from @pefontana's PR #11215. Overrides vs. the original PR: concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only, no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which silently enabled one level of orchestration for every user). Co-authored-by: pefontana <fontana.pedro93@gmail.com>	2026-04-21 14:23:45 -07:00
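One way to read the role-degradation rule in commit 48ecb98f8a is sketched below. The function `effective_role` is hypothetical, and the exact comparison (a child keeps the orchestrator role only while its depth is below `max_spawn_depth`) is an inference from the commit message, not confirmed project logic.

```python
def effective_role(
    requested_role: str,
    parent_depth: int,
    max_spawn_depth: int = 1,
    orchestrator_enabled: bool = True,
) -> str:
    """Return the role a delegated child actually runs with."""
    if requested_role not in ("leaf", "orchestrator"):
        raise ValueError(f"unknown role: {requested_role!r}")

    child_depth = parent_depth + 1
    can_orchestrate = (
        requested_role == "orchestrator"
        and orchestrator_enabled            # global kill switch
        and child_depth < max_spawn_depth   # room left for grandchildren
    )
    # Anything that cannot orchestrate silently degrades to a leaf.
    return "orchestrator" if can_orchestrate else "leaf"
```

With the default `max_spawn_depth=1`, a depth-0 parent's child lands at depth 1, so an orchestrator request degrades to leaf; raising the setting to 2 lets that child keep the role while its own children become leaves.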
Teknium	5ffae9228b	feat(image-gen): add GPT Image 2 to FAL catalog (#13677) Adds OpenAI's new GPT Image 2 model via FAL.ai, selectable through `hermes tools` → Image Generation. SOTA text rendering (including CJK) and world-aware photorealism. - FAL_MODELS entry with image_size_preset style - 4:3 presets on all aspect ratios — 16:9 (1024x576) falls below GPT-Image-2's 655,360 min-pixel floor and would be rejected - quality pinned to medium (same rule as gpt-image-1.5) for predictable Nous Portal billing - BYOK (openai_api_key) deliberately omitted from supports so all users stay on shared FAL billing - 6 new tests covering preset mapping, quality pinning, and supports-whitelist integrity - Docs table + aspect-ratio map updated Live-tested end-to-end: 39.9s cold request, clean 1024x768 PNG	2026-04-21 13:35:31 -07:00
Teknium	ba4357d13b	fix(env_passthrough): reject Hermes provider credentials from skill passthrough (#13523) A skill declaring `required_environment_variables: [ANTHROPIC_TOKEN]` in its SKILL.md frontmatter silently bypassed the `execute_code` sandbox's credential-scrubbing guarantee. `register_env_passthrough` had no blocklist, so any name a skill chose flipped `is_env_passthrough(name) => True`, which shortcircuits the sandbox's secret filter. Fix: reject registration when the name appears in `_HERMES_PROVIDER_ENV_BLOCKLIST` (the canonical list of Hermes-managed credentials — provider keys, gateway tokens, etc.). Log a warning naming GHSA-rhgp-j443-p4rf so operators see the rejection in logs. Non-Hermes third-party API keys (TENOR_API_KEY for gif-search, NOTION_TOKEN for notion skills, etc.) remain legitimately registerable — they were never in the sandbox scrub list in the first place. Tests: 16 -> 17 passing. Two old tests that documented the bypass (`test_passthrough_allows_blocklisted_var`, `test_make_run_env_passthrough`) are rewritten to assert the new fail-closed behavior. New `test_non_hermes_api_key_still_registerable` locks in that legitimate third-party keys are unaffected. Reported in GHSA-rhgp-j443-p4rf by @q1uf3ng. Hardening; not CVE-worthy on its own per the decision matrix (attacker must already have operator consent to install a malicious skill).	2026-04-21 06:14:25 -07:00
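The fail-closed registration in commit ba4357d13b could be sketched like this. The blocklist contents are an assumption (only `ANTHROPIC_TOKEN` is named in the log as a rejected example), and the boolean return value is an illustrative choice rather than the real `register_env_passthrough` signature.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative subset only; the real _HERMES_PROVIDER_ENV_BLOCKLIST is not shown in this log.
HERMES_PROVIDER_ENV_BLOCKLIST = frozenset({"ANTHROPIC_TOKEN"})

_passthrough_names = set()


def register_env_passthrough(name: str) -> bool:
    """Register a variable for sandbox passthrough; refuse Hermes-managed credentials."""
    if name in HERMES_PROVIDER_ENV_BLOCKLIST:
        logger.warning(
            "Refusing env passthrough for %s (GHSA-rhgp-j443-p4rf): "
            "Hermes provider credentials cannot be exposed to skills.",
            name,
        )
        return False
    _passthrough_names.add(name)
    return True


def is_env_passthrough(name: str) -> bool:
    """True only for names that passed the blocklist check at registration."""
    return name in _passthrough_names
```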
Ben	724377c429	test(mcp): add failing tests for circuit-breaker recovery The MCP circuit breaker in tools/mcp_tool.py has no half-open state and no reset-on-reconnect behavior, so once it trips after 3 consecutive failures it stays tripped for the process lifetime. These tests lock in the intended recovery behavior: 1. test_circuit_breaker_half_opens_after_cooldown — after the cooldown elapses, the next call must actually probe the session; success closes the breaker. 2. test_circuit_breaker_reopens_on_probe_failure — a failed probe re-arms the cooldown instead of letting every subsequent call through. 3. test_circuit_breaker_cleared_on_reconnect — a successful OAuth recovery resets the breaker even if the post-reconnect retry fails (a successful reconnect is sufficient evidence the server is viable again). All three currently fail, as expected.	2026-04-21 05:19:03 -07:00
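The recovery behavior that the tests in commit 724377c429 pin down (half-open after a cooldown, re-arm on a failed probe, clear on reconnect) can be illustrated with a small breaker. This is a sketch, not the code in `tools/mcp_tool.py`: the class name, the 30-second default cooldown, and the injectable clock are assumptions, and it does not limit concurrent probes while half-open.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after N consecutive failures, lets a probe through after a cooldown."""

    def __init__(
        self,
        failure_threshold: int = 3,           # "3 consecutive failures" per the commit
        cooldown_seconds: float = 30.0,       # assumed value
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_call(self) -> bool:
        """Closed: always allow. Open: allow only once the cooldown has elapsed."""
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self._cooldown

    def record_success(self) -> None:
        """A successful call (or probe) closes the breaker."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Trip on the threshold; a failed probe restarts the cooldown."""
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()

    def reset(self) -> None:
        """Call after a successful reconnect, regardless of the retry outcome."""
        self.record_success()
```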
JackTheGit	77061ac995	Normalize FAL_KEY env handling (ignore whitespace-only values) Treat whitespace-only FAL_KEY the same as unset so users who export FAL_KEY=" " (or CI that leaves a blank token) get the expected 'not set' error path instead of a confusing downstream fal_client failure. Applied to the two direct FAL_KEY checks in image_generation_tool.py: image_generate_tool's upfront credential check and check_fal_api_key(). Both keep the existing managed-gateway fallback intact. Adapted the original whitespace/valid tests to pin the managed gateway to None so the whitespace assertion exercises the direct-key path rather than silently relying on gateway absence.	2026-04-21 02:04:21 -07:00
Teknium	5e6427a42c	fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage Follow-ups on top of @teyrebaz33's cherry-picked commit: 1. New shared helper format_no_match_hint() in fuzzy_match.py with a startswith('Could not find') gate so the snippet only appends to genuine no-match errors — not to 'Found N matches' (ambiguous), 'Escape-drift detected', or 'identical strings' errors, which would all mislead the model. 2. file_tools.patch_tool suppresses the legacy generic '[Hint: old_string not found...]' string when the rich 'Did you mean?' snippet is already attached — no more double-hint. 3. Wire the same helper into patch_parser.py (V4A patch mode, both _validate_operations and _apply_update) and skill_manager_tool.py so all three fuzzy callers surface the hint consistently. Tests: 7 new gating tests in TestFormatNoMatchHint cover every error class (ambiguous, drift, identical, non-zero match count, None error, no similar content, happy path). 34/34 test_fuzzy_match, 96/96 test_file_tools + test_patch_parser + test_skill_manager_tool pass. E2E verified across all four scenarios: no-match-with-similar, no-match-no-similar, ambiguous, success. V4A mode confirmed end-to-end with a non-matching hunk.	2026-04-21 02:03:46 -07:00
teyrebaz33	15abf4ed8f	feat(patch): add 'did you mean?' feedback when patch fails to match When patch_replace() cannot find old_string in a file, the error message now includes the closest matching lines from the file with line numbers and context. This helps the LLM self-correct without a separate read_file call. Implements Phase 1 of #536: enhanced patch error feedback with no architectural changes. - tools/fuzzy_match.py: new find_closest_lines() using SequenceMatcher - tools/file_operations.py: attach closest-lines hint to patch errors - tests/tools/test_fuzzy_match.py: 5 new tests for find_closest_lines	2026-04-21 02:03:46 -07:00
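Commit 15abf4ed8f says `find_closest_lines()` is built on `SequenceMatcher`. A minimal sketch of that idea follows; the signature, the `top_n` and `min_ratio` parameters, and the return shape are assumptions, not the actual `tools/fuzzy_match.py` interface.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def find_closest_lines(
    needle: str, text: str, top_n: int = 3, min_ratio: float = 0.5
) -> List[Tuple[int, str]]:
    """Return up to top_n (line_number, line) pairs most similar to needle."""
    target = needle.strip()
    scored = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Compare stripped text so indentation differences do not dominate.
        ratio = SequenceMatcher(None, target, line.strip()).ratio()
        if ratio >= min_ratio:
            scored.append((ratio, lineno, line))
    # Best match first; ties are broken by earlier line number.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(lineno, line) for _, lineno, line in scored[:top_n]]
```

Returning line numbers alongside the text is what lets an error message point the model at the right place without a separate `read_file` call.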
Teknium	2d7ff9c5bd	feat(tts): complete KittenTTS integration (tools/setup/docs/tests) Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the provider behaves like every other TTS backend end to end. - tools/tts_tool.py: `_check_kittentts_available()` helper and wire into `check_tts_requirements()`; extend Opus-conversion list to include kittentts (WAV → Opus for Telegram voice bubbles); point the missing-package error at `hermes setup tts`. - hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech" toolset picker, with a `kittentts` post_setup hook that auto-installs the wheel + soundfile via pip. - hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install flow in `_setup_tts_provider()`, provider_labels entry, and status row in the `hermes setup` summary. - website/docs/user-guide/features/tts.md: add KittenTTS to the provider table, config example, ffmpeg note, and the zero-config voice-bubble tip. - tests/tools/test_tts_kittentts.py: 10 unit tests covering generation, model caching, config passthrough, ffmpeg conversion, availability detection, and the missing-package dispatcher branch. E2E verified against the real `kittentts` wheel: - WAV direct output (pcm_s16le, 24kHz mono) - MP3 conversion via ffmpeg (from WAV) - Telegram flow (provider in Opus-conversion list) produces `codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the `[[audio_as_voice]]` marker - check_tts_requirements() returns True when kittentts is installed	2026-04-21 01:28:32 -07:00
Teknium	328223576b	feat(skills+terminal): make bundled skill scripts runnable out of the box (#13384 ) * feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates When a skill loads, the activation message now exposes the absolute skill directory and substitutes ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with bundled scripts can instruct the agent to run them by absolute path without an extra skill_view round-trip. Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md are pre-executed (with the skill directory as CWD) and their stdout is inlined into the message before the agent reads it. Off by default — enable via skills.inline_shell in config.yaml — because any snippet runs on the host without approval. Changes: - agent/skill_commands.py: template substitution, inline-shell expansion, absolute skill-dir header, supporting-files list now shows both relative and absolute forms. - hermes_cli/config.py: new skills.template_vars, skills.inline_shell, skills.inline_shell_timeout knobs. - tests/agent/test_skill_commands.py: coverage for header, both template tokens (present and missing session id), template_vars disable, inline-shell default-off, enabled, CWD, and timeout. - website/docs/developer-guide/creating-skills.md: documents the template tokens, the absolute-path header, and the opt-in inline shell with its security caveat. Validation: tests/agent/ 1591 passed (includes 9 new tests). E2E: loaded a real skill in an isolated HERMES_HOME; confirmed ${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID} resolves to the passed task_id, !`date` runs when opt-in is set, and stays literal when it isn't. 
* feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot bash login shells don't source ~/.bashrc, so tools that install themselves there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to the environment snapshot Hermes builds once per session. Under systemd or any context with a minimal parent env, that surfaces as 'node: command not found' in the terminal tool even though the binary is reachable from every interactive shell on the machine. Changes: - tools/environments/local.py: before the login-shell snapshot bootstrap runs, prepend guarded 'source <file>' lines for each resolved init file. Missing files are skipped, each source is wrapped with a '[ -r ... ] && . ... || true' guard so a broken rc can't abort the bootstrap. - hermes_cli/config.py: new terminal.shell_init_files (explicit list, supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on) knobs. When shell_init_files is set it takes precedence; when it's empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced. - tests/tools/test_local_shell_init.py: 10 tests covering the resolver (auto-bashrc, missing file, explicit override, ~/${VAR} expansion, opt-out) and the prelude builder (quoting, guarded sourcing), plus a real-LocalEnvironment snapshot test that confirms exports in the init file land in subsequent commands' environment. - website/docs/reference/faq.md: documents the fix in Troubleshooting, including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh directly via shell_init_files. Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40 pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py 50/50 pass. 
E2E in an isolated HERMES_HOME: confirmed that a fake ~/.bashrc setting a marker var and PATH addition shows up in a real LocalEnvironment().execute() call, that auto_source_bashrc=false suppresses it, that an explicit shell_init_files entry wins over the auto default, and that a missing bashrc is silently skipped.	2026-04-21 00:39:19 -07:00
helix4u	b48ea41d27	feat(voice): add cli beep toggle	2026-04-21 00:29:29 -07:00
Teknium	62cbeb6367	test: stop testing mutable data — convert change-detectors to invariants (#13363 ) Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context 
length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself).	2026-04-20 23:20:33 -07:00
Junass1	735996d2ad	fix(tools/delegate): propagate resolved ACP runtime settings to child agents	2026-04-20 20:47:01 -07:00
cdanis	4a424f1fbb	feat(send_message): add media delivery support for Signal Cherry-picked from PR #13159 by @cdanis. Adds native media attachment delivery to Signal via signal-cli JSON-RPC attachments param. Signal messages with media now follow the same early-return pattern as Telegram/Discord/Matrix — attachments are sent only with the last chunk to avoid duplicates. Follow-up fixes on top of the original PR: - Moved Signal into its own early-return block above the restriction check (matches Telegram/Discord/Matrix pattern) - Fixed media_files being sent on every chunk in the generic loop - Restored restriction/warning guards to simple form (Signal exits early) - Fixed non-hermetic test writing to /tmp instead of tmp_path	2026-04-20 13:24:15 -07:00
Teknium	5a2118a70b	test: add _resolve_path tests + AUTHOR_MAP entry for aniruddhaadak80	2026-04-20 12:29:31 -07:00
Mibayy	3273f301b7	fix(stt): map cloud-only model names to valid local size for faster-whisper (#2544) Cherry-picked from PR #2545 by @Mibayy. The setup wizard could leave stt.model: "whisper-1" in config.yaml. When using the local faster-whisper provider, this crashed with "Invalid model size 'whisper-1'". Voice messages were silently ignored. _normalize_local_model() now detects cloud-only names (whisper-1, gpt-4o-transcribe, etc.) and maps them to the default local model with a warning. Valid local sizes (tiny, base, small, medium, large-v3) pass through unchanged. - Renamed _normalize_local_command_model -> _normalize_local_model (backward-compat wrapper preserved) - 6 new tests including integration test - Added lowercase AUTHOR_MAP alias for @Mibayy Closes #2544	2026-04-20 05:18:48 -07:00
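The normalization in commit 3273f301b7 might be sketched as below. The default local model (`"base"`) and the exact set of cloud-only names are assumptions; the log names only `whisper-1` and `gpt-4o-transcribe` and says "etc." for the rest.

```python
import logging

logger = logging.getLogger(__name__)

# Named in the commit message; the real list may contain more entries.
CLOUD_ONLY_MODEL_NAMES = {"whisper-1", "gpt-4o-transcribe"}

DEFAULT_LOCAL_MODEL = "base"  # assumed default, not stated in the log


def normalize_local_model(model_name: str) -> str:
    """Map cloud-only STT model names to a size faster-whisper accepts."""
    name = (model_name or "").strip()
    if not name:
        return DEFAULT_LOCAL_MODEL
    if name.lower() in CLOUD_ONLY_MODEL_NAMES:
        logger.warning(
            "STT model %r is cloud-only; using local model %r instead.",
            name,
            DEFAULT_LOCAL_MODEL,
        )
        return DEFAULT_LOCAL_MODEL
    # Valid local sizes (tiny, base, small, medium, large-v3) pass through unchanged.
    return name
```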
Teknium	04068c5891	feat(plugins): add transform_tool_result hook for generic tool-result rewriting (#12972) Closes #8933 more fully, extending the per-tool transform_terminal_output hook from #12929 to a generic seam that fires after every tool dispatch. Plugins can rewrite any tool's result string (normalize formats, redact fields, summarize verbose output) without wrapping individual tools. Changes - hermes_cli/plugins.py: add "transform_tool_result" to VALID_HOOKS - model_tools.py: invoke the hook in handle_function_call after post_tool_call (which remains observational); first valid str return replaces the result; fail-open - tests/test_transform_tool_result_hook.py: 9 new tests covering no-op, None return, non-string return, first-match wins, kwargs, hook exception fallback, post_tool_call observation invariant, ordering vs post_tool_call, and an end-to-end real-plugin integration - tests/hermes_cli/test_plugins.py: assert new hook in VALID_HOOKS - tests/test_model_tools.py: extend the hook-call-sequence assertion to include the new hook Design - transform_tool_result runs AFTER post_tool_call so observers always see the original (untransformed) result. This keeps post_tool_call's observational contract. - transform_terminal_output (from #12929) still runs earlier, inside terminal_tool, so plugins can canonicalize BEFORE the 50k truncation drops middle content. Both hooks coexist; they target different layers.	2026-04-20 03:48:08 -07:00
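The "first valid str return replaces the result; fail-open" rule from commit 04068c5891 can be illustrated with a small dispatcher. The function `apply_transform_hooks` and the keyword arguments passed to each hook are hypothetical; whether the real code moves on to the next hook after an exception, as this sketch does, is also an assumption.

```python
from typing import Any, Callable, Iterable

Hook = Callable[..., Any]


def apply_transform_hooks(result: str, hooks: Iterable[Hook], **context: Any) -> str:
    """Return the first string a hook produces, or the original result."""
    for hook in hooks:
        try:
            candidate = hook(result=result, **context)
        except Exception:
            # Fail-open: a broken plugin must never lose the tool result.
            continue
        if isinstance(candidate, str):
            return candidate  # first match wins; later hooks are not called
        # None or non-string returns mean "no change" and are ignored.
    return result
```

A short usage example with made-up hooks:

```python
def redact(result, **_):
    return result.replace("secret", "[redacted]")

def broken(result, **_):
    raise RuntimeError("plugin bug")

print(apply_transform_hooks("my secret", [broken, redact], tool_name="terminal"))
```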
Alexazhu	64a1368210	fix(tools): keep SSH ControlMaster socket path under macOS 104-byte limit On macOS, Unix domain socket paths are capped at 104 bytes (sun_path). SSH appends a 16-byte random suffix to the ControlPath when operating in ControlMaster mode. With an IPv6 host embedded literally in the filename and a deeply-nested macOS $TMPDIR like /var/folders/XX/YYYYYYYYYYYY/T/, the full path reliably exceeds the limit — every terminal/file-op tool call then fails immediately with ``unix_listener: path "…" too long for Unix domain socket``. Swap the ``user@host:port.sock`` filename for a sha256-derived 16-char hex digest. The digest is deterministic for a given (user, host, port) triple, so ControlMaster reuse across reconnects is preserved, and the full path fits comfortably under the limit even after SSH's random suffix. Collision space is 2^64 — effectively unreachable for the handful of concurrent connections any single Hermes process holds. Regression tests cover: path length under realistic macOS $TMPDIR with the IPv6 host from the issue report, determinism for reconnects, and distinctness across different (user, host, port) triples. Closes #11840	2026-04-20 03:07:32 -07:00
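The digest-based socket naming in commit 64a1368210 could look roughly like the following. The function name `control_socket_path`, the `.sock` suffix placement, and the exact string that gets hashed are assumptions; the log only states that a sha256-derived 16-character hex digest replaces the `user@host:port.sock` filename.

```python
import hashlib
import os
import tempfile
from typing import Optional

MACOS_SUN_PATH_LIMIT = 104   # bytes, per the commit message
SSH_RANDOM_SUFFIX_LEN = 16   # bytes SSH appends in ControlMaster mode, per the commit message


def control_socket_path(user: str, host: str, port: int, base_dir: Optional[str] = None) -> str:
    """Return a short, deterministic ControlPath for a (user, host, port) triple."""
    base_dir = base_dir or tempfile.gettempdir()
    key = f"{user}@{host}:{port}".encode("utf-8")
    # 16 hex characters = 64 bits, the 2^64 collision space the commit mentions.
    digest = hashlib.sha256(key).hexdigest()[:16]
    return os.path.join(base_dir, f"{digest}.sock")
```

Because the filename length no longer depends on the host string, a long IPv6 literal costs the same 21 bytes as a short hostname, and the same triple always maps to the same path so ControlMaster reuse still works.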
sjz-ks	2081b71c42	feat(tools): add terminal output transform hook	2026-04-20 03:04:06 -07:00
Teknium	be472138f3	fix(send_message): accept E.164 phone numbers for signal/sms/whatsapp (#12936) Follow-up to #12704. The SignalAdapter can resolve +E164 numbers to UUIDs via listContacts, but _parse_target_ref() in the send_message tool rejected '+' as non-digit and fell through to channel-name resolution — which fails for contacts without a prior session entry. Adds an E.164 branch in _parse_target_ref for phone-based platforms (signal, sms, whatsapp) that preserves the leading '+' so downstream adapters keep the format they expect. Non-phone platforms are unaffected. Reported by @qdrop17 on Discord after pulling #12704.	2026-04-20 03:02:44 -07:00
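The E.164 branch described in commit be472138f3 might be sketched like this. The helper `parse_phone_target` is hypothetical and stands in for one branch of `_parse_target_ref`; the minimum length of 7 digits is an assumption, while the 15-digit maximum and the non-zero first digit come from the E.164 numbering plan.

```python
import re
from typing import Optional

PHONE_PLATFORMS = {"signal", "sms", "whatsapp"}  # platforms named in the commit

# '+' followed by 7 to 15 digits, the first of which is not zero.
E164_RE = re.compile(r"^\+[1-9]\d{6,14}$")


def parse_phone_target(platform: str, target: str) -> Optional[str]:
    """Return the number with its leading '+' intact, or None if not applicable."""
    candidate = target.strip()
    if platform in PHONE_PLATFORMS and E164_RE.match(candidate):
        return candidate
    # Anything else falls through to the caller's other resolution paths.
    return None
```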
teyrebaz33	2d59afd3da	fix(docker): pass docker_mount_cwd_to_workspace and docker_forward_env to container_config in file_tools file_tools._get_file_ops() built a container_config dict for Docker/ Singularity/Modal/Daytona backends but omitted docker_mount_cwd_to_workspace and docker_forward_env. Both are read by _create_environment() from container_config, so file tools (read_file, write_file, patch, search) silently ignored those config values when running in Docker. Add the two missing keys to match the container_config already built by terminal_tool.terminal_tool(). Fixes #2672.	2026-04-20 00:58:16 -07:00
helix4u	6ab78401c9	fix(aux): add session_search extra_body and concurrency controls Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy OpenAI-compatible providers can receive provider-specific request fields (e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds session_search summary fan-out with auxiliary.session_search.max_concurrency (default 3, clamped 1-5) to avoid 429 bursts on small providers. - agent/auxiliary_client.py: extract _get_auxiliary_task_config helper, add _get_task_extra_body, merge config+explicit extra_body with explicit winning - hermes_cli/config.py: extra_body defaults on all aux tasks + session_search.max_concurrency; _config_version 19 -> 20 - tools/session_search_tool.py: semaphore around _summarize_all gather - tests: coverage in test_auxiliary_client, test_session_search, test_aux_config - docs: user-guide/configuration.md + fallback-providers.md Co-authored-by: Teknium <teknium@nousresearch.com>	2026-04-20 00:47:39 -07:00
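Commit 6ab78401c9 bounds the `session_search` summary fan-out with a semaphore around a gather, clamped to 1-5 with a default of 3. A minimal asyncio sketch of that pattern follows; `clamp_concurrency` and `gather_bounded` are hypothetical names, not functions from `tools/session_search_tool.py`.

```python
import asyncio
from typing import Any, Awaitable, Iterable, List


def clamp_concurrency(value: Any, low: int = 1, high: int = 5, default: int = 3) -> int:
    """Coerce a config value into the 1-5 range, falling back to 3 on bad input."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return default
    return max(low, min(high, number))


async def gather_bounded(awaitables: Iterable[Awaitable[Any]], max_concurrency: Any) -> List[Any]:
    """Run all awaitables, but let at most max_concurrency proceed at once."""
    semaphore = asyncio.Semaphore(clamp_concurrency(max_concurrency))

    async def run_one(awaitable: Awaitable[Any]) -> Any:
        async with semaphore:
            return await awaitable

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run_one(a) for a in awaitables))
```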
kshitijk4poor	fd5df5fe8e	fix(camofox): honor auxiliary vision temperature - forward auxiliary.vision.temperature in camofox screenshot analysis - add regression tests for configured and default behavior	2026-04-20 00:32:09 -07:00
kshitijk4poor	9d88bdaf11	fix(browser): honor auxiliary.vision.temperature for screenshot analysis - mirror the vision tool's config bridge in browser_vision - add regression tests for configured and default temperature forwarding	2026-04-20 00:32:09 -07:00
kshitijk4poor	098d554aac	test: cover vision config temperature wiring - add regression tests for auxiliary.vision.temperature and timeout - add bugkill3r to AUTHOR_MAP for the salvaged commit	2026-04-20 00:32:09 -07:00
Teknium	323e827f4a	test: remove 8 flaky tests that fail under parallel xdist scheduling (#12784) These tests all pass in isolation but fail in CI due to test-ordering pollution on shared xdist workers. Each has a different root cause: - tests/tools/test_send_message_tool.py (4 tests): racing session ContextVar pollution — get_session_env returns '' instead of 'cli' default when an earlier test on the same worker leaves HERMES_SESSION_PLATFORM set. - tests/tools/test_skills_tool.py (2 tests): KeyError: 'gateway_setup_hint' from shared skill state mutation. - tests/tools/test_tts_mistral.py::test_telegram_produces_ogg_and_voice_compatible: pre-existing intermittent failure. - tests/hermes_cli/test_update_check.py::test_get_update_result_timeout: racing a background git-fetch thread that writes a real commits-behind value into module-level _update_result before assertion. All 8 have been failing on main for multiple runs with no clear path to a safe fix that doesn't require restructuring the tests' isolation story. Removing is cheaper than chasing — the code paths they cover are exercised elsewhere (send_message has 73+ other tests, skills_tool has extensive coverage, TTS has other backend tests, update check has other tests for check_for_updates proper). Validation: all 4 files now pass cleanly: 169/169 under CI-parity env.	2026-04-19 19:38:02 -07:00
Teknium	c9b833feb3	fix(ci): unblock test suite + cut ~2s of dead Z.AI probes from every AIAgent CI on main had 7 failing tests. Five were stale test fixtures; one (agent cache spillover timeout) was covering up a real perf regression in AIAgent construction. The perf bug: every AIAgent.__init__ calls _check_compression_model_feasibility → resolve_provider_client('auto') → _resolve_api_key_provider which iterates PROVIDER_REGISTRY. When it hits 'zai', it unconditionally calls resolve_api_key_provider_credentials → _resolve_zai_base_url → probes 8 Z.AI endpoints with an empty Bearer token (all 401s), ~2s of pure latency per agent, even when the user has never touched Z.AI. Landed in `9e844160` (PR for credential-pool Z.AI auto-detect) — the short-circuit when api_key is empty was missing. _resolve_kimi_base_url had the same shape; fixed too. Test fixes: - tests/gateway/test_voice_command.py: _make_adapter helpers were missing self._voice_locks (added in PR #12644, 7 call sites — all updated). - tests/test_toolsets.py: test_hermes_platforms_share_core_tools asserted equality, but hermes-discord has discord_server (DISCORD_BOT_TOKEN-gated, discord-only by design). Switched to subset check. - tests/run_agent/test_streaming.py: test_tool_name_not_duplicated_when_resent_per_chunk missing api_key/base_url — classic pitfall (PR #11619 fixed 16 of these; this one slipped through on a later commit). - tests/tools/test_discord_tool.py: TestConfigAllowlist caplog assertions fail in parallel runs because AIAgent(quiet_mode=True) globally sets logging.getLogger('tools').setLevel(ERROR) and xdist workers are persistent. Autouse fixture resets the 'tools' and 'tools.discord_tool' levels per test. Validation: tests/cron + voice + agent_cache + streaming + toolsets + command_guards + discord_tool: 550/550 pass tests/hermes_cli + tests/gateway: 5713/5713 pass AIAgent construction without Z.AI creds: 2.2s → 0.24s (9x)	2026-04-19 19:18:19 -07:00
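The missing short-circuit described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `resolve_base_url`, `probe`, and the endpoint list are hypothetical stand-ins for `_resolve_zai_base_url` and its endpoint probes.

```python
from typing import Callable, Sequence


def resolve_base_url(
    api_key: str,
    endpoints: Sequence[str],
    probe: Callable[[str, str], bool],
    default: str,
) -> str:
    """Pick the first endpoint that accepts the key (hypothetical helper)."""
    # Short-circuit: with no key, every probe would just return 401,
    # so skip the network round-trips entirely.
    if not api_key:
        return default
    for url in endpoints:
        if probe(url, api_key):
            return url
    return default
```

With an empty key the function returns before any probe is issued, which is the behavior the commit credits for the faster `AIAgent` construction.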
handsdiff	abfc1847b7	fix(terminal): rewrite `A && B &` to `A && { B & }` to prevent subshell leak bash parses `A && B &` with `&&` tighter than `&`, so it forks a subshell for the compound and backgrounds the subshell. Inside the subshell, B runs foreground, so the subshell waits for B. When B is a process that doesn't naturally exit (`python3 -m http.server`, `yes > /dev/null`, a long-running daemon), the subshell is stuck in `wait4` forever and leaks as an orphan reparented to init. Observed in production: agents running `cd X && python3 -m http.server 8000 &>/dev/null & sleep 1 && curl ...` as a "start a local server, then verify it" one-liner. Outer bash exits cleanly; the subshell never does. Across ~3 days of use, 8 unique stuck-terminal events and 7 leaked bash+server pairs accumulated on the fleet, with some sessions appearing hung from the user's perspective because the subshell's open stdout pipe kept the terminal tool's drain thread blocked. This is distinct from the `set +m` fix in `933fbd8f` (which addressed interactive-shell job-control waiting at exit). `set +m` doesn't help here because `bash -c` is non-interactive and job control is already off; the problem is the subshell's own internal wait for its foreground B, not the outer shell's job-tracking. The fix: walk the command shell-aware (respecting quotes, parens, brace groups, `&>`/`>&` redirects), find `A && B &` / `A || B &` at depth 0 and rewrite the tail to `A && { B & }`. Brace groups don't fork a subshell; they run in the current shell. `B &` inside the group is a simple background (no subshell wait). The outer `&` is absorbed into the group, so the compound no longer needs an explicit subshell. `&&` error-propagation is preserved exactly: if A fails, `&&` short-circuits and B never runs. 
- Skips quoted strings, comment lines, and `(…)` subshells - Handles `&>/dev/null`, `2>&1`, `>&2` without mistaking them for `&` - Resets chain state at `;`, `|`, and newlines - Tracks brace depth so already-rewritten output is idempotent - Walks using the existing `_read_shell_token` tokenizer, matching the pattern of `_rewrite_real_sudo_invocations` Called once from `BaseEnvironment.execute` right after `_prepare_command`, so it runs for every backend (local, ssh, docker, modal, etc.) with no per-backend plumbing. 34 new tests covering rewrite cases, preservation cases, redirect edge-cases, quoting/parens/backticks, idempotency, and empty/edge inputs. End-to-end verified on a test VM: the exact vela-incident command now returns in ~1.3s with no leaked bash, only the intentional backgrounded server. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:53:11 -07:00
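The rewrite described above can be approximated with a small character scanner. This is a simplified sketch, not the commit's implementation: the real code walks tokens via `_read_shell_token`, while the hypothetical `rewrite_compound_background` below scans characters, ignores comments and here-docs, and inserts a `;` after the brace group when more command text follows (an assumption about how the mid-command case could be kept syntactically valid).

```python
def rewrite_compound_background(cmd: str) -> str:
    """Rewrite `A && B &` (or `A || B &`) at depth 0 to `A && { B & }`."""
    pieces = []          # already-emitted output fragments
    copied = 0           # index in cmd up to which text has been emitted
    last_op_end = None   # index just past the latest && / || in this chain
    quote = None         # active quote character, if any
    depth = 0            # nesting depth of (...) and {...}
    i, n = 0, len(cmd)
    while i < n:
        c = cmd[i]
        if quote:
            if quote == '"' and c == "\\":
                i += 2
                continue
            if c == quote:
                quote = None
            i += 1
            continue
        if c == "\\":
            i += 2
            continue
        if c in "'\"":
            quote = c
        elif c in "({":
            depth += 1
        elif c in ")}":
            depth = max(depth - 1, 0)
        elif depth == 0:
            two = cmd[i:i + 2]
            if two in ("&&", "||"):
                last_op_end = i + 2
                i += 2
                continue
            if two == "&>":                      # redirect, not a background &
                i += 2
                continue
            if c == "&" and i > 0 and cmd[i - 1] == ">":
                pass                             # `>&2`, `2>&1`
            elif c == "&":
                if last_op_end is not None:
                    body = cmd[last_op_end:i].strip()
                    sep = ";" if cmd[i + 1:].strip() else ""
                    pieces.append(cmd[copied:last_op_end] + " { " + body + " & }" + sep)
                    copied = i + 1
                last_op_end = None
            elif c in ";|\n":
                last_op_end = None               # a new chain starts here
        i += 1
    pieces.append(cmd[copied:])
    return "".join(pieces)
```

Because text inside braces is skipped, feeding the rewritten command back through the function leaves it unchanged, which mirrors the idempotency property the commit lists.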
etherman-os	d50a9b20d2	terminal: steer long-lived server commands to background mode	2026-04-19 16:47:20 -07:00
Teknium	a3a4932405	fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025 ) (#12717 ) * [verified] fix(mcp-oauth): bridge httpx auth_flow bidirectional generator HermesMCPOAuthProvider.async_auth_flow wrapped the SDK's auth_flow with 'async for item in super().async_auth_flow(request): yield item', which discards httpx's .asend(response) values and resumes the inner generator with None. This broke every OAuth MCP server on the first HTTP response with 'NoneType' object has no attribute 'status_code' crashing at mcp/client/auth/oauth2.py:505. Replace with a manual bridge that forwards .asend() values into the inner generator, preserving httpx's bidirectional auth_flow contract. Add tests/tools/test_mcp_oauth_bidirectional.py with two regression tests that drive the flow through real .asend() round-trips. These catch the bug at the unit level; prior tests only exercised _initialize() and disk-watching, never the full generator protocol. Verified against BetterStack MCP: Before: 'Connection failed (11564ms): NoneType...' after 3 retries After: 'Connected (2416ms); Tools discovered: 83' Regression from #11383. * [verified] fix(mcp-oauth): seed token_expiry_time + pre-flight AS discovery on cold-load PR #11383's consolidation fixed external-refresh reloading and 401 dedup but left two latent bugs that surfaced on BetterStack and any other OAuth MCP with a split-origin authorization server: 1. HermesTokenStorage persisted only a relative 'expires_in', which is meaningless after a process restart. The MCP SDK's OAuthContext does NOT seed token_expiry_time in _initialize, so is_token_valid() returned True for any reloaded token regardless of age. Expired tokens shipped to servers, and app-level auth failures (e.g. BetterStack's 'No teams found. Please check your authentication.') were invisible to the transport-layer 401 handler. 2. 
Even once preemptive refresh did fire, the SDK's _refresh_token falls back to {server_url}/token when oauth_metadata isn't cached. For providers whose AS is at a different origin (BetterStack: mcp.betterstack.com for MCP, betterstack.com/oauth/token for the token endpoint), that fallback 404s and drops into full browser re-auth on every process restart. Fix set: - HermesTokenStorage.set_tokens persists an absolute wall-clock expires_at alongside the SDK's OAuthToken JSON (time.time() + TTL at write time). - HermesTokenStorage.get_tokens reconstructs expires_in from max(expires_at - now, 0), clamping expired tokens to zero TTL. Legacy files without expires_at fall back to file-mtime as a best-effort wall-clock proxy, self-healing on the next set_tokens. - HermesMCPOAuthProvider._initialize calls super(), then update_token_expiry on the reloaded tokens so token_expiry_time reflects actual remaining TTL. If tokens are loaded but oauth_metadata is missing, pre-flight PRM + ASM discovery runs via httpx.AsyncClient using the MCP SDK's own URL builders and response handlers (build_protected_resource_metadata_discovery_urls, handle_auth_metadata_response, etc.) so the SDK sees the correct token_endpoint before the first refresh attempt. Pre-flight is skipped when there are no stored tokens to keep fresh-install paths zero-cost. 
Test coverage (tests/tools/test_mcp_oauth_cold_load_expiry.py): - set_tokens persists absolute expires_at - set_tokens skips expires_at when token has no expires_in - get_tokens round-trips expires_at -> remaining expires_in - expired tokens reload with expires_in=0 - legacy files without expires_at fall back to mtime proxy - _initialize seeds token_expiry_time from stored tokens - _initialize flags expired-on-disk tokens as is_token_valid=False - _initialize pre-flights PRM + ASM discovery with mock transport - _initialize skips pre-flight when no tokens are stored Verified against BetterStack MCP: hermes mcp test betterstack -> Connected (2508ms), 83 tools mcp_betterstack_telemetry_list_teams_tool -> real team data, not 'No teams found. Please check your authentication.' Reference: mcp-oauth-token-diagnosis skill, Fix A. * chore: map hermes@noushq.ai to benbarclay in AUTHOR_MAP Needed for CI attribution check on cherry-picked commits from PR #12025. --------- Co-authored-by: Hermes Agent <hermes@noushq.ai>	2026-04-19 16:31:07 -07:00
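The bidirectional-generator bug from the first part of this commit is easy to reproduce without httpx. The sketch below uses a toy `inner_flow` in place of the SDK's `async_auth_flow` (all names here are hypothetical) and contrasts the lossy `async for` wrapper with a manual bridge that forwards `.asend()` values.

```python
import asyncio


async def inner_flow(request):
    # Toy stand-in for an auth flow: yields a request and expects the
    # caller to send the response back in.
    response = yield f"{request}+auth"
    if response == 401:
        yield f"{request}+refreshed"


async def lossy_wrapper(request):
    # `async for` resumes the inner generator with None, so the value
    # passed to .asend() never reaches inner_flow.
    async for item in inner_flow(request):
        yield item


async def bridged_wrapper(request):
    # Manual bridge: whatever the caller sends is forwarded inward.
    gen = inner_flow(request)
    try:
        item = await gen.__anext__()
        while True:
            response = yield item
            item = await gen.asend(response)
    except StopAsyncIteration:
        return
    finally:
        await gen.aclose()


async def drive(flow_factory):
    flow = flow_factory("GET /tools")
    seen = [await flow.__anext__()]
    try:
        seen.append(await flow.asend(401))
        await flow.asend(200)
    except StopAsyncIteration:
        pass
    return seen
```

Driving both wrappers with a 401 response shows the difference: the lossy wrapper ends after the first request because the inner flow saw `None`, while the bridged wrapper yields the refreshed request.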
Teknium	aa5bd09232	fix(tests): unstick CI — sweep stale tests from recent merges (#12670 ) One source fix (web_server category merge) + five test updates that didn't travel with their feature PRs. All 13 failures on the 04-19 CI run on main are now accounted for (5 already self-healed on main; 8 fixed here). Changes - web_server.py: add code_execution → agent to _CATEGORY_MERGE (new singleton section from #11971 broke no-single-field-category invariant). - test_browser_camofox_state: bump hardcoded _config_version 18 → 19 (also from #11971). - test_registry: add browser_cdp_tool (#12369) and discord_tool (#4753) to the expected built-in tool set. - test_run_agent::test_tool_call_accumulation: rewrite fragment chunks — #`0f778f77` switched streaming name-accumulation from += to = to fix MiniMax/NIM duplication; the test still encoded the old fragment-per-chunk premise. - test_concurrent_interrupt::_Stub: no-op _apply_pending_steer_to_tool_results — #12116 added this call after concurrent tool batches; the hand-rolled stub was missing it. - test_codex_cli_model_picker: drop the two obsolete tests that asserted auto-import from ~/.codex/auth.json into the Hermes auth store. #12360 explicitly removed that behavior (refresh-token reuse races with Codex CLI / VS Code); adoption is now explicit via `hermes auth openai-codex`. Remaining 3 tests in the file (normal path, Claude Code fallback, negative case) still cover the picker. Validation - scripts/run_tests.sh across all 6 affected files + surrounding tests (54 tests total) all green locally.	2026-04-19 12:39:58 -07:00
Teknium	d2c2e34469	fix(patch): catch silent persistence failures and escape-drift in tool-call transport (#12669 ) Two hardening layers in the patch tool, triggered by a real silent failure in the previous session: (1) Post-write verification in patch_replace — after write_file succeeds, re-read the file and confirm the bytes on disk match the intended write. If not, return an error instead of the current success-with-diff. Catches silent persistence failures from any cause (backend FS oddities, stdin pipe truncation, concurrent task races, mount drift). (2) Escape-drift guard in fuzzy_find_and_replace — when a non-exact strategy matches and both old_string and new_string contain literal \' or \" sequences but the matched file region does not, reject the patch with a clear error pointing at the likely cause (tool-call serialization adding a spurious backslash around apostrophes/quotes). Exact matches bypass the guard, and legitimate edits that add or preserve escape sequences in files that already have them still work. Why: in a prior tool call, old_string was sent with \' where the file has ' (tool-call transport drift). The fuzzy matcher's block_anchor strategy matched anyway and produced a diff the tool reported as successful — but the file was never modified on disk. The agent moved on believing the edit landed when it hadn't. Tests: added TestPatchReplacePostWriteVerification (3 cases) and TestEscapeDriftGuard (6 cases). All pass, existing fuzzy match and file_operations tests unaffected.	2026-04-19 12:27:34 -07:00
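Both hardening layers in this commit reduce to small checks. The following is a minimal sketch assuming hypothetical helper names (`write_and_verify`, `has_escape_drift`); the actual logic lives in `patch_replace` and `fuzzy_find_and_replace` and may differ in detail.

```python
from pathlib import Path

ESCAPED_QUOTES = ("\\'", '\\"')  # literal backslash-quote sequences


def write_and_verify(path: Path, content: str) -> dict:
    """Write, then re-read and compare bytes before reporting success."""
    data = content.encode("utf-8")
    path.write_bytes(data)
    if path.read_bytes() != data:
        return {"success": False, "error": "bytes on disk differ from intended write"}
    return {"success": True}


def has_escape_drift(old_string: str, new_string: str, matched_region: str) -> bool:
    """True when both patch strings carry escaped quotes the file region lacks."""
    def has_escapes(text: str) -> bool:
        return any(seq in text for seq in ESCAPED_QUOTES)

    return (
        has_escapes(old_string)
        and has_escapes(new_string)
        and not has_escapes(matched_region)
    )
```

Per the commit, the drift check would only run for non-exact match strategies, so edits against files that already contain escape sequences are unaffected.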
Teknium	ef73367fc5	feat: add Discord server introspection and management tool (#4753 ) * feat: add Discord server introspection and management tool Add a discord_server tool that gives the agent the ability to interact with Discord servers when running on the Discord gateway. Uses Discord REST API directly with the bot token — no dependency on the gateway adapter's discord.py client. The tool is only included in the hermes-discord toolset (zero cost for users on other platforms) and gated on DISCORD_BOT_TOKEN via check_fn. Actions (14): - Introspection: list_guilds, server_info, list_channels, channel_info, list_roles, member_info, search_members - Messages: fetch_messages, list_pins, pin_message, unpin_message - Management: create_thread, add_role, remove_role This addresses a gap where users on Discord could not ask Hermes to review server structure, channels, roles, or members — a task competing agents (OpenClaw) handle out of the box. Files changed: - tools/discord_tool.py (new): Tool implementation + registration - model_tools.py: Add to discovery list - toolsets.py: Add to hermes-discord toolset only - tests/tools/test_discord_tool.py (new): 43 tests covering all actions, validation, error handling, registration, and toolset scoping * feat(discord): intent-aware schema filtering + config allowlist + schema cleanup - _detect_capabilities() hits GET /applications/@me once per process to read GUILD_MEMBERS / MESSAGE_CONTENT privileged intent bits. - Schema is rebuilt per-session in model_tools.get_tool_definitions: hides search_members / member_info when GUILD_MEMBERS intent is off, annotates fetch_messages description when MESSAGE_CONTENT is off. - New config key discord.server_actions (comma-separated or YAML list) lets users restrict which actions the agent can call, intersected with intent availability. Unknown names are warned and dropped. - Defense-in-depth: runtime handler re-checks the allowlist so a stale cached schema cannot bypass a tightened config. 
- Schema description rewritten as an action-first manifest (signature per action) instead of per-parameter 'required for X, Y, Z' cross-refs. ~25% shorter; model can see each action's required params at a glance. - Added bounds: limit gets minimum=1 maximum=100, auto_archive_duration becomes an enum of the 4 valid Discord values. - 403 enrichment: runtime 403 errors are mapped to actionable guidance (which permission is missing and what to do about it) instead of the raw Discord error body. - 36 new tests: capability detection with caching and force refresh, config allowlist parsing (string/list/invalid/unknown), intent+allowlist intersection, dynamic schema build, runtime allowlist enforcement, 403 enrichment, and model_tools integration wiring.	2026-04-19 11:52:19 -07:00
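The intersection of the `discord.server_actions` allowlist with intent availability might look like the sketch below. The action names come from the commit; the function names and the choice to treat an unset allowlist as "all actions" are assumptions made for illustration.

```python
import logging
from typing import Iterable, List, Optional, Union

logger = logging.getLogger(__name__)

ALL_ACTIONS = (
    "list_guilds", "server_info", "list_channels", "channel_info",
    "list_roles", "member_info", "search_members",
    "fetch_messages", "list_pins", "pin_message", "unpin_message",
    "create_thread", "add_role", "remove_role",
)
# Actions the commit hides when the GUILD_MEMBERS intent is off.
MEMBER_INTENT_ACTIONS = frozenset({"search_members", "member_info"})


def parse_allowlist(raw: Union[None, str, Iterable[str]]) -> Optional[List[str]]:
    """Accept a comma-separated string or a list; None means no restriction."""
    if raw is None or raw == "":
        return None
    names = raw.split(",") if isinstance(raw, str) else list(raw)
    known = []
    for name in (str(n).strip() for n in names):
        if not name:
            continue
        if name in ALL_ACTIONS:
            known.append(name)
        else:
            logger.warning("unknown discord action %r dropped", name)
    return known


def available_actions(raw_allowlist, has_members_intent: bool) -> List[str]:
    allowed = parse_allowlist(raw_allowlist)
    actions = [a for a in ALL_ACTIONS if allowed is None or a in allowed]
    if not has_members_intent:
        actions = [a for a in actions if a not in MEMBER_INTENT_ACTIONS]
    return actions
```

Calling `available_actions` again inside the runtime handler, rather than trusting the schema the model was shown, is one way to get the defense-in-depth re-check the commit describes.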
Teknium	f336ae3d7d	fix(environments): use incremental UTF-8 decoder in select-based drain The first draft of the fix called `chunk.decode("utf-8")` directly on each 4096-byte `os.read()` result, which corrupts output whenever a multi-byte UTF-8 character straddles a read boundary: * `UnicodeDecodeError` fires on the valid-but-truncated byte sequence. * The except handler clears ALL previously-decoded output and replaces the whole buffer with `[binary output detected ...]`. Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses all 10000 characters on the first draft; the baseline TextIOWrapper drain (which uses `encoding='utf-8', errors='replace'` on Popen) preserves them all. This regression affects any command emitting non-ASCII output larger than one chunk — CJK/Arabic/emoji in `npm install`, `pip install`, `docker logs`, `kubectl logs`, etc. Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`, which buffers partial multi-byte sequences across chunks and substitutes U+FFFD for genuinely invalid bytes. Flush on drain exit via `decoder.decode(b'', final=True)` to emit any trailing replacement character for a dangling partial sequence. Adds two regression tests: * test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars, verifies count round-trips and no fallback fires. * test_invalid_utf8_uses_replacement_not_fallback — deliberate \xff\xfe between valid ASCII, verifies surrounding text survives.	2026-04-19 11:27:50 -07:00
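The boundary problem and its fix can be shown with the standard library alone. In this sketch, `decode_chunks` is a hypothetical helper; the `codecs.getincrementaldecoder` call and the final flush follow the commit's description.

```python
import codecs


def decode_chunks(chunks):
    """Decode byte chunks whose boundaries may split multi-byte characters."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # Flush: emits U+FFFD if the stream ended inside a partial sequence.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# 10000 three-byte characters, split into 4096-byte reads. 4096 is not a
# multiple of 3, so reads end in the middle of a character.
payload = ("日" * 10000).encode("utf-8")
chunks = [payload[i:i + 4096] for i in range(0, len(payload), 4096)]
text = decode_chunks(chunks)
```

Calling `chunks[0].decode("utf-8")` directly raises `UnicodeDecodeError` on the truncated trailing byte, which is the failure mode of the first draft; the incremental decoder instead holds that byte until the next chunk arrives.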
Teknium	0a02fbd842	fix(environments): prevent terminal hang when commands background children (#8340 ) When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`, etc.), the backgrounded grandchild inherits the write-end of our stdout pipe via fork(). The old `for line in proc.stdout` drain never EOF'd until the grandchild closed the pipe — so for a uvicorn server, the terminal tool hung indefinitely (users reported the whole session deadlocking when asking the agent to restart a backend). Fix: switch _drain() to select()-based non-blocking reads and stop draining shortly after bash exits even if the pipe hasn't EOF'd. Any output the grandchild writes after that point goes to an orphaned pipe, which is exactly what the user asked for when they said '&'. Adds regression tests covering the issue's exact repro and 5 related patterns (plain bg, setsid+disown, streaming output, high volume, timeout, UTF-8).	2026-04-19 11:27:50 -07:00
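A select-based drain that stops shortly after the shell exits can be sketched as below. This is an illustrative POSIX-only version, not the project's `_drain()`: `run_and_drain`, the `sh -c` invocation, and the 0.2 s grace period are assumptions.

```python
import os
import select
import subprocess
import time


def run_and_drain(command: str, grace: float = 0.2) -> str:
    """Run a shell command and collect output without waiting on grandchildren."""
    proc = subprocess.Popen(
        ["sh", "-c", command], stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    fd = proc.stdout.fileno()
    output = bytearray()
    exit_seen_at = None
    while True:
        ready, _, _ = select.select([fd], [], [], 0.05)
        if ready:
            chunk = os.read(fd, 4096)
            if not chunk:          # real EOF: every writer closed the pipe
                break
            output += chunk
        if proc.poll() is not None:
            now = time.monotonic()
            if exit_seen_at is None:
                exit_seen_at = now
            elif now - exit_seen_at >= grace:
                break              # shell is gone; stop even without EOF
    proc.stdout.close()
    proc.wait()
    return output.decode("utf-8", errors="replace")
```

With a command such as `echo hi; sleep 5 &`, a plain `for line in proc.stdout` loop would block until `sleep` exits, because the background process still holds the pipe's write end; this loop returns once the grace period after the shell's exit has elapsed.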
Teknium	ea0bd81b84	feat(skills): consolidate find-nearby into maps as a single location skill find-nearby and the (new) maps optional skill both used OpenStreetMap's Overpass + Nominatim to answer the same question — 'what's near this location?' — so shipping both would be duplicate code for overlapping capability. Consolidate into one active-by-default skill at skills/productivity/maps/ that is a strict superset of find-nearby. Moves + deletions: - optional-skills/productivity/maps/ → skills/productivity/maps/ (active, no install step needed) - skills/leisure/find-nearby/ → DELETED (fully superseded) Upgrades to maps_client.py so it covers everything find-nearby did: - Overpass server failover — tries overpass-api.de then overpass.kumi.systems so a single-mirror outage doesn't break the skill (new overpass_query helper, used by both nearby and bbox) - nearby now accepts --near "<address>" as a shortcut that auto-geocodes, so one command replaces the old 'search → copy coords → nearby' chain - nearby now accepts --category (repeatable) for multi-type queries in one call (e.g. 
--category restaurant --category bar), results merged and deduped by (osm_type, osm_id), sorted by distance, capped at --limit - Each nearby result now includes maps_url (clickable Google Maps search link) and directions_url (Google Maps directions from the search point — only when a ref point is known) - Promoted commonly-useful OSM tags to top-level fields on each result: cuisine, hours (opening_hours), phone, website — instead of forcing callers to dig into the raw tags dict SKILL.md: - Version bumped 1.1.0 → 1.2.0, description rewritten to lead with capability surface - New 'Working With Telegram Location Pins' section replacing find-nearby's equivalent workflow - metadata.hermes.supersedes: [find-nearby] so tooling can flag any lingering references to the old skill External references updated: - optional-skills/productivity/telephony/SKILL.md — related_skills find-nearby → maps - website/docs/reference/skills-catalog.md — removed the (now-empty) 'leisure' section, added 'maps' row under productivity - website/docs/user-guide/features/cron.md — find-nearby example usages swapped to maps - tests/tools/test_cronjob_tools.py, tests/hermes_cli/test_cron.py, tests/cron/test_scheduler.py — fixture string values swapped - cli.py:5290 — /cron help-hint example swapped Not touched: - RELEASE_v0.2.0.md — historical record, left intact E2E-verified live (Nominatim + Overpass, one query each): - nearby --near "Times Square" --category restaurant --category bar → 3 results, sorted by distance, all with maps_url, directions_url, cuisine, phone, website where OSM had the tags All 111 targeted tests pass across tests/cron/, tests/tools/, tests/hermes_cli/.	2026-04-19 05:19:22 -07:00
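The multi-category merge step (dedupe by `(osm_type, osm_id)`, sort by distance, cap at `--limit`) could be written roughly as follows. The function name and the `distance_m` field are hypothetical; `maps_client.py` may structure its results differently.

```python
from typing import Dict, Iterable, List, Tuple


def merge_nearby(result_sets: Iterable[List[dict]], limit: int) -> List[dict]:
    """Merge per-category result lists into one deduplicated, sorted list."""
    unique: Dict[Tuple[str, int], dict] = {}
    for results in result_sets:
        for place in results:
            key = (place["osm_type"], place["osm_id"])
            # Keep the first occurrence; a venue tagged as both a bar and a
            # restaurant would otherwise appear twice.
            unique.setdefault(key, place)
    ordered = sorted(unique.values(), key=lambda p: p["distance_m"])
    return ordered[:limit]
```

Deduplicating on the OSM identity rather than on the name avoids merging two distinct places that happen to share a name.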
Teknium	ce410521b3	feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369 ) Agents can now send arbitrary CDP commands to the browser. The tool is gated on a reachable CDP endpoint at session start — it only appears in the toolset when BROWSER_CDP_URL is set (from '/browser connect') or 'browser.cdp_url' is configured in config.yaml. Backends that don't currently expose CDP to the Python side (Camofox, default local agent-browser, cloud providers whose per-session cdp_url is not yet surfaced) do not see the tool at all. Tool schema description links to the CDP method reference at https://chromedevtools.github.io/devtools-protocol/ so the agent can web_extract specific method docs on demand. Stateless per call. Browser-level methods (Target., Browser., Storage.*) omit target_id. Page-level methods attach to the target with flatten=true and dispatch the method on the returned sessionId. Clean errors when the endpoint becomes unreachable mid-session or the URL isn't a WebSocket. Tests: 19 unit (mock CDP server + gate checks) + E2E against real headless Chrome (Target.getTargets, Browser.getVersion, Runtime.evaluate with target_id, Page.navigate + re-eval, bogus method, bogus target_id, missing endpoint) + E2E of the check_fn gate (tool hidden without CDP URL, visible with it, hidden again after unset).	2026-04-19 00:03:10 -07:00
Teknium	762f7e9796	feat: configurable approval mode for cron jobs (approvals.cron_mode) Add approvals.cron_mode config option that controls how cron jobs handle dangerous commands. Previously, cron jobs silently auto-approved all dangerous commands because there was no user present to approve them. Now the behavior is configurable: - deny (default): block dangerous commands and return a message telling the agent to find an alternative approach. The agent loop continues — it just can't use that specific command. - approve: auto-approve all dangerous commands (previous behavior). When a command is blocked, the agent receives the same response format as a user denial in the CLI — exit_code=-1, status=blocked, with a message explaining why and pointing to the config option. This keeps the agent loop running and encourages it to adapt. Implementation: - config.py: add approvals.cron_mode to DEFAULT_CONFIG - scheduler.py: set HERMES_CRON_SESSION=1 env var before agent runs - approval.py: both check_command_approval() and check_all_command_guards() now check for cron sessions and apply the configured mode - 21 new tests covering config parsing, deny/approve behavior, and interaction with other bypass mechanisms (yolo, containers)	2026-04-18 19:24:35 -07:00
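The decision the commit describes fits in one function. The sketch below is illustrative only: `check_cron_approval` and its return shape for the non-cron case are assumptions, while `HERMES_CRON_SESSION`, `approvals.cron_mode`, and the `exit_code=-1` / `status=blocked` response come from the commit message.

```python
from typing import Mapping, Optional


def check_cron_approval(
    is_dangerous: bool, env: Mapping[str, str], config: Mapping[str, dict]
) -> Optional[dict]:
    """Return a verdict for cron sessions, or None to defer to other checks."""
    if env.get("HERMES_CRON_SESSION") != "1":
        return None  # not a cron run; interactive approval applies
    if not is_dangerous:
        return {"approved": True}
    mode = config.get("approvals", {}).get("cron_mode", "deny")
    if mode == "approve":
        return {"approved": True}
    # Default `deny`: same shape as a user denial, so the agent loop continues.
    return {
        "approved": False,
        "exit_code": -1,
        "status": "blocked",
        "message": "Blocked in cron session; set approvals.cron_mode to change this.",
    }
```

Returning a blocked result instead of raising keeps the agent loop running, so the agent can look for an alternative command.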
Teknium	285bb2b915	feat(execute_code): add project/strict execution modes, default to project (#11971) Weaker models (Gemma-class) repeatedly rediscover and forget that execute_code uses a different CWD and Python interpreter than terminal(), causing them to flip-flop on whether user files exist and to hit import errors on project dependencies like pandas. Adds a new 'code_execution.mode' config key (default 'project') that brings execute_code into line with terminal()'s filesystem/interpreter: project (new default): - cwd = session's TERMINAL_CWD (falls back to os.getcwd()) - python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python with a Python 3.8+ version check; falls back cleanly to sys.executable if no venv or the candidate fails - result: 'import pandas' works, '.env' resolves, matches terminal() strict (opt-in): - cwd = staging tmpdir (today's behavior) - python = sys.executable (today's behavior) - result: maximum reproducibility and isolation; project deps won't resolve Security-critical invariants are identical across both modes and covered by explicit regression tests: - env scrubbing (strips _API_KEY, _TOKEN, _SECRET, _PASSWORD, _CREDENTIAL, _PASSWD, *_AUTH substrings) - SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no delegate_task, no MCP from inside scripts) - resource caps (5-min timeout, 50KB stdout, 50 tool calls) Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool descriptions (regression from commit `39b83f34` where agents on local backends falsely believed they were sandboxed and refused networking). Override via env var: HERMES_EXECUTE_CODE_MODE=strict|project	2026-04-18 01:46:25 -07:00
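Two pieces of this commit lend themselves to a short sketch: resolving the mode and scrubbing secrets from the child environment. The helper names, the case-insensitive match, and the fallback to `project` on invalid values are assumptions; the config key, env var, and marker substrings are taken from the commit message.

```python
from typing import Dict, Mapping

VALID_MODES = ("project", "strict")
SECRET_MARKERS = (
    "_API_KEY", "_TOKEN", "_SECRET", "_PASSWORD",
    "_CREDENTIAL", "_PASSWD", "_AUTH",
)


def resolve_mode(config: Mapping[str, dict], env: Mapping[str, str]) -> str:
    """HERMES_EXECUTE_CODE_MODE overrides code_execution.mode; default 'project'."""
    override = env.get("HERMES_EXECUTE_CODE_MODE", "").strip().lower()
    if override in VALID_MODES:
        return override
    mode = config.get("code_execution", {}).get("mode", "project")
    return mode if mode in VALID_MODES else "project"


def scrub_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Drop variables whose names contain a secret-looking substring."""
    return {
        name: value
        for name, value in env.items()
        if not any(marker in name.upper() for marker in SECRET_MARKERS)
    }
```

Applying `scrub_env` regardless of the resolved mode reflects the commit's point that the security invariants are identical in `project` and `strict`.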
Teknium	598cba62ad	test: update stale tests to match current code (#11963 ) Seven test files were asserting against older function signatures and behaviors. CI has been red on main because of accumulated test debt from other PRs; this catches the tests up. - tests/agent/test_subagent_progress.py: _build_child_progress_callback now takes (task_index, goal, parent_agent, task_count=1); update all call sites and rewrite tests that assumed the old 'batch-only' relay semantics (now relays per-tool AND flushes a summary at BATCH_SIZE). Renamed test_thinking_not_relayed_to_gateway → test_thinking_relayed_to_gateway since thinking IS now relayed as subagent.thinking. - tests/tools/test_delegate.py: _build_child_agent now requires task_count; add task_count=1 to all 8 call sites. - tests/cli/test_reasoning_command.py: AIAgent gained _stream_callback; stub it on the two test agent helpers that use spec=AIAgent / __new__. - tests/hermes_cli/test_cmd_update.py: cmd_update now runs npm install in repo root + ui-tui/ + web/ and 'npm run build' in web/; assert all four subprocess calls in the expected order. - tests/hermes_cli/test_model_validation.py: dissimilar unknown models now return accepted=False (previously True with warning); update both affected tests. - tests/tools/test_registry.py: include feishu_doc_tool and feishu_drive_tool in the expected builtin tool set. - tests/gateway/test_voice_command.py: missing-voice-deps message now suggests 'pip install PyNaCl' not 'hermes-agent[messaging]'. 411/411 pass locally across these 7 files.	2026-04-17 21:35:30 -07:00

572 commits