Commit graph

1950 commits

Author SHA1 Message Date
Teknium
3207b9bda0
test: speed up slow tests (backoff + subprocess + IMDS network) (#11797)
Cuts shard-3 local runtime in half by neutralizing real wall-clock
waits across three classes of slow test:

## 1. Retry backoff mocks

- tests/run_agent/conftest.py (NEW): autouse fixture mocks
  jittered_backoff to 0.0 so the `while time.time() < sleep_end`
  busy-loop exits immediately. No global time.sleep mock (would
  break threading tests).
- test_anthropic_error_handling, test_413_compression,
  test_run_agent_codex_responses, test_fallback_model: per-file
  fixtures mock time.sleep / asyncio.sleep for retry / compression
  paths.
- test_retaindb_plugin: cap the retaindb module's bound time.sleep
  to 0.05s via a per-test shim (background writer-thread retries
  sleep 2s after errors; tests don't care about exact duration).
  Plus replace arbitrary time.sleep(N) waits with short polling
  loops bounded by deadline.

## 2. Subprocess sleeps in production code

- test_update_gateway_restart: mock time.sleep. Production code
  does time.sleep(3) after `systemctl restart` to verify the
  service survived. Tests mock subprocess.run \u2014 nothing actually
  restarts \u2014 so the wait is dead time.

## 3. Network / IMDS timeouts (biggest single win)

- tests/conftest.py: add AWS_EC2_METADATA_DISABLED=true plus
  AWS_METADATA_SERVICE_TIMEOUT=1 and ATTEMPTS=1. boto3 falls back
  to IMDS (169.254.169.254) when no AWS creds are set. Any test
  hitting has_aws_credentials() / resolve_aws_auth_env_var() (e.g.
  test_status, test_setup_copilot_acp, anything that touches
  provider auto-detect) burned ~2-4s waiting for that to time out.
- test_exit_cleanup_interrupt: explicitly mock
  resolve_runtime_provider which was doing real network auto-detect
  (~4s). Tests don't care about provider resolution \u2014 the agent
  is already mocked.
- test_timezone: collapse the 3-test "TZ env in subprocess" suite
  into 2 tests by checking both injection AND no-leak in the same
  subprocess spawn (was 3 \u00d7 3.2s, now 2 \u00d7 4s).

## Validation

| Test | Before | After |
|---|---|---|
| test_anthropic_error_handling (8 tests) | ~80s | ~15s |
| test_413_compression (14 tests) | ~18s | 2.3s |
| test_retaindb_plugin (67 tests) | ~13s | 1.3s |
| test_status_includes_tavily_key | 4.0s | 0.05s |
| test_setup_copilot_acp_skips_same_provider_pool_step | 8.0s | 0.26s |
| test_update_gateway_restart (5 tests) | ~18s total | ~0.35s total |
| test_exit_cleanup_interrupt (2 tests) | 8s | 1.5s |
| **Matrix shard 3 local** | **108s** | **50s** |

No behavioral contract changed \u2014 tests still verify retry happens,
service restart logic runs, etc.; they just don't burn real seconds
waiting for it.

Supersedes PR #11779 (those changes are included here).
2026-04-17 14:21:22 -07:00
Teknium
eb07c05646
fix(gateway): prune stale SessionStore entries to bound memory + disk (#11789)
SessionStore._entries grew unbounded.  Every unique
(platform, chat_id, thread_id, user_id) tuple ever seen was kept in
RAM and rewritten to sessions.json on every message.  A Discord bot
in 100 servers x 100 channels x ~100 rotating users accumulates on
the order of 10^5 entries after a few months; each sessions.json
write becomes an O(n) fsync.  Nothing trimmed this — there was no
TTL, no cap, no eviction path.

Changes
-------
* SessionStore.prune_old_entries(max_age_days) — drops entries whose
  updated_at is older than the cutoff.  Preserves:
    - suspended entries (user paused them via /stop for later resume)
    - entries with an active background process attached
  Pruning is functionally identical to a natural reset-policy expiry:
  SQLite transcript stays, session_key -> session_id mapping dropped,
  returning user gets a fresh session.

* GatewayConfig.session_store_max_age_days (default 90; 0 disables).
  Serialized in to_dict/from_dict, coerced from bad types / negatives
  to safe defaults.  No migration needed — missing field -> 90 days.

* _session_expiry_watcher calls prune_old_entries once per hour
  (first tick is immediate).  Uses the existing watcher loop so no
  new background task is created.

Why not more aggressive
-----------------------
90 days is long enough that legitimate long-idle users (seasonal,
vacation, etc.) aren't surprised — pruning just means they get a
fresh session on return, same outcome they'd get from any other
reset-policy trigger.  Admins can lower it via config; 0 disables.

Tests
-----
tests/gateway/test_session_store_prune.py — 17 cases covering:
  * entry age based on updated_at, not created_at
  * max_age_days=0 disables; negative coerces to 0
  * suspended + active-process entries are skipped
  * _save fires iff something was removed
  * disk JSON reflects post-prune state
  * thread safety against concurrent readers
  * config field roundtrips + graceful fallback on bad values
  * watcher gate logic (first tick prunes, subsequent within 1h don't)

119 broader session/gateway tests remain green.
2026-04-17 13:48:49 -07:00
Teknium
f362083c64 fix(providers): complete NVIDIA NIM parity with other providers
Follow-up on the native NVIDIA NIM provider salvage. The original PR wired
PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints
required for full parity with other OpenAI-compatible providers (xai,
huggingface, deepseek, zai).

Gaps closed:

- hermes_cli/main.py:
  - Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so
    selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key
    provider flow (previously fell through silently).
  - Add 'nvidia' to `hermes chat --provider` argparse choices so the
    documented test command (`hermes chat --provider nvidia --model ...`)
    parses successfully.

- hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in
  OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're
  auto-added to the subprocess env blocklist.

- hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so
  `hermes doctor` probes https://integrate.api.nvidia.com/v1/models.

- hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for
  `hermes dump` credential masking.

- tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture
  with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses.

- agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry
  so all Nemotron variants get 128K context via substring match (rather
  than falling back to MINIMUM_CONTEXT_LENGTH).

- hermes_cli/models.py: Fix hallucinated model ID
  'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b'
  (verified against live integrate.api.nvidia.com/v1/models catalog).
  Expand curated list from 5 to 9 agentic models mapping to OpenRouter
  defaults per provider-guide convention: add qwen3.5-397b-a17b,
  deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b.

- cli-config.yaml.example: Document 'nvidia' provider option.

- scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP
  for CI attribution.

E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's
endpoint (returns 401 with bogus key instead of argparse error);
`hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.
2026-04-17 13:47:46 -07:00
asurla
3b569ff576 feat(providers): add native NVIDIA NIM provider
Adds NVIDIA NIM as a first-class provider: ProviderConfig in
auth.py, HermesOverlay in providers.py, curated models
(Nemotron plus other open source models hosted on
build.nvidia.com), URL mapping in model_metadata.py, aliases
(nim, nvidia-nim, build-nvidia, nemotron), and env var tests.

Docs updated: providers page, quickstart table, fallback
providers table, and README provider list.
2026-04-17 13:47:46 -07:00
Teknium
cc3aa76675
build(deps): add qrcode to dingtalk + feishu extras (parity with messaging) (#11627)
#4b1567f4 (anthhub) added qrcode to the messaging extra for Weixin's
QR login. The same package is needed by:

  * hermes_cli/dingtalk_auth.py — QR device-flow auth shipped in #11574
  * gateway/platforms/feishu.py:3962 — Feishu QR login

These extras are independent of [messaging] (users can install
hermes-agent[dingtalk] or hermes-agent[feishu] without [messaging]),
so the dep needs to be declared on each.

Pin matches anthhub's choice (>=7.0,<8) for consistency. The all
extra inherits from all three, so it picks up qrcode transitively.

Adds parallel tests to tests/test_project_metadata.py — same shape
as test_messaging_extra_includes_qrcode_for_weixin_setup.

Refs #9431.
2026-04-17 13:31:53 -07:00
Teknium
2ff1ef6ae6
fix(surrogates): sanitize reasoning/reasoning_content/reasoning_details fields (#11628)
Byte-level reasoning models (xiaomi/mimo-v2-pro, kimi, glm) can emit lone
surrogates in reasoning output. The proactive sanitizer walked content/
name/tool_calls but not extra fields like reasoning or the nested
reasoning_details array. Surrogates in those fields survived the
proactive pass, crashed json.dumps() in the OpenAI SDK, and the recovery
block's _sanitize_messages_surrogates(messages) call also didn't check
those fields — so 'found' was False, no retry happened, and after 3
attempts the user saw:

  API call failed after 3 retries. 'utf-8' codec can't encode characters
  in position N-M: surrogates not allowed

Changes:
- _sanitize_messages_surrogates: walk any extra string fields (reasoning,
  reasoning_content, etc.) and recurse into nested dict/list values
  (reasoning_details). Mirrors _sanitize_messages_non_ascii coverage
  added in PR #10537.
- _sanitize_structure_surrogates: new recursive walker, mirror of
  _sanitize_structure_non_ascii but for surrogate recovery.
- UnicodeEncodeError recovery block: also sanitize api_messages,
  api_kwargs, and prefill_messages (not just the canonical messages
  list — the API-copy carries reasoning_content transformed from
  reasoning and that's what the SDK actually serializes). Always
  retry on detected surrogate errors, not only when we found
  something to strip — gate on error type per PR #10537's pattern.

Tests: extended tests/cli/test_surrogate_sanitization.py with
coverage for reasoning, reasoning_content, reasoning_details (flat
and deeply nested), structure walker, and an integration case that
reproduces the exact api_messages shape that was crashing.
2026-04-17 13:30:47 -07:00
Henkey
cb883f9e97 fix(acp): improve zed integration 2026-04-17 13:29:26 -07:00
Teknium
d0e1388ca9
fix(tests): make AIAgent constructor calls self-contained (#11755)
* fix(tests): make AIAgent constructor calls self-contained (no env leakage)

Tests in tests/run_agent/ were constructing AIAgent() without passing
both api_key and base_url, then relying on leaked state from other
tests in the same xdist worker (or process-level env vars) to keep
provider resolution happy. Under hermetic conftest + pytest-split,
that state is gone and the tests fail with 'No LLM provider configured'.

Fix: pass both api_key and base_url explicitly on 47 AIAgent()
construction sites across 13 files. AIAgent.__init__ with both set
takes the direct-construction path (line 960 in run_agent.py) and
skips the resolver entirely.

One call site (test_none_base_url_passed_as_none) left alone — that
test asserts behavior for base_url=None specifically.

This is a prerequisite for any future matrix-split or stricter
isolation work, and lands cleanly on its own.

Validation:
- tests/run_agent/ full: 760 passed, 0 failed (local)
- Previously relied on cross-test pollution; now self-contained

* fix(tests): update opencode-go model order assertion to match kimi-k2.5-first

commit 78a74bb promoted kimi-k2.5 to first position in model suggestion
lists but didn't update this test, which has been failing on main since.
Reorder expected list to match the new canonical order.
2026-04-17 12:32:03 -07:00
Young Sherlock
8dcd08d8bb Fix Weixin media uploads and refresh lockfile 2026-04-17 06:50:36 -07:00
anthhub
4b1567f425 fix(packaging): include qrcode in messaging extra 2026-04-17 06:50:36 -07:00
Teknium
3f3d8a7b24 fix(discord): strip mention syntax from auto-thread names
Previously a message like `<@&1490963422786093149> help` would spawn a
thread literally named `<@&1490963422786093149> help`, exposing raw
Discord mention markers in the thread list. Only user mentions
(`<@id>`) were being stripped upstream — role mentions (`<@&id>`) and
channel mentions (`<#id>`) leaked through.

Fix: strip all three mention patterns in `_auto_create_thread` before
building the thread name. Collapse runs of whitespace left by the
removal. If the entire content was mention-only, fall back to 'Hermes'
instead of an empty title.

Fixes #6336.

Tests: two new regression guards in test_discord_slash_commands.py
covering mixed-mention content and mention-only content.
2026-04-17 06:46:52 -07:00
sgaofen
32a694ad5f fix(discord): fall back when auto-thread creation fails 2026-04-17 06:46:52 -07:00
OwenYWT
f5dc4e905d fix(discord): skip auto-threading reply messages 2026-04-17 06:46:52 -07:00
Matteo De Agazio
93fe4b357d fix(discord): free-response channels skip auto-threading
Free-response channels already bypassed the @mention gate so users could
chat inline with the bot, but auto-threading still fired on every
message — spinning off a thread per message and defeating the
lightweight-chat purpose.

Fix: fold `is_free_channel` into `skip_thread` so threading is skipped
whenever the channel is in DISCORD_FREE_RESPONSE_CHANNELS (via env or
discord.free_response_channels in config.yaml).

Net change: one line in _handle_message + one regression test.

Partially addresses #9399. Authored by @Hypn0sis (salvaged from PR #9650;
the bundled 'smart' auto-thread mode from that PR was dropped in favor
of deterministic true/false semantics).
2026-04-17 06:46:52 -07:00
Teknium
8d7b7feb0d
fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction (#11565)
* fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction

The per-session AIAgent cache was unbounded. Each cached AIAgent holds
LLM clients, tool schemas, memory providers, and a conversation buffer.
In a long-lived gateway serving many chats/threads, cached agents
accumulated indefinitely — entries were only evicted on /new, /model,
or session reset.

Changes:
- Cache is now an OrderedDict so we can pop least-recently-used entries.
- _enforce_agent_cache_cap() pops entries beyond _AGENT_CACHE_MAX_SIZE=64
  when a new agent is inserted. LRU order is refreshed via move_to_end()
  on cache hits.
- _sweep_idle_cached_agents() evicts entries whose AIAgent has been idle
  longer than _AGENT_CACHE_IDLE_TTL_SECS=3600s. Runs from the existing
  _session_expiry_watcher so no new background task is created.
- The expiry watcher now also pops the cache entry after calling
  _cleanup_agent_resources on a flushed session — previously the agent
  was shut down but its reference stayed in the cache dict.
- Evicted agents have _cleanup_agent_resources() called on a daemon
  thread so the cache lock isn't held during slow teardown.

Both tuning constants live at module scope so tests can monkeypatch
them without touching class state.

Tests: 7 new cases in test_agent_cache.py covering LRU eviction,
move_to_end refresh, cleanup thread dispatch, idle TTL sweep,
defensive handling of agents without _last_activity_ts, and plain-dict
test fixture tolerance.

* tweak: bump _AGENT_CACHE_MAX_SIZE 64 -> 128

* fix(gateway): never evict mid-turn agents; live spillover tests

The prior commit could tear down an active agent if its session_key
happened to be LRU when the cap was exceeded.  AIAgent.close() kills
process_registry entries for the task, tears down the terminal
sandbox, closes the OpenAI client (sets self.client = None), and
cascades .close() into any active child subagents — all fatal if
the agent is still processing a turn.

Changes:
- _enforce_agent_cache_cap and _sweep_idle_cached_agents now look at
  GatewayRunner._running_agents and skip any entry whose AIAgent
  instance is present (identity via id(), so MagicMock doesn't
  confuse lookup in tests).  _AGENT_PENDING_SENTINEL is treated
  as 'not active' since no real agent exists yet.
- Eviction only considers the LRU-excess window (first size-cap
  entries).  If an excess slot is held by a mid-turn agent, we skip
  it WITHOUT compensating by evicting a newer entry.  A freshly
  inserted session (zero cache history) shouldn't be punished to
  protect a long-lived one that happens to be busy.
- Cache may therefore stay transiently over cap when load spikes;
  a WARNING is logged so operators can see it, and the next insert
  re-runs the check after some turns have finished.

New tests (TestAgentCacheActiveSafety + TestAgentCacheSpilloverLive):
- Active LRU entry is skipped; no newer entry compensated
- Mixed active/idle excess window: only idle slots go
- All-active cache: no eviction, WARNING logged, all clients intact
- _AGENT_PENDING_SENTINEL doesn't block other evictions
- Idle-TTL sweep skips active agents
- End-to-end: active agent's .client survives eviction attempt
- Live fill-to-cap with real AIAgents, then spillover
- Live: CAP=4 all active + 1 newcomer — cache grows to 5, no teardown
- Live: 8 threads racing 160 inserts into CAP=16 — settles at 16
- Live: evicted session's next turn gets a fresh agent that works

30 tests pass (13 pre-existing + 17 new).  Related gateway suites
(model switch, session reset, proxy, etc.) all green.

* fix(gateway): cache eviction preserves per-task state for session resume

The prior commits called AIAgent.close() on cache-evicted agents, which
tears down process_registry entries, terminal sandbox, and browser
daemon for that task_id — permanently. Fine for session-expiry (session
ended), wrong for cache eviction (session may resume).

Real-world scenario: a user leaves a Telegram session open for 2+ hours,
idle TTL evicts the cached AIAgent, user returns and sends a message.
Conversation history is preserved via SessionStore, but their terminal
sandbox (cwd, env vars, bg shells) and browser state were destroyed.

Fix: split the two cleanup modes.

  close()               Full teardown — session ended. Kills bg procs,
                        tears down terminal sandbox + browser daemon,
                        closes LLM client. Used by session-expiry,
                        /new, /reset (unchanged).

  release_clients()     Soft cleanup — session may resume. Closes
                        LLM client only. Leaves process_registry,
                        terminal sandbox, browser daemon intact
                        for the resuming agent to inherit via
                        shared task_id.

Gateway cache eviction (_enforce_agent_cache_cap, _sweep_idle_cached_agents)
now dispatches _release_evicted_agent_soft on the daemon thread instead
of _cleanup_agent_resources. All session-expiry call sites of
_cleanup_agent_resources are unchanged.

Tests (TestAgentCacheIdleResume, 5 new cases):
- release_clients does NOT call process_registry.kill_all
- release_clients does NOT call cleanup_vm / cleanup_browser
- release_clients DOES close the LLM client (agent.client is None after)
- close() vs release_clients() — semantic contract pinned
- Idle-evicted session's rebuild with same session_id gets same task_id

Updated test_cap_triggers_cleanup_thread to assert the soft path fires
and the hard path does NOT.

35 tests pass in test_agent_cache.py; 67 related tests green.
2026-04-17 06:36:34 -07:00
Jorge
86f02d8d71 refactor(cli): align model picker viewport with PR #11260 vocabulary
Match the row-budget naming introduced in PR #11260 for the approval and
clarify panels: rename chrome_reserve=14 into reserved_below=6 (input
chrome below the panel) + panel_chrome=6 (this panel's borders, blanks,
and hint row) + min_visible=3 (floor on visible items). Same arithmetic
as before, but a reviewer reading both files now sees the same handle.

Compact-chrome mode is intentionally not adopted — that pattern fits the
"fixed mandatory content might overflow" shape of approval/clarify
(solved by truncating with a marker), whereas the picker's overflow is
already handled by the scrolling viewport.
2026-04-17 06:33:21 -07:00
Jorge
5fbe16635b fix(cli): scroll the /model picker viewport so long catalogs aren't clipped
The /model picker rendered every choice into a prompt_toolkit Window
with no max height. Providers with many models (e.g. Ollama Cloud's 36+)
overflowed the terminal, clipping the bottom border and the last items.

- Add HermesCLI._compute_model_picker_viewport() to slide a scroll
  offset that keeps the cursor on screen, sized from the live terminal
  rows minus chrome reserved for input/status/border.
- Render only the visible slice in _get_model_picker_display() and
  persist the offset on _model_picker_state across redraws.
- Bind ESC (eager) to close the picker, matching the Cancel button.
- Cover the viewport math with 8 unit tests in
  tests/hermes_cli/test_model_picker_viewport.py.
2026-04-17 06:33:21 -07:00
Teknium
f64241ed90 feat(cron+tests): extend origin fallback to email/dingtalk/qqbot + fix Weixin test mocks
Cron origin fallback extension (builds on #9193's _HOME_TARGET_ENV_VARS):
adds the three remaining origin-fallback-eligible platforms that have
home channel env vars configured in gateway/config.py but use non-generic
env var names:

- email    → EMAIL_HOME_ADDRESS   (non-standard suffix)
- dingtalk → DINGTALK_HOME_CHANNEL
- qqbot    → QQ_HOME_CHANNEL      (non-standard prefix: QQ_ not QQBOT_)

Picks up the completeness intent of @Xowiek's PR #11317 using the
architecturally-correct dict-based lookup from #9193, so platforms with
non-standard env var names actually resolve instead of silently missing.
Extended the parametrized regression test to cover the new three.

Weixin test mock alignment (builds on #10091's _send_session split):
Three test sites added in Batch 1 (TestWeixinSendImageFileParameterName)
and Batch 3 (TestWeixinVoiceSending) mocked only adapter._session, but
#10091 switched the send paths to check self._send_session. Added the
companion setter so the tests stay green with the session split in place.
2026-04-17 06:26:43 -07:00
bde3249023
b46db048c3 fix(cron): align home target env lookup 2026-04-17 06:26:43 -07:00
bde3249023
f696b4745a fix(cron): restore origin fallback for feishu home channels 2026-04-17 06:26:43 -07:00
Ubuntu
5ca52bae5b fix(gateway/weixin): split poll/send sessions, reuse live adapter for cron & send_message
- gateway/platforms/weixin.py:
  - Split aiohttp.ClientSession into _poll_session and _send_session
  - Add _LIVE_ADAPTERS registry so send_weixin_direct() reuses the connected gateway adapter instead of creating a competing session
  - Fixes silent message loss when gateway is running (iLink token contention)

- cron/scheduler.py:
  - Support comma-separated deliver values (e.g. 'feishu,weixin') for multi-target delivery
  - Delay pconfig/enabled check until standalone fallback so live adapters work even when platform is not in gateway config

- tools/send_message_tool.py:
  - Synthesize PlatformConfig from WEIXIN_* env vars when gateway config lacks a weixin entry
  - Fall back to WEIXIN_HOME_CHANNEL env var for home channel resolution

- tests/gateway/test_weixin.py:
  - Update mocks to include _send_session
2026-04-17 06:26:43 -07:00
Teknium
c60b6dc317 test(dingtalk): cover get_connected_platforms + null platform_toolsets
Follow-ups to the salvaged commits in this PR:

* gateway/config.py — strip trailing whitespace from youngDoo's diff
  (line 315 had ~140 trailing spaces).

* hermes_cli/tools_config.py — replace `config.get("platform_toolsets", {})`
  with `config.get("platform_toolsets") or {}`. Handles the case where the
  YAML key is present but explicitly null (parses as None, previously
  crashed with AttributeError on the next line's .get(platform)).
  Cherry-picked from yyq4193's #9003 with attribution.

* tests/gateway/test_config.py — 4 new tests for TestGetConnectedPlatforms
  covering DingTalk via extras, via env vars, disabled, and missing creds.

* tests/hermes_cli/test_tools_config.py — regression test for the null
  platform_toolsets edge case.

* scripts/release.py — add kagura-agent, youngDoo, yyq4193 to AUTHOR_MAP.

Co-authored-by: yyq4193 <39405770+yyq4193@users.noreply.github.com>
2026-04-17 06:26:18 -07:00
kagura-agent
47a0dd1024 fix(dingtalk): fire-and-forget message processing & session_webhook fallback
Fixes #11463: DingTalk channel receives messages but fails to reply
with 'No session_webhook available'.

Two changes:

1. **Fire-and-forget message processing**: process() now dispatches
   _on_message as a background task via asyncio.create_task instead of
   awaiting it. This ensures the SDK ACK is returned immediately,
   preventing heartbeat timeouts and disconnections when message
   processing takes longer than the SDK's ACK deadline.

2. **session_webhook extraction fallback**: If ChatbotMessage.from_dict()
   fails to map the sessionWebhook field (possible across SDK versions),
   the handler now falls back to extracting it directly from the raw
   callback data dict using both 'sessionWebhook' and 'session_webhook'
   key variants.

Added 3 tests covering webhook extraction, fallback behavior, and
fire-and-forget ACK timing.
2026-04-17 06:26:18 -07:00
Teknium
d404849351
test: make test env hermetic; enforce CI parity via scripts/run_tests.sh (#11577)
* test: make test env hermetic; enforce CI parity via scripts/run_tests.sh

Fixes the recurring 'works locally, fails in CI' (and vice versa) class
of flakes by making tests hermetic and providing a canonical local runner
that matches CI's environment.

## Layer 1 — hermetic conftest.py (tests/conftest.py)

Autouse fixture now unsets every credential-shaped env var before every
test, so developer-local API keys can't leak into tests that assert
'auto-detect provider when key present'.

Pattern: unset any var ending in _API_KEY, _TOKEN, _SECRET, _PASSWORD,
_CREDENTIALS, _ACCESS_KEY, _PRIVATE_KEY, etc. Plus an explicit list of
credential names that don't fit the suffix pattern (AWS_ACCESS_KEY_ID,
FAL_KEY, GH_TOKEN, etc.) and all the provider BASE_URL overrides that
change auto-detect behavior.

Also unsets HERMES_* behavioral vars (HERMES_YOLO_MODE, HERMES_QUIET,
HERMES_SESSION_*, etc.) that mutate agent behavior.

Also:
  - Redirects HOME to a per-test tempdir (not just HERMES_HOME), so
    code reading ~/.hermes/* directly can't touch the real dir.
  - Pins TZ=UTC, LANG=C.UTF-8, LC_ALL=C.UTF-8, PYTHONHASHSEED=0 to
    match CI's deterministic runtime.

The old _isolate_hermes_home fixture name is preserved as an alias so
any test that yields it explicitly still works.

## Layer 2 — scripts/run_tests.sh canonical runner

'Always use scripts/run_tests.sh, never call pytest directly' is the
new rule (documented in AGENTS.md). The script:
  - Unsets all credential env vars (belt-and-suspenders for callers
    who bypass conftest — e.g. IDE integrations)
  - Pins TZ/LANG/PYTHONHASHSEED
  - Uses -n 4 xdist workers (matches GHA ubuntu-latest; -n auto on
    a 20-core workstation surfaces test-ordering flakes CI will never
    see, causing the infamous 'passes in CI, fails locally' drift)
  - Finds the venv in .venv, venv, or main checkout's venv
  - Passes through arbitrary pytest args

Installs pytest-split on demand so the script can also be used to run
matrix-split subsets locally for debugging.

## Remove 3 module-level dotenv stubs that broke test isolation

tests/hermes_cli/test_{arcee,xiaomi,api_key}_provider.py each had a
module-level:

    if 'dotenv' not in sys.modules:
        fake_dotenv = types.ModuleType('dotenv')
        fake_dotenv.load_dotenv = lambda *a, **kw: None
        sys.modules['dotenv'] = fake_dotenv

This patches sys.modules['dotenv'] to a fake at import time with no
teardown. Under pytest-xdist LoadScheduling, whichever worker collected
one of these files first poisoned its sys.modules; subsequent tests in
the same worker that imported load_dotenv transitively (e.g.
test_env_loader.py via hermes_cli.env_loader) got the no-op lambda and
saw their assertions fail.

dotenv is a required dependency (python-dotenv>=1.2.1 in pyproject.toml),
so the defensive stub was never needed. Removed.

## Validation

- tests/hermes_cli/ alone: 2178 passed, 1 skipped, 0 failed (was 4
  failures in test_env_loader.py before this fix)
- tests/test_plugin_skills.py, tests/hermes_cli/test_plugins.py,
  tests/test_hermes_logging.py combined: 123 passed (the caplog
  regression tests from PR #11453 still pass)
- Local full run shows no F/E clusters in the 0-55% range that were
  previously present before the conftest hardening

## Background

See AGENTS.md 'Testing' section for the full list of drift sources
this closes. Matrix split (closed as #11566) will be re-attempted
once this foundation lands — cross-test pollution was the root cause
of the shard-3 hang in that PR.

* fix(conftest): don't redirect HOME — it broke CI subprocesses

PR #11577's autouse fixture was setting HOME to a per-test tempdir.
CI started timing out at 97% complete with dozens of E/F markers and
orphan python processes at cleanup — tests (or transitive deps)
spawn subprocesses that expect a stable HOME, and the redirect broke
them in non-obvious ways.

Env-var unsetting and TZ/LANG/hashseed pinning (the actual CI-drift
fixes) are unchanged and still in place. HERMES_HOME redirection is
also unchanged — that's the canonical way to isolate tests from
~/.hermes/, not HOME.

Any code in the codebase reading ~/.hermes/* via `Path.home() / ".hermes"`
instead of `get_hermes_home()` is a bug to fix at the callsite, not
something to paper over in conftest.
2026-04-17 06:09:09 -07:00
Teknium
e5b880264b fix(discord): harden DISCORD_ALLOWED_ROLES and cover gateway layer
Two follow-ups to the cherry-picked PR #9873 (`e3bcc819`):

1. `_is_allowed_user` now uses `getattr(self, '_allowed_*_ids', set())`
   so test fixtures that build the adapter via `object.__new__`
   (skipping __init__) don't crash with AttributeError.
   See AGENTS.md pitfall #17 — same pattern as gateway.run.

2. New 3-case regression coverage in test_discord_bot_auth_bypass.py:
   - role-only config bypasses the gateway 'no allowlists' branch
   - roles + users combined still authorizes user-allowlist matches
   - the role bypass does NOT leak to other platforms (Telegram, etc.)

3. Autouse fixture in test_discord_bot_auth_bypass.py clears all Discord
   auth env vars before each test so DISCORD_ALLOWED_ROLES leakage from
   a previous test in the session can't flip later 'should-reject' tests
   into false-pass.

Required because the bare cherry-pick of #9873 only added the adapter-
level role check — it didn't cover the gateway-level _is_user_authorized,
which still rejected role-only setups via the 'no allowlists configured'
branch.
2026-04-17 05:48:26 -07:00
Teknium
7d888ab49c test(discord): regression guard for DISCORD_ALLOW_BOTS auth bypass
Six test cases covering:
- DISCORD_ALLOW_BOTS=mentions + bot not in DISCORD_ALLOWED_USERS → authorized
- DISCORD_ALLOW_BOTS=all + bot not in DISCORD_ALLOWED_USERS → authorized
- DISCORD_ALLOW_BOTS=none → bots still rejected (preserves security)
- DISCORD_ALLOW_BOTS unset → same as 'none'
- Humans still checked against allowlist even with allow_bots=all
- Bot bypass is Discord-specific — doesn't leak to other platforms

Guards against a regression where the is_bot bypass in _is_user_authorized
gets moved, removed, or accidentally extended to other platforms.
2026-04-17 05:42:04 -07:00
Teknium
d7fb435e0e
fix(discord): flat /skill command with autocomplete — fits 8KB limit trivially (#11580)
Closes #11321, closes #10259.

## Problem

The nested /skill command group (category subcommand groups + skill
subcommands) serialized to ~14KB with the default 75-skill catalog,
exceeding Discord's ~8000-byte per-command registration payload. The
entire tree.sync() rejected with error 50035 — ALL slash commands
including the 27 base commands failed to register.

## Fix

Replace the nested Group layout with a single flat Command:

    /skill name:<autocomplete> args:<optional string>

Autocomplete options are fetched dynamically by Discord when the user
types — they do NOT count against the per-command registration budget.
So this single command registers at ~200 bytes regardless of how many
skills exist. Scales to thousands of skills with no size calculations,
no splitting, no hidden skills.

UX improvements:
- Discord live-filters by user's typed prefix against BOTH name and
  description, so '/skill pdf' finds 'ocr-and-documents' via its
  description. More discoverable than clicking through category menus.
- Unknown skill name → ephemeral error pointing user at autocomplete.
- Stable alphabetical ordering across restarts.

## Why not the other proposed approaches

Three prior PRs tried to fit within the 8KB limit by modifying the
nested layout:

- #10214 (njiangk): truncated all descriptions to 'Run <name>' and
  category descriptions to 'Skills'. Works but destroys slash picker UX.
- #11385 (LeonSGP43): 40-char description clamp + iterative
  trim-largest-category fallback. Works but HIDES skills the user can
  no longer invoke via slash — functional regression.
- #10261 (zeapsu): adaptive split into /skill-<cat> top-level groups.
  Preserves all skills but pollutes the slash namespace with 20
  top-level commands.

All three work around the symptom. The flat autocomplete design
dissolves the problem — there is no payload-size pressure to manage.

## Tests

tests/gateway/test_discord_slash_commands.py — 5 new test cases replace
the 3 old nested-structure tests:

- flat-not-nested structure assertion
- empty skills → no command registered
- callback dispatches the right cmd_key by name
- unknown name → ephemeral error, no dispatch
- large-catalog regression guard (500 skills) — command payload stays
  under 500 bytes regardless

E2E validated against real discord.py 2.7.1:
- Command registers as discord.app_commands.Command (not Group).
- Autocomplete filters by name AND description (verified across several
  queries including description-only matches like 'pdf' → OCR skill).
- 500-skill catalog returns max 25 results per autocomplete query
  (Discord's hard cap), filtered correctly.
- Choice labels formatted as 'name — description' clamped to 100 chars.
2026-04-17 05:19:14 -07:00
Teknium
13f2d997b0 test(dingtalk): cover QR device-flow auth + OpenClaw branding disclosure
Adds 15 regression tests for hermes_cli/dingtalk_auth.py covering:
  * _api_post — network error mapping, errcode-nonzero mapping, success path
  * begin_registration — 2-step chain, missing-nonce/device_code/uri
    error cases
  * wait_for_registration_success — success path, missing-creds guard,
    on_waiting callback invocation
  * render_qr_to_terminal — returns False when qrcode missing, prints
    when available
  * Configuration — BASE_URL default + override, SOURCE default

Also adds a one-line disclosure in dingtalk_qr_auth() telling users
the scan page will be OpenClaw-branded. Interim measure: DingTalk's
registration portal is hardcoded to route all sources to /openapp/
registration/openClaw, so users see OpenClaw branding regardless of
what 'source' value we send. We keep 'openClaw' as the source token
until DingTalk-Real-AI registers a Hermes-specific template.

Also adds meng93 to scripts/release.py AUTHOR_MAP.
2026-04-17 05:08:07 -07:00
Berny Linville
6ee65b4d61 fix(weixin): preserve native markdown rendering
- stop rewriting markdown tables, headings, and links before delivery
- keep markdown table blocks and headings together during chunking
- update Weixin tests and docs for native markdown rendering

Closes #10308
2026-04-17 05:01:29 -07:00
Patrick Wang
4ed6e4c1a5 refactor(weixin): drop pilk dependency from voice fallback 2026-04-17 05:01:29 -07:00
Patrick Wang
649f38390c fix: force Weixin voice fallback to file attachments 2026-04-17 05:01:29 -07:00
Patrick Wang
678b69ec1b fix(weixin): use Tencent SILK encoding for voice replies 2026-04-17 05:01:29 -07:00
Teknium
53da34a4fc
fix(discord): route attachment downloads through authenticated bot session (#11568)
Three open issues — #8242, #6587, #11345 — all trace to the same root
cause: the image / audio / document download paths in
`DiscordAdapter._handle_message` used plain, unauthenticated HTTP to
fetch `att.url`. That broke in three independent ways:

  #8242  cdn.discordapp.com attachment URLs increasingly require the
         bot session to download; unauthenticated httpx sees 403
         Forbidden, image/voice analysis fail silently.

  #6587  Some user environments (VPNs, corporate DNS, tunnels) resolve
         cdn.discordapp.com to private-looking IPs. Our is_safe_url()
         guard correctly blocks them as SSRF risks, but the user
         environment is legitimate — image analysis and voice STT die.

  #11345 The document download path skipped is_safe_url() entirely —
         raw aiohttp.ClientSession.get(att.url) with no SSRF check,
         inconsistent with the image/audio branches.

Unified fix: use `discord.Attachment.read()` as the primary download
path on all three branches. `att.read()` routes through discord.py's
own authenticated HTTPClient, so:

  - Discord CDN auth is handled (#8242 resolved).
  - Our is_safe_url() gate isn't consulted for the attachment path at
    all — the bot session handles networking internally (#6587 resolved).
  - All three branches now share the same code path, eliminating the
    document-path SSRF gap (#11345 resolved).

Falls back to the existing cache_*_from_url helpers (image/audio) or an
SSRF-gated aiohttp fetch (documents) when `att.read()` is unavailable
or fails — preserves defense-in-depth for any future payload-schema
drift that could slip a non-CDN URL into att.url.

New helpers on DiscordAdapter:
  - _read_attachment_bytes(att)  — safe att.read() wrapper
  - _cache_discord_image(att, ext)     — primary + URL fallback
  - _cache_discord_audio(att, ext)     — primary + URL fallback
  - _cache_discord_document(att, ext)  — primary + SSRF-gated aiohttp fallback

Tests:
  - tests/gateway/test_discord_attachment_download.py — 12 new cases
    covering all three helpers: primary path, fallback on missing
    .read(), fallback on validator rejection, SSRF guard on document
    fallback, aiohttp fallback happy-path, and an E2E case via
    _handle_message confirming cache_image_from_url is never invoked
    when att.read() succeeds.
  - All 11 existing document-handling tests continue to pass via the
    aiohttp fallback path (their SimpleNamespace attachments have no
    .read(), which triggers the fallback — now SSRF-gated).

Closes #8242, closes #6587, closes #11345.
2026-04-17 04:59:03 -07:00
LehaoLin
504e7eb9e5 fix(gateway): wait for reconnection before dropping WebSocket sends
When a WebSocket-based platform adapter (e.g. QQ Bot) temporarily
loses its connection, send() now polls is_connected for up to 15s
instead of immediately returning a non-retryable failure. If the
auto-reconnect completes within the window, the message is delivered
normally. On timeout, the SendResult is marked retryable=True so the
base class retry mechanism can attempt re-delivery.

Same treatment applied to _send_media().

Adds 4 async tests covering:
- Successful send after simulated reconnection
- Retryable failure on timeout
- Immediate success when already connected
- _send_media reconnection wait

Fixes #11163
2026-04-17 04:22:40 -07:00
dieutx
995177d542 fix(gateway): honor QQ_GROUP_ALLOWED_USERS in runner auth 2026-04-17 04:22:40 -07:00
Pedro Gonzalez
590c9964e1 Fix QQ voice attachment SSRF validation 2026-04-17 04:22:40 -07:00
yeyitech
a97b08e30c fix: allow trusted QQ CDN benchmark IP resolution 2026-04-17 04:22:40 -07:00
Teknium
aca81ac7bb test(dingtalk): cover require_mention + allowed_users gating
Adds 16 regression tests for the gating logic introduced in the
salvaged commit:

  * TestAllowedUsersGate — empty/wildcard/case-insensitive matching,
    staff_id vs sender_id, env var CSV population
  * TestMentionPatterns — compilation, case-insensitivity, invalid
    regex is skipped-not-raised, JSON env var, newline fallback
  * TestShouldProcessMessage — DM always accepted, group gating via
    require_mention / is_in_at_list / wake-word pattern / free_response_chats

Also adds yule975 to scripts/release.py AUTHOR_MAP (release CI blocks
unmapped emails).
2026-04-17 04:21:49 -07:00
Teknium
29d5d36b14
fix(copilot): normalize vendor-prefixed and dash-notation model IDs (#6879) (#11561)
The Copilot API returns HTTP 400 "model_not_supported" when it receives a
model ID it doesn't recognize (vendor-prefixed like
`anthropic/claude-sonnet-4.6` or dash-notation like `claude-sonnet-4-6`).
Two bugs combined to leave both formats unhandled:

1. `_COPILOT_MODEL_ALIASES` in hermes_cli/models.py only covered bare
   dot-notation and vendor-prefixed dot-notation.  Hermes' default Claude
   IDs elsewhere use hyphens (anthropic native format), and users with an
   aggregator-style config who switch `model.provider` to `copilot`
   inherit `anthropic/claude-X-4.6` — neither case was in the table.

2. The Copilot branch of `normalize_model_for_provider()` only stripped
   the vendor prefix when it matched the target provider (`copilot/`) or
   was the special-cased `openai/` for openai-codex.  Every other vendor
   prefix survived to the Copilot request unchanged.

Fix:

- Add dash-notation aliases (`claude-{opus,sonnet,haiku}-4-{5,6}` and the
  `anthropic/`-prefixed variants) to the alias table.
- Rewire the Copilot / Copilot-ACP branch of
  `normalize_model_for_provider()` to delegate to the existing
  `normalize_copilot_model_id()`.  That function already does alias
  lookups, catalog-aware resolution, and vendor-prefix fallback — it was
  being bypassed for the generic normalisation entry point.

Because `switch_model()` already calls `normalize_model_for_provider()`
for every `/model` switch (line 685 in model_switch.py), this single fix
covers the CLI startup path (cli.py), the `/model` slash command path,
and the gateway load-from-config path.

Closes #6879

Credits dsr-restyn (#6743) who independently diagnosed the dash-notation
case; their aliases are folded into this consolidated fix alongside the
vendor-prefix stripping repair.
2026-04-17 04:19:36 -07:00
Teknium
eabe14af1c test(discord): update reply_mode fixture for new to_reference() wrapping
Follow-up to the reply-reference fix: `_make_discord_adapter` used to return
the raw fetched `Message` as the expected reference, but the adapter now
wraps it via `ref_msg.to_reference(fail_if_not_exists=False)` so Discord
treats a deleted target as 'send without reply chip'. Update the fixture
to return the MessageReference sentinel so the 4 chunk-reference-identity
tests assert against the right object.

No production behavior change; only aligns the stale test fixture.
2026-04-17 04:17:56 -07:00
Teknium
ef37aa7cce test(discord): add regression guard for non-reference send errors
Follow-up to the reply-reference fix: ensure errors unrelated to the reply
reference (e.g. 50013 Missing Permissions) do NOT trigger the no-reference
retry path and still surface as a failed SendResult. Keeps the wider retry
condition from silently swallowing unrelated API errors.

Proposed in the original issue writeup (#11342) as test case
`test_non_reference_errors_still_propagate`.
2026-04-17 04:17:56 -07:00
LeonSGP43
a448e7a04d fix(discord): drop invalid reply references 2026-04-17 04:17:56 -07:00
Asunfly
7c932c5aa4 fix(dingtalk): close websocket on disconnect 2026-04-17 04:11:30 -07:00
Teknium
f268215019
fix(auth): codex auth remove no longer silently undone by auto-import (#11485)
* feat(skills): add 'hermes skills reset' to un-stick bundled skills

When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.

Adds an escape hatch for this case.

  hermes skills reset <name>
      Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
      re-baselines against the user's current copy. Future 'hermes update'
      runs accept upstream changes again. Non-destructive.

  hermes skills reset <name> --restore
      Also deletes the user's copy and re-copies the bundled version.
      Use when you want the pristine upstream skill back.

Also available as /skills reset in chat.

- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
  handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
  repro, --restore, unknown-skill error, upstream-removed-skill, and
  no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
  section explaining the origin-hash mechanic + reset usage

* fix(auth): codex auth remove no longer silently undone by auto-import

'hermes auth remove openai-codex' appeared to succeed but the credential
reappeared on the next command.  Two compounding bugs:

1. _seed_from_singletons() for openai-codex unconditionally re-imports
   tokens from ~/.codex/auth.json whenever the Hermes auth store is
   empty (by design — the Codex CLI and Hermes share that file).  There
   was no suppression check, unlike the claude_code seed path.

2. auth_remove_command's cleanup branch only matched
   removed.source == 'device_code' exactly.  Entries added via
   'hermes auth add openai-codex' have source 'manual:device_code', so
   for those the Hermes auth store's providers['openai-codex'] state was
   never cleared on remove — the next load_pool() re-seeded straight
   from there.

Net effect: there was no way to make a codex removal stick short of
manually editing both ~/.hermes/auth.json and ~/.codex/auth.json before
opening Hermes again.

Fix:

- Add unsuppress_credential_source() helper (mirrors
  suppress_credential_source()).
- Gate the openai-codex branch in _seed_from_singletons() with
  is_source_suppressed(), matching the claude_code pattern.
- Broaden auth_remove_command's codex match to handle both
  'device_code' and 'manual:device_code' (via endswith check), always
  call suppress_credential_source(), and print guidance about the
  unchanged ~/.codex/auth.json file.
- Clear the suppression marker in auth_add_command's openai-codex
  branch so re-linking via 'hermes auth add openai-codex' works.

~/.codex/auth.json is left untouched — that's the Codex CLI's own
credential store, not ours to delete.

Tests cover: unsuppress helper behavior, remove of both source
variants, add clears suppression, seed respects suppression.  E2E
verified: remove → load → add → load flow now behaves correctly.
2026-04-17 04:10:17 -07:00
赵晨飞
82969615bb test(weixin): add regression test for send_image_file parameter name
Add TestWeixinSendImageFileParameterName test class with two tests:
- test_send_image_file_uses_image_path_parameter: verifies the correct
  parameter name (image_path) is used when gateway calls send_image_file
- test_send_image_file_works_without_optional_params: ensures minimal
  params work correctly

This prevents the interface from drifting again as noted by Copilot.
2026-04-17 04:09:21 -07:00
Michel Belleau
efa6c9f715 fix(discord): default allowed_mentions to block @everyone and role pings
discord.py does not apply a default AllowedMentions to the client, so any
reply whose content contains @everyone/@here or a role mention would ping
the whole server — including verbatim echoes of user input or LLM output
that happens to contain those tokens.

Set a safe default on commands.Bot: everyone=False, roles=False,
users=True, replied_user=True. Operators can opt back in via four
DISCORD_ALLOW_MENTION_* env vars or discord.allow_mentions.* in
config.yaml. No behavior change for normal user/reply pings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 04:08:42 -07:00
Teknium
2367c6ffd5
test: remove 169 change-detector tests across 21 files (#11472)
First pass of test-suite reduction to address flaky CI and bloat.

Removed tests that fall into these change-detector patterns:

1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests
   that call inspect.getsource() on production modules and grep for string
   literals. Break on any refactor/rename even when behavior is correct.

2. Platform enum tautologies (every gateway/test_X.py): assertions like
   `Platform.X.value == 'x'` duplicated across ~9 adapter test files.

3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that
   only verify a key exists in a dict. Data-layout tests, not behavior.

4. Argparse wiring tests (test_argparse_flag_propagation, test_subparser_routing
   _fallback): tests that do parser.parse_args([...]) then assert args.field.
   Tests Python's argparse, not our code.

5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch
   cmd_X, call plugins_command with matching action, assert mock called.
   Tests the if/elif chain, not behavior.

6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests,
   test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests
   that mock the external API client, call our function, and assert exact
   kwargs. Break on refactor even when behavior is preserved.

7. Schedule-internal "function-was-called" tests (acp/test_server scheduling
   tests): tests that patch own helper method, then assert it was called.

Kept behavioral tests throughout: error paths (pytest.raises), security
tests (path traversal, SSRF, redaction), message alternation invariants,
provider API format conversion, streaming logic, memory contract, real
config load/merge tests.

Net reduction: 169 tests removed. 38 empty classes cleaned up.

Collected before: 12,522 tests
Collected after:  12,353 tests
2026-04-17 01:05:09 -07:00
Teknium
e33cb65a98
fix(insights): hide cache read/write and cost metrics from display (#11477)
The cache-read, cache-write, and total estimated-cost values shown in
/insights (and the per-model Cost column) were unreliable. Hide them from
both terminal and gateway renderings.

The underlying data pipeline is untouched — sessions still store
cache_read_tokens, cache_write_tokens, and estimated_cost_usd; the web
server, /usage command, and status bar are unaffected. Only the
InsightsEngine display layer is trimmed.

Changes:
- format_terminal: drop 'Cache read / Cache write' line, drop 'Est. cost'
  from the Total tokens row, drop per-model 'Cost' column, drop the
  '* Cost N/A for custom/self-hosted' footnote.
- format_gateway: drop cache breakdown from Tokens line, drop 'Est. cost'
  line, drop per-model cost suffix.
- Tests updated to assert these strings are now absent.
2026-04-17 01:02:06 -07:00
Teknium
3f74dafaee
fix(nous): respect 'Skip (keep current)' after OAuth login (#11476)
* feat(skills): add 'hermes skills reset' to un-stick bundled skills

When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.

Adds an escape hatch for this case.

  hermes skills reset <name>
      Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
      re-baselines against the user's current copy. Future 'hermes update'
      runs accept upstream changes again. Non-destructive.

  hermes skills reset <name> --restore
      Also deletes the user's copy and re-copies the bundled version.
      Use when you want the pristine upstream skill back.

Also available as /skills reset in chat.

- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
  handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
  repro, --restore, unknown-skill error, upstream-removed-skill, and
  no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
  section explaining the origin-hash mechanic + reset usage

* fix(nous): respect 'Skip (keep current)' after OAuth login

When a user already set up on another provider (e.g. OpenRouter) runs
`hermes model` and picks Nous Portal, OAuth succeeds and then a model
picker is shown.  If the user picks 'Skip (keep current)', the previous
provider + model should be preserved.

Previously, \_update_config_for_provider was called unconditionally after
login, which flipped config.yaml model.provider to 'nous' while keeping
the old model.default (e.g. anthropic/claude-opus-4.6 from OpenRouter),
leaving the user with a mismatched provider/model pair on the next
request.

Fix: snapshot the prior active_provider before login, and if no model is
selected (Skip, or no models available, or fetch failure), restore the
prior active_provider and leave config.yaml untouched.  The Nous OAuth
tokens stay saved so future `hermes model` -> Nous works without
re-authenticating.

Test plan:
- New tests cover Skip path (preserves provider+model, saves creds),
  pick-a-model path (switches to nous), and fresh-install Skip path
  (active_provider cleared, not stuck as 'nous').
2026-04-17 00:52:42 -07:00
Teknium
3438d274f6 fix(dingtalk): repair _extract_text for dingtalk-stream >= 0.20 SDK shape
The cherry-picked SDK compat fix (previous commit) wired process() to
parse CallbackMessage.data into a ChatbotMessage, but _extract_text()
was still written against the pre-0.20 payload shape:

  * message.text changed from dict {content: ...} → TextContent object.
    The old code's str(text) fallback produced 'TextContent(content=...)'
    as the agent's input, so every received message came in mangled.
  * rich_text moved from message.rich_text (list) to
    message.rich_text_content.rich_text_list.

This preserves legacy fallbacks (dict-shaped text, bare rich_text list)
while handling the current SDK layout via hasattr(text, 'content').

Adds regression tests covering:
  * webhook domain allowlist (api.*, oapi.*, and hostile lookalikes)
  * _IncomingHandler.process is a coroutine function
  * _extract_text against TextContent object, dict, rich_text_content,
    legacy rich_text, and empty-message cases

Also adds kevinskysunny to scripts/release.py AUTHOR_MAP (release CI
blocks unmapped emails).
2026-04-17 00:52:35 -07:00