llm-wiki was the only shipped skill using metadata.hermes.config, which
caused 'hermes update' and 'hermes config migrate' to prompt for a wiki
directory on every run — even for users who have never touched the skill
— because 'enabled' is opt-out (all shipped skills count as enabled unless
explicitly disabled). Declining the prompt didn't persist anything, so
the nag fired again on every update.
Switch llm-wiki to the env var + runtime default pattern that obsidian and
google-workspace already use: WIKI_PATH env var, default $HOME/wiki. No
prompting infrastructure, no config.yaml touch, no nag loop.
Changes:
- skills/research/llm-wiki/SKILL.md: remove metadata.hermes.config,
document WIKI_PATH env var in the Wiki Location section, update the
orientation snippet and initialization guidance.
- Docs: replace llm-wiki's wiki.path examples with a generic 'myplugin.path'
placeholder across configuration.md, features/skills.md, and
creating-skills.md so users don't try to set skills.config.wiki.path
expecting llm-wiki to use it.
- skills-catalog.md: mention WIKI_PATH instead of skills.config.wiki.path.
E2E verified: discover_all_skill_config_vars() and get_missing_skill_config_vars()
both return 0 entries after this change, so the prompt branch in migrate_config()
no longer fires.
The metadata.hermes.config feature stays in place for third-party skills
that genuinely need structured config, but built-ins now prefer env vars.
Before: entry.tsx imports @hermes/ink (394KB bundle) + App + GatewayClient
in declaration order, then calls `gw.start()` at ~T=220ms. Python fork +
server.py import starts then.
After: only `GatewayClient` is statically imported (5ms, node builtins
only). `gw.start()` fires at ~T=5ms. @hermes/ink + App load in parallel
via `Promise.all(import(...))`. Python gets ~215ms of free runway to do
its own module import before node even finishes loading.
Net: session.info arrives ~150ms earlier in cold start. First React frame
timing is unchanged (still ~240ms — still gated by ink+app imports).
Removed a previously-tried warm-thread in server.py that pre-imported
`run_agent` in the background. Measured variance showed occasional
5-10s outliers (GIL thrashing); median gain was <100ms. Not worth the
non-determinism.
The TUI is fully interactive from the first frame but `session.create`
(agent + tools + MCP) takes ~2s. Plain-text messages typed before the
session is live used to fail with "session not ready yet"; slash and
shell commands worked but agent prompts were dropped.
Now:
- `dispatchSubmission` enqueues plain text when `sid` is null (slash/shell
still short-circuit first)
- `useMainApp` tracks sid transitions and kicks off one `sendQueued()`
when the session first becomes ready; subsequent queued messages drain
on `message.complete` as before
- Fixed pre-existing double-Enter bug that dequeued without sid check
User flow: type `hello` → shows in `queuedDisplay` preview → 2s later
agent wakes → message auto-sends → reply streams. Zero wasted input.
Previously `historyItems` was seeded empty and the intro (with Banner +
SessionPanel) was only pushed after Python's `session.create` returned —
~1.8s of agent + tools + MCP init with nothing on screen. Base CLI feels
instant because it prints the banner as its first action.
Seed `historyItems` with an info-less intro on mount. `appLayout` now
renders the Banner unconditionally for `kind === 'intro'` and gates only
the SessionPanel on `info` being present. Gateway.ready swaps the skin
(~200ms) and session.info fills in the panel when the agent is ready.
Net: first usable frame drops from ~2s to ~300ms (node + module graph +
React mount). No behavior change — intro message is replaced in place
by `introMsg(info)` when `newSession()` / `resumeById()` resolve.
- New page: user-guide/features/tool-gateway.md covering eligibility,
setup (hermes model, hermes tools, manual config), how use_gateway
works, precedence, switching back, status checking, self-hosted
gateway env vars, and FAQ
- Added to sidebar under Features (top-level, before Core category)
- Cross-references from: overview.md, tools.md, browser.md,
image-generation.md, tts.md, providers.md, environment-variables.md
- Added Nous Tool Gateway subsection to env vars reference with
TOOL_GATEWAY_DOMAIN, TOOL_GATEWAY_SCHEME, TOOL_GATEWAY_USER_TOKEN,
and FIRECRAWL_GATEWAY_URL
Replace the HERMES_ENABLE_NOUS_MANAGED_TOOLS env-var feature flag with
subscription-based detection. The Tool Gateway is now available to any
paid Nous subscriber without needing a hidden env var.
Core changes:
- managed_nous_tools_enabled() checks get_nous_auth_status() +
check_nous_free_tier() instead of an env var
- New use_gateway config flag per tool section (web, tts, browser,
image_gen) records explicit user opt-in and overrides direct API
keys at runtime
- New prefers_gateway(section) shared helper in tool_backend_helpers.py
used by all 4 tool runtimes (web, tts, image gen, browser)
UX flow:
- hermes model: after Nous login/model selection, shows a curses
prompt listing all gateway-eligible tools with current status.
User chooses to enable all, enable only unconfigured tools, or skip.
Defaults to Enable for new users, Skip when direct keys exist.
- hermes tools: provider selection now manages use_gateway flag —
selecting Nous Subscription sets it, selecting any other provider
clears it
- hermes status: renamed section to Nous Tool Gateway, added
free-tier upgrade nudge for logged-in free users
- curses_radiolist: new description parameter for multi-line context
that survives the screen clear
Runtime behavior:
- Each tool runtime (web_tools, tts_tool, image_generation_tool,
browser_use) checks prefers_gateway() before falling back to
direct env-var credentials
- get_nous_subscription_features() respects use_gateway flags,
suppressing direct credential detection when the user opted in
Removed:
- HERMES_ENABLE_NOUS_MANAGED_TOOLS env var and all references
- apply_nous_provider_defaults() silent TTS auto-set
- get_nous_subscription_explainer_lines() static text
- Override env var warnings (use_gateway handles this properly now)
Python's slash worker already prints every echo/panel command through Rich.
TS was reformatting the same data client-side for 23 commands. Delete those
shadows; let the `slash.exec` fallback in `createSlashHandler` route the
worker's text (via `<Ansi>`) and page-wrap long output.
TS registry now contains 23 commands (down from 45) — only those that:
- mutate React-local state (composer, transcript, overlays, uiStore)
- touch the terminal (OSC52 copy, `$EDITOR`, clipboard)
- open pickers (`/model`, `/resume`)
- trigger history surgery (`/undo`, `/retry`, `/compress`, `/personality`)
- need TS-only composition (`/help` merges HOTKEYS + catalog)
Deleted shadows:
session: yolo, skin, verbose, reasoning, provider, stop, reload-mcp,
save, title, insights, debug, fast, platforms, snapshot,
usage, history, profile
ops: plugins, rollback, agents, tasks, cron, config, toolsets,
browser, skills (list/browse only; `/tools configure` kept
for its history-reset side effect)
Side effects:
- Drops `slash/shared.ts` + `SlashShared` + `shared`/`SLASH_OUTPUT_PAGE` —
generic slash.exec fallback handles titled paging via `createSlashHandler`.
- Prunes 17 now-unreferenced `*Response` interfaces from gatewayTypes.ts.
- `createSlashHandler` fallback now pages long output (len>180 || lines>2)
and uses the command name as title.
session.ts: 670 -> 199 (-70%)
ops.ts: 460 -> 52 (-88%)
gatewayTypes.ts: 450 -> 302 (-33%)
run_agent.py passes httpx.Timeout(connect=30, read=120, write=1800,
pool=30) as the timeout kwarg on the streaming path. The OpenAI SDK
handles this natively, but CopilotACPClient._create_chat_completion()
called float(timeout or default), which raises TypeError because
httpx.Timeout doesn't implement __float__.
Normalize the timeout before passing to _run_prompt: plain floats/ints
pass through, httpx.Timeout objects get their largest component
extracted (write=1800s is the correct wall-clock budget for the ACP
subprocess), and None falls back to the 900s default.
Regression from #11161 (Claude Opus 4.7 migration, commit 0517ac3e).
The Opus 4.7 migration changed `ADAPTIVE_EFFORT_MAP["xhigh"]` from "max"
(the pre-migration alias) to "xhigh" to preserve the new 4.7 effort level
as distinct from max. This is correct for 4.7, but Opus/Sonnet 4.6 only
expose 4 levels (low/medium/high/max) — sending "xhigh" there now 400s:
BadRequestError [HTTP 400]: This model does not support effort
level 'xhigh'. Supported levels: high, low, max, medium.
Users who set reasoning_effort=xhigh as their default (xhigh is the
recommended default for coding/agentic on 4.7 per the Anthropic migration
guide) now 400 every request the moment they switch back to a 4.6 model
via `/model` or config. Verified live against the Anthropic API on
`anthropic==0.94.0`.
Fix: make the mapping model-aware. Add `_supports_xhigh_effort()`
predicate (matches 4-7/4.7 substrings, mirroring the existing
`_supports_adaptive_thinking` / `_forbids_sampling_params` pattern).
On pre-4.7 adaptive models, downgrade xhigh→max (the strongest effort
those models accept, restoring pre-migration behavior). On 4.7+, keep
xhigh as a distinct level.
Per Anthropic's migration guide, xhigh is 4.7-only:
https://platform.claude.com/docs/en/about-claude/models/migration-guide
> Opus 4.7 effort levels: max, xhigh (new), high, medium, low.
> Opus 4.6 effort levels: max, high, medium, low.
SDK typing confirms: `anthropic.types.OutputConfigParam.effort: Literal[
"low", "medium", "high", "max"]` (v0.94.0 not yet updated for xhigh).
## Test plan
Verified live on macOS 15.5 / anthropic==0.94.0:
claude-opus-4-6 + effort=xhigh → output_config.effort=max → 200 OK
claude-opus-4-7 + effort=xhigh → output_config.effort=xhigh → 200 OK
claude-opus-4-6 + effort=max → output_config.effort=max → 200 OK
claude-opus-4-7 + effort=max → output_config.effort=max → 200 OK
`tests/agent/test_anthropic_adapter.py` — 120 pass (replaced 1 bugged
test that asserted the broken behavior, added 1 for 4.7 preservation).
Full adapter suite: 120 passed in 1.05s.
Broader suite (agent + run_agent + cli/gateway reasoning): 2140 passed
(2 pre-existing failures on clean upstream/main, unrelated).
## Platforms
Tested on macOS 15.5. No platform-specific code paths touched.
Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guideFixesNousResearch/hermes-agent#11137
Breaking-change coverage:
1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
_supports_adaptive_thinking() (extends previous 4.6-only gate).
2. Sampling parameter stripping — 4.7 returns 400 for any non-default
temperature / top_p / top_k. build_anthropic_kwargs drops them as a
safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
and AnthropicCompletionsAdapter.create() both early-exit before
setting temperature for 4.7+ models. This keeps flush_memories and
structured-JSON aux paths that hardcode temperature from 400ing
when the aux model is flipped to 4.7.
3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
which silently hides reasoning text from Hermes's CLI activity feed
during long tool runs. Restoring "summarized" preserves 4.6 UX.
4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
silently over-efforted every coding/agentic request). max is now a
distinct ceiling per Anthropic's 5-level effort model.
5. New stop_reason values — refusal and model_context_window_exceeded
were silently collapsed to "stop" (end_turn) by the adapter's
stop_reason_map. Now mapped to "content_filter" and "length"
respectively, matching upstream finish-reason handling already in
bedrock_adapter.
6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
catalog (recommended), claude-opus-4-7 added to model_metadata
DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).
7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
prefill (400).
8. Tests — 4 new tests in test_anthropic_adapter covering display
default, xhigh preservation, max on 4.7, refusal / context-overflow
stop_reason mapping, plus the sampling-param predicate. test_model_metadata
accepts 4.7 at 1M context.
Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.
Hoist turn state from a 286-line hook into $turnState atom + turnController
singleton. createGatewayEventHandler becomes a typed dispatch over the
controller; its ctx shrinks from 30 fields to 5. Event-handler refs and 16
threaded actions are gone.
Fold three createSlash*Handler factories into a data-driven SlashCommand[]
registry under slash/commands/{core,session,ops}.ts. Aliases are data;
findSlashCommand does name+alias lookup. Shared guarded/guardedErr combinator
in slash/guarded.ts.
Split constants.ts + app/helpers.ts into config/ (timing/limits/env),
content/ (faces/placeholders/hotkeys/verbs/charms/fortunes), domain/ (roles/
details/messages/paths/slash/viewport/usage), protocol/ (interpolation/paste).
Type every RPC response in gatewayTypes.ts (26 new interfaces); drop all
`(r: any)` across slash + main app.
Shrink useMainApp from 1216 -> 646 lines by extracting useSessionLifecycle,
useSubmission, useConfigSync. Add <Fg> themed primitive and strip ~50
`as any` color casts.
Tests: 50 passing. Build + type-check clean.
Models may send whitespace-only strings like {"conclusion": " "} which
pass bool() but create meaningless conclusions. Strip both inputs so
whitespace-only values are treated as empty.
Adds tests for whitespace-only conclusion and delete_id.
Reviewed-by: @erosika
Improve honcho_conclude tool descriptions to explicitly tell the model
not to send both params together. Add runtime validation that rejects
calls with both or neither of conclusion/delete_id. Add schema
regression test and both-params rejection test.
Consolidates #10847 by @ygd58, #10864 by @cola-runner,
#10870 by @vominh1919, and #10952 by @ogzerber.
The anyOf removal itself was already merged; this adds the
runtime validation and tests those PRs contributed.
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
Co-authored-by: cola-runner <cola-runner@users.noreply.github.com>
Co-authored-by: vominh1919 <vominh1919@users.noreply.github.com>
The numbered form (82637225+kshitijk4poor@) was already mapped but
the bare form (kshitijk4poor@users.noreply.github.com) used by
cherry-pick commits was missing, causing check-attribution CI to fail.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Ensure _align_boundary_backward never pushes the last user message
into the compressed region. Without this, compression could delete
the user active task instruction mid-session.
Cherry-picked from #10969 by @sontianye. Fixes#10896.
Initialize next_channel_prompt before the pending_event check and use
getattr with None default, matching the existing pattern for
next_source/next_message/next_message_id. Prevents AttributeError
when pending_event is None (interrupt path).
Cherry-picked from #10953 by @jackjin1997.
Shallow-copy client_kwargs at the top of _create_openai_client() to
prevent in-place mutation from leaking back into self._client_kwargs.
Defensive fix that locks the contract for future httpx/transport work.
Cherry-picked from #10978 by @taeuk178.
Switch from fragile Markdown V1 to HTML parse mode with html.escape()
for exec approval messages. Add fallback to text-based approval when
the formatted send fails.
Cherry-picked from #10999 by @danieldoderlein.
resolve_vision_provider_client() was receiving the raw call_llm
parameters instead of the resolved provider/model/key/url from
_resolve_task_provider_model(). This caused config overrides
(auxiliary.vision.provider, etc.) to be silently discarded.
Cherry-picked from #10901 by @lrawnsley.
Add 11 community contributors whose work was cherry-picked via
salvage PRs during the April 16 triage session. Without these
entries, contributor_audit strict mode fails for release attribution.
Contributors: sontianye, jackjin1997, danieldoderlein, lrawnsley,
taeuk178, ogzerber, cola-runner, ygd58, vominh1919, LeonSGP43,
Lubrsy706
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
provider_model_ids() and list_authenticated_providers() had no case for
"ollama-cloud", so the /model slash command showed 0 models despite
fetch_ollama_cloud_models() being fully implemented. The CLI subcommand
worked because it called fetch_ollama_cloud_models() directly.
- Add ollama-cloud case to provider_model_ids() in models.py
- Populate curated dict for ollama-cloud in list_authenticated_providers()
- Add tests for both code paths
The completion-line printing block (idx = entry['task_index'] etc.)
was outside the 'for future in done:' loop but referenced 'entry'
which is only assigned inside that loop. When concurrent.futures.wait()
returns with an empty 'done' set (timeout expired, no futures finished),
the loop body never executes and 'entry' is unbound.
Moved the completion-line printing and spinner-update code inside
the for loop so each completed future gets its own status line,
and empty poll cycles simply loop back without accessing 'entry'.
When a cron job's agent run completes but produces an empty final_response
(e.g. API 404 from invalid model name), the scheduler now marks last_status
as "error" instead of "ok", so the failure is visible in job listings.
Previously, any run that didn't raise an exception was marked "ok" regardless
of whether the agent actually produced output.
Group A (3 tests): 'No LLM provider configured' RuntimeError
- test_user_message_surrogates_sanitized, test_counters_initialized_in_init,
test_openai_prompt_tokens_unchanged
- Root cause: AIAgent.__init__ now requires base_url alongside api_key to
skip resolve_provider_client() (which returns None when API keys are
blanked in CI). Added base_url='http://localhost:1234/v1' to test
agent construction.
Group B (5 tests): Discord slash command auto-registration
- test_auto_registers_missing_gateway_commands, test_auto_registered_command_*,
test_register_skill_group_*
- Root cause: xdist workers that loaded a discord mock WITHOUT
app_commands.Command/Group caused _register_slash_commands() to fail
silently. Added comprehensive shared discord mock in
tests/gateway/conftest.py (same pattern as existing telegram mock).
Group C (5 errors): Discord reply mode 'NoneType has no DMChannel'
- All TestReplyToText tests
- Root cause: FakeDMChannel was not a subclass of real discord.DMChannel,
so isinstance() checks in _handle_message failed when running in full
suite (real discord installed). Made FakeDMChannel inherit from
discord.DMChannel when available. Removed fragile monkeypatch approach.
Group D (2 tests): detect_provider_for_model wrong provider
- test_openrouter_slug_match (got 'ai-gateway'), test_bare_name_gets_
openrouter_slug (got 'copilot')
- Root cause: ai-gateway, copilot, and kilocode are multi-vendor
aggregators that list other providers' models (OpenRouter-style slugs).
They were being matched in Step 1 before OpenRouter. Added all three
to _AGGREGATORS set so they're skipped like nous/openrouter.
Group E (1 test): model_flow_custom StopIteration
- test_model_flow_custom_saves_verified_v1_base_url
- Root cause: 'Display name' prompt was added after the test was written.
The input iterator had 5 answers but the flow now asks 6 questions.
Added 6th empty string answer.
Group F (1 test): Telegram proxy env assertion
- test_uses_proxy_env_for_primary_and_fallback_transports
- Root cause: _resolve_proxy_url() now checks TELEGRAM_PROXY first
(via resolve_proxy_url('TELEGRAM_PROXY')). Test didn't clear this
env var, allowing potential leakage from other tests in xdist workers.
Added TELEGRAM_PROXY to the cleanup list.
config.yaml terminal.cwd is now the single source of truth for working
directory. MESSAGING_CWD and TERMINAL_CWD in .env are deprecated with a
migration warning.
Changes:
1. config.py: Remove MESSAGING_CWD from OPTIONAL_ENV_VARS (setup wizard
no longer prompts for it). Add warn_deprecated_cwd_env_vars() that
prints a migration hint when deprecated env vars are detected.
2. gateway/run.py: Replace all MESSAGING_CWD reads with TERMINAL_CWD
(which is bridged from config.yaml terminal.cwd). MESSAGING_CWD is
still accepted as a backward-compat fallback with deprecation warning.
Config bridge skips cwd placeholder values so they don't clobber
the resolved TERMINAL_CWD.
3. cli.py: Guard against lazy-import clobbering — when cli.py is
imported lazily during gateway runtime (via delegate_tool), don't
let load_cli_config() overwrite an already-resolved TERMINAL_CWD
with os.getcwd() of the service's working directory. (#10817)
4. hermes_cli/main.py: Add 'hermes memory reset' command with
--target all/memory/user and --yes flags. Profile-scoped via
HERMES_HOME.
Migration path for users with .env settings:
Remove MESSAGING_CWD / TERMINAL_CWD from .env
Add to config.yaml:
terminal:
cwd: /your/project/path
Addresses: #10225, #4672, #10817, #7663
The gateway compression notifications were already removed in commit cc63b2d1
(PR #4139), but the agent-level context pressure warnings (85%/95% tiered
alerts via _emit_context_pressure) were still firing on both CLI and gateway.
Removed:
- _emit_context_pressure method and all call sites in run_conversation()
- Class-level dedup state (_context_pressure_last_warned, _CONTEXT_PRESSURE_COOLDOWN)
- Instance attribute _context_pressure_warned_at
- Pressure reset logic in _compress_context
- format_context_pressure and format_context_pressure_gateway from agent/display.py
- Orphaned ANSI constants that only served these functions
- tests/run_agent/test_context_pressure.py (all 361 lines)
Compression itself continues to run silently in the background.
Closes#3784
- Extract duplicated activity-callback polling into shared
touch_activity_if_due() helper in tools/environments/base.py
- Use helper from both base.py _wait_for_process and
code_execution_tool.py local polling loop (DRY)
- Add test assertion that timeout output field contains the
timeout message and emoji (#10807)
- Add stream_consumer test for tool-boundary fallback scenario
where continuation is empty but final_text differs from
visible prefix (#10807)
When execute_code times out, the result JSON had status="timeout" and an
error field, but the output field was empty. Many models treat empty
output as "nothing happened" and produce an empty/minimal response. The
gateway stream consumer then considers the response "already sent" (from
pre-tool streaming) and silently drops it — leaving the user staring at
silence.
Three changes:
1. Include the timeout message in the output field (both local and remote
paths) so the model always has visible content to relay to the user.
2. Add periodic activity callbacks to the local execution polling loop so
the gateway's inactivity monitor knows execute_code is alive during
long runs.
3. Fix stream_consumer._send_fallback_final to not silently drop content
when the continuation appears empty but the final text differs from
what was previously streamed (e.g. after a tool boundary reset).
display: null or display: <non-dict> in config.yaml crashed skin init
with AttributeError. Now falls back to default skin gracefully.
Cherry-picked from #10867 by @Bartok9. Consolidates #10876 by @cola-runner.
Co-authored-by: cola-runner <cola-runner@users.noreply.github.com>
When the LLM returns an empty completion, gateway/run.py replaced
final_response with the literal string '(No response generated)'.
This defeated cron/scheduler.py's empty-response skip guard, causing
the placeholder to be delivered to home channels.
Changes:
- gateway/run.py: return empty string instead of placeholder when
there is no error and no response content
- cron/scheduler.py: defensively strip the placeholder text in case
any upstream path still produces it
FixesNousResearch/hermes-agent#9270