Commit graph

2288 commits

Author SHA1 Message Date
Abner
b66644f0ec feat(hindsight): richer session-scoped retain metadata
- Add configurable retain_tags / retain_source / retain_user_prefix /
  retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name,
  chat_type, thread_id) through AIAgent and MemoryManager into
  MemoryProvider.initialize kwargs so providers can scope and tag
  retained memories.
- Hindsight attaches the new identity fields as retain metadata,
  merges per-call tool tags with configured default tags, and uses
  the configurable transcript labels for auto-retained turns.

Co-authored-by: Abner <abner.the.foreman@agentmail.to>
2026-04-22 05:27:10 -07:00
Teknium
b8663813b6
feat(state): auto-prune old sessions + VACUUM state.db at startup (#13861)
* feat(state): auto-prune old sessions + VACUUM state.db at startup

state.db accumulates every session, message, and FTS5 index entry forever.
A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages
causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM
brought it to 43MB. The prune command and VACUUM are not wired to run
automatically anywhere — sessions grew unbounded until users noticed.

Changes:
- hermes_state.py: new state_meta key/value table, vacuum() method, and
  maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in
  state_meta so it only actually executes once per min_interval_hours
  across all Hermes processes for a given HERMES_HOME. Never raises.
- hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG
  (auto_prune=True, retention_days=90, vacuum_after_prune=True,
  min_interval_hours=24). Added to _KNOWN_ROOT_KEYS.
- cli.py: call maintenance once at HermesCLI init (shared helper
  _run_state_db_auto_maintenance reads config and delegates to DB).
- gateway/run.py: call maintenance once at GatewayRunner init.
- Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section.

Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed
pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays
bloated forever. VACUUM only runs when the prune actually removed rows,
so tight DBs don't pay the I/O cost.

Tests: 10 new tests in tests/test_hermes_state.py covering state_meta,
vacuum, idempotency, interval skipping, VACUUM-only-when-needed,
corrupt-marker recovery. All 246 existing state/config/gateway tests
still pass.

Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG
exposes the new block, load_config() returns it for fresh installs,
first call prunes+vacuums, second call within min_interval_hours skips,
and the state_meta marker persists across connection close/reopen.

* sessions.auto_prune defaults to false (opt-in)

Session history powers session_search recall across past conversations,
so silently pruning on startup could surprise users. Ship the machinery
disabled and let users opt in when they notice state.db is hurting
performance.

- DEFAULT_CONFIG.sessions.auto_prune: True → False
- Call-site fallbacks in cli.py and gateway/run.py match the new default
  (so unmigrated configs still see off)
- Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff
2026-04-22 05:21:49 -07:00
helix4u
a7d78d3bfd fix: preserve reasoning_content on Kimi replay 2026-04-22 04:31:59 -07:00
hengm3467
c6b1ef4e58 feat: add Step Plan provider support (salvage #6005)
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:

- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
  small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
  and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
  mapping, and StepFun region/model persistence

Based on #6005 by @hengm3467.

Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
2026-04-22 02:59:58 -07:00
Teknium
ff9752410a
feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)
* feat(plugins): pluggable image_gen backends + OpenAI provider

Adds a ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:

- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
  Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
  incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
  `image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
  `plugin.yaml` of their own).

Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.

FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.

- 41 unit tests (scanner recursion, kind parsing, gate logic,
  registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
  `_handle_image_generate` routes to OpenAI when configured

* fix(image_gen/openai): don't send response_format to gpt-image-*

The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.

* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog

gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.

Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.

* feat(image_gen/openai): expose gpt-image-2 as three quality tiers

Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:

  gpt-image-2-low     ~15s   fast iteration
  gpt-image-2-medium  ~40s   default
  gpt-image-2-high    ~2min  highest fidelity

Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.

Config:
  image_gen.openai.model: gpt-image-2-high
  # or
  image_gen.model: gpt-image-2-low
  # or env var for scripts/tests
  OPENAI_IMAGE_MODEL=gpt-image-2-medium

Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.

* feat(tools_config): plugin image_gen providers inject themselves into picker

'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.

Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
  (name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
  every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
  Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
  writes image_gen.provider and routes to the plugin's list_models()
  catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
  FAL key when any plugin provider reports is_available().

FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.

Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.

397 tests pass across plugins/, tools_config, registry, and picker.

* fix(image_gen): close final gaps for plugin-backend parity with FAL

Two small places that still hardcoded FAL:

- hermes_cli/setup.py status line: an OpenAI-only setup showed
  'Image Generation: missing FAL_KEY'. Now probes plugin providers
  and reports '(OpenAI)' when one is_available() — or falls back to
  'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.

- image_generate tool schema description: said 'using FAL.ai, default
  FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
  user-configured' — and notes the 'image' field can be a URL or an
  absolute path, which the gateway delivers either way via
  extract_local_files().
2026-04-21 21:30:10 -07:00
Teknium
410f33a728
fix(kimi): don't send Anthropic thinking to api.kimi.com/coding (#13826)
Kimi's /coding endpoint speaks the Anthropic Messages protocol but has
its own thinking semantics: when thinking.enabled is sent, Kimi validates
the history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content. The Anthropic path never populates that
field, and convert_messages_to_anthropic strips Anthropic thinking blocks
on third-party endpoints — so after one tool-calling turn the next request
fails with:

  HTTP 400: thinking is enabled but reasoning_content is missing in
  assistant tool call message at index N

Kimi on chat_completions handles thinking via extra_body in
ChatCompletionsTransport (#13503). On the Anthropic route, drop the
parameter entirely and let Kimi drive reasoning server-side.

build_anthropic_kwargs now gates the reasoning_config -> thinking block
on not _is_kimi_coding_endpoint(base_url).

Tests: 8 new parametric tests cover /coding, /coding/v1, /coding/anthropic,
/coding/ (trailing slash), explicit disabled, other third-party endpoints
still getting thinking (MiniMax), native Anthropic unaffected, and the
non-/coding Kimi root route.
2026-04-21 21:19:14 -07:00
kshitijk4poor
57411fca24 feat: add BedrockTransport + wire all Bedrock transport paths
Fourth and final transport — completes the transport layer with all four
api_modes covered.  Wraps agent/bedrock_adapter.py behind the ProviderTransport
ABC, handles both raw boto3 dicts and already-normalized SimpleNamespace.

Wires all transport methods to production paths in run_agent.py:
- build_kwargs: _build_api_kwargs bedrock branch
- validate_response: response validation, new bedrock_converse branch
- finish_reason: new bedrock_converse branch in finish_reason extraction

Based on PR #13467 by @kshitijk4poor, with one adjustment: the main normalize
loop does NOT add a bedrock_converse branch to invoke normalize_response on
the already-normalized response.  Bedrock's normalize_converse_response runs
at the dispatch site (run_agent.py:5189), so the response already has the
OpenAI-compatible .choices[0].message shape by the time the main loop sees
it.  Falling through to the chat_completions else branch is correct and
sidesteps a redundant NormalizedResponse rebuild.

Transport coverage — complete:
| api_mode           | Transport                | build_kwargs | normalize | validate |
|--------------------|--------------------------|:------------:|:---------:|:--------:|
| anthropic_messages | AnthropicTransport       |             |          |         |
| codex_responses    | ResponsesApiTransport    |             |          |         |
| chat_completions   | ChatCompletionsTransport |             |          |         |
| bedrock_converse   | BedrockTransport         |             |          |         |

17 new BedrockTransport tests pass.  117 transport tests total pass.
160 bedrock/converse tests across tests/agent/ pass.  Full tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
pre-existing test_concurrent_interrupt flake on origin/main).
2026-04-21 20:58:37 -07:00
kshitijk4poor
83d86ce344 feat: add ChatCompletionsTransport + wire all default paths
Third concrete transport — handles the default 'chat_completions' api_mode used
by ~16 OpenAI-compatible providers (OpenRouter, Nous, NVIDIA, Qwen, Ollama,
DeepSeek, xAI, Kimi, custom, etc.). Wires build_kwargs + validate_response to
production paths.

Based on PR #13447 by @kshitijk4poor, with fixes:
- Preserve tool_call.extra_content (Gemini thought_signature) via
  ToolCall.provider_data — the original shim stripped it, causing 400 errors
  on multi-turn Gemini 3 thinking requests.
- Preserve reasoning_content distinctly from reasoning (DeepSeek/Moonshot) so
  the thinking-prefill retry check (_has_structured) still triggers.
- Port Kimi/Moonshot quirks (32000 max_tokens, top-level reasoning_effort,
  extra_body.thinking) that landed on main after the original PR was opened.
- Keep _qwen_prepare_chat_messages_inplace alive and call it through the
  transport when sanitization already deepcopied (avoids a second deepcopy).
- Skip the back-compat SimpleNamespace shim in the main normalize loop — for
  chat_completions, response.choices[0].message is already the right shape
  with .content/.tool_calls/.reasoning/.reasoning_content/.reasoning_details
  and per-tool-call .extra_content from the OpenAI SDK.

run_agent.py: -239 lines in _build_api_kwargs default branch extracted to the
transport. build_kwargs now owns: codex-field sanitization, Qwen portal prep,
developer role swap, provider preferences, max_tokens resolution (ephemeral >
user > NVIDIA 16384 > Qwen 65536 > Kimi 32000 > anthropic_max_output), Kimi
reasoning_effort + extra_body.thinking, OpenRouter/Nous/GitHub reasoning,
Nous product attribution tags, Ollama num_ctx, custom-provider think=false,
Qwen vl_high_resolution_images, request_overrides.

39 new transport tests (8 build_kwargs, 5 Kimi, 4 validate, 4 normalize
including extra_content regression, 3 cache stats, 3 basic). Tests/run_agent/
targeted suite passes (885/885 + 15 skipped; the 1 remaining failure is the
test_concurrent_interrupt flake present on origin/main).
2026-04-21 20:50:02 -07:00
emozilla
29693f9d8e feat(aux): use Portal /api/nous/recommended-models for auxiliary models
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.

hermes_cli/models.py
  - fetch_nous_recommended_models(portal_base_url, force_refresh=False)
    10-minute TTL cache, keyed per portal URL (staging vs prod don't
    collide).  Public endpoint, no auth required.  Returns {} on any
    failure so callers always get a dict.
  - get_nous_recommended_aux_model(vision, free_tier=None, ...)
    Tier-aware pick from the payload:
      - Paid tier → paidRecommended{Vision,Compaction}Model, falling back
        to freeRecommended* when the paid field is null (common during
        staged rollouts of new paid models).
      - Free tier → freeRecommended* only, never leaks paid models.
    When free_tier is None, auto-detects via the existing
    check_nous_free_tier() helper (already cached 3 min against
    /api/oauth/account).  Detection errors default to paid so we never
    silently downgrade a paying user.

agent/auxiliary_client.py — _try_nous()
  - Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
    to get_nous_recommended_aux_model(vision=vision).
  - Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
    Portal is unreachable or returns a null recommendation.
  - The Portal is now the source of truth for aux model selection; the
    xiaomi allowlist we used to carry is effectively dead.

Tests (15 new)
  - tests/hermes_cli/test_models.py::TestNousRecommendedModels
    Fetch caching, per-portal keying, network failure, force_refresh;
    paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
    auto-detect, detection-error → paid default, null/blank modelName
    handling.
  - tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
    _try_nous honors Portal recommendation for text + vision, falls
    back to google/gemini-3-flash-preview on None or exception.

Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
2026-04-21 20:35:16 -07:00
emozilla
c22f4a76de remove Nous Portal free-model allowlist
Drop _NOUS_ALLOWED_FREE_MODELS + filter_nous_free_models and its two call
sites. Whatever Nous Portal prices as free now shows up in the picker as-is
— no local allowlist gatekeeping. Free-tier partitioning (paid vs free in
the menu) still runs via partition_nous_models_by_tier.
2026-04-21 20:35:16 -07:00
kshitijk4poor
c832ebd67c feat: add ResponsesApiTransport + wire all Codex transport paths
Add ResponsesApiTransport wrapping codex_responses_adapter.py behind the
ProviderTransport ABC. Auto-registered via _discover_transports().

Wire ALL Codex transport methods to production paths in run_agent.py:
- build_kwargs: main _build_api_kwargs codex branch (50 lines extracted)
- normalize_response: main loop + flush + summary + retry (4 sites)
- convert_tools: memory flush tool override
- convert_messages: called internally via build_kwargs
- validate_response: response validation gate
- preflight_kwargs: request sanitization (2 sites)

Remove 7 dead legacy wrappers from AIAgent (_responses_tools,
_chat_messages_to_responses_input, _normalize_codex_response,
_preflight_codex_api_kwargs, _preflight_codex_input_items,
_extract_responses_message_text, _extract_responses_reasoning_text).
Keep 3 ID manipulation methods still used by _build_assistant_message.

Update 18 test call sites across 3 test files to call adapter functions
directly instead of through deleted AIAgent wrappers.

24 new tests. 343 codex/responses/transport tests pass (0 failures).

PR 4 of the provider transport refactor.
2026-04-21 19:48:56 -07:00
Teknium
b2ba351380 fix(kimi): reconcile sk-kimi- routing with Anthropic SDK URL semantics
Follow-ups after salvaging xiaoqiang243's kimi-for-coding patches:

- KIMI_CODE_BASE_URL: drop trailing /v1 (was /coding/v1).
  The /coding endpoint speaks Anthropic Messages, and the Anthropic SDK
  appends /v1/messages internally. /coding/v1 + SDK suffix produced
  /coding/v1/v1/messages (a 404). /coding + SDK suffix now yields
  /coding/v1/messages correctly.
- kimi-coding ProviderConfig: keep legacy default api.moonshot.ai/v1 so
  non-sk-kimi- moonshot keys still authenticate. sk-kimi- keys are
  already redirected to api.kimi.com/coding via _resolve_kimi_base_url.
- doctor.py: update Kimi UA to claude-code/0.1.0 (was KimiCLI/1.30.0)
  and rewrite /coding base URLs to /coding/v1 for the /models health
  check (Anthropic surface has no /models).
- test_kimi_env_vars: accept KIMI_CODING_API_KEY as a secondary env var.

E2E verified:
  sk-kimi-<key>  → https://api.kimi.com/coding/v1/messages (Anthropic)
  sk-<legacy>    → https://api.moonshot.ai/v1/chat/completions (OpenAI)
  UA: claude-code/0.1.0, x-api-key: <sk-kimi-*>
2026-04-21 19:48:39 -07:00
Teknium
84449d9afe
fix(prompt): tell CLI agents not to emit MEDIA:/path tags (#13766)
The CLI has no attachment channel — MEDIA:<path> tags are only
intercepted on messaging gateway platforms (Telegram, Discord,
Slack, WhatsApp, Signal, BlueBubbles, email, etc.). On the CLI
they render as literal text, which is confusing for users.

The CLI platform hint was the one PLATFORM_HINTS entry that said
nothing about file delivery, so models trained on the messaging
hints would default to MEDIA: tags on the CLI too. Tool schemas
(browser_tool, tts_tool, etc.) also recommend MEDIA: generically.

Extend the CLI hint to explicitly discourage MEDIA: tags and tell
the agent to reference files by plain absolute path instead.

Add a regression test asserting the CLI hint carries negative
guidance about MEDIA: while messaging hints keep positive guidance.
2026-04-21 19:36:05 -07:00
Teknium
8f167e8791
fix(tts): use per-provider input-character caps instead of global 4000 (#13743)
A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at
4000 chars, causing long inputs to be silently chopped even though the
underlying APIs allow much more:

  - OpenAI:     4096
  - xAI:        15000
  - MiniMax:    10000
  - ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware)
  - Gemini:     ~5000
  - Edge:       ~5000

The schema description also told the model 'Keep under 4000 characters',
which encouraged the agent to self-chunk long briefs into multiple TTS
calls (producing 3 separate audio files instead of one).

New behavior:
  - PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH
    encode the documented per-provider limits.
  - _resolve_max_text_length(provider, cfg) resolves:
      1. tts.<provider>.max_text_length user override
      2. ElevenLabs model_id lookup
      3. provider default
      4. 4000 fallback
  - text_to_speech_tool() and stream_tts_to_speaker() both call the
    resolver; old MAX_TEXT_LENGTH alias kept for back-compat.
  - Schema description no longer hardcodes 4000.

Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253
voice-command/voice-cli tests still pass.
2026-04-21 17:49:39 -07:00
brooklyn!
90fca3c7e0
Merge pull request #13724 from NousResearch/bb/tui-resume-all-sources
fix(tui): /resume picker shows telegram/discord/etc sessions
2026-04-21 18:59:12 -05:00
Brooklyn Nicholson
bd046220b3 fix(tui): narrow /resume sources to human adapters
Follow-up on #13724: showing literally every source was too noisy.\n\n now fetches a wider window (, larger limit) and then filters to a curated allowlist of human-facing sources (tui/cli plus chat adapters like telegram/discord/slack/whatsapp/etc). This keeps row #7 fixed (telegram sessions visible in /resume) without surfacing internal source kinds such as tool/acp.
2026-04-21 18:52:26 -05:00
Teknium
9c9d9b7ddf
feat(delegate): cross-agent file state coordination for concurrent subagents (#13718)
* feat(models): hide OpenRouter models that don't advertise tool support

Port from Kilo-Org/kilocode#9068.

hermes-agent is tool-calling-first — every provider path assumes the
model can invoke tools. Models whose OpenRouter supported_parameters
doesn't include 'tools' (e.g. image-only or completion-only models)
cannot be driven by the agent loop and fail at the first tool call.

Filter them out of fetch_openrouter_models() so they never appear in
the model picker (`hermes model`, setup wizard, /model slash command).

Permissive when the field is missing — OpenRouter-compatible gateways
(Nous Portal, private mirrors, older snapshots) don't always populate
supported_parameters. Treat missing as 'unknown → allow' rather than
silently emptying the picker on those gateways. Only hide models
whose supported_parameters is an explicit list that omits tools.

Tests cover: tools present → kept, tools absent → dropped, field
missing → kept, malformed non-list → kept, non-dict item → kept,
empty list → dropped.

* feat(delegate): cross-agent file state coordination for concurrent subagents

Prevents mangled edits when concurrent subagents touch the same file
(same process, same filesystem — the mangle scenario from #11215).

Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1:

1. FileStateRegistry (tools/file_state.py) — process-wide singleton
   tracking per-agent read stamps and the last writer globally.
   check_stale() names the sibling subagent in the warning when a
   non-owning agent wrote after this agent's last read.

2. Per-path threading.Lock wrapped around the read-modify-write
   region in write_file_tool and patch_tool. Concurrent siblings on
   the same path serialize; different paths stay fully parallel.
   V4A multi-file patches lock in sorted path order (deadlock-free).

3. Delegate-completion reminder in tools/delegate_tool.py: after a
   subagent returns, writes_since(parent, child_start, parent_reads)
   appends '[NOTE: subagent modified files the parent previously
   read — re-read before editing: ...]' to entry.summary when the
   child touched anything the parent had already seen.

Complements (does not replace) the existing path-overlap check in
run_agent._should_parallelize_tool_batch — batch check prevents
same-file parallel dispatch within one agent's turn (cheap prevention,
zero API cost), registry catches cross-subagent and cross-turn
staleness at write time (detection).

Behavior is warning-only, not hard-failing — matches existing project
style. Errors surface naturally: sibling writes often invalidate the
old_string in patch operations, which already errors cleanly.

Tests: tests/tools/test_file_state_registry.py — 16 tests covering
registry state transitions, per-path locking, per-path-not-global
locking, writes_since filtering, kill switch, and end-to-end
integration through the real read_file/write_file/patch handlers.
2026-04-21 16:41:26 -07:00
Brooklyn Nicholson
0dfb7b8a0d fix(tui): /resume picker shows telegram/discord/etc sessions
Reported during TUI v2 blitz retest: /resume modal only surfaced tui/cli
rows, even though `hermes --tui --resume <id>` with a pasted telegram
session id works fine.  The handler double-fetched with explicit
`source="tui"` and `source="cli"` filters and dropped everything else on
the floor.

Drop the filter — list_sessions_rich(source=None) already excludes
child sessions (subagents, compression continuations) via its default,
and users want to resume messenger sessions from inside the TUI.

Adds gateway regression coverage.
2026-04-21 18:28:40 -05:00
helix4u
7ba9c22cde fix(vision): route Nous main-provider vision through tier-aware backend 2026-04-21 14:42:32 -07:00
brooklyn!
e6e993552a
Merge pull request #13622 from NousResearch/bb/tui-model-switch-sticks
fix(model-switch): /model --provider X sticks instead of silently falling back
2026-04-21 16:34:19 -05:00
brooklyn!
3e198f37c9
Merge pull request #13641 from NousResearch/bb/tui-at-folder-filter
fix(tui): @folder: / @file: completions respect the explicit prefix
2026-04-21 16:33:30 -05:00
Teknium
ef589b1a23 test(approval): regression guards for thread-local callback contract
Two unit tests that pin down the threading.local semantics the CLI freeze
fix (#13617 / #13618) relies on:

- main-thread registration must be invisible to child threads (documents
  the underlying bug — if this ever starts passing visible, ACP's
  GHSA-qg5c-hvr5-hjgr race has returned)
- child-thread registration must be visible from that same thread AND
  cleared by the finally block (documents the fix pattern used by
  cli.py's run_agent closure and acp_adapter/server.py)

Pairs with the fix in the preceding commit by @Societus.
2026-04-21 14:29:08 -07:00
helix4u
392b2bb17b fix(auxiliary): refresh Nous runtime credentials after aux 401s 2026-04-21 14:25:57 -07:00
pefontana
48ecb98f8a feat(delegate): orchestrator role and configurable spawn depth (default flat)
Adds role='leaf'|'orchestrator' to delegate_task. With max_spawn_depth>=2,
an orchestrator child retains the 'delegation' toolset and can spawn its
own workers; leaf children cannot delegate further (identical to today).

Default posture is flat — max_spawn_depth=1 means a depth-0 parent's
children land at the depth-1 floor and orchestrator role silently
degrades to leaf. Users opt into nested delegation by raising
max_spawn_depth to 2 or 3 in config.yaml.

Also threads acp_command/acp_args through the main agent loop's delegate
dispatch (previously silently dropped in the schema) via a new
_dispatch_delegate_task helper, and adds a DelegateEvent enum with
legacy-string back-compat for gateway/ACP/CLI progress consumers.

Config (hermes_cli/config.py defaults):
  delegation.max_concurrent_children: 3   # floor-only, no upper cap
  delegation.max_spawn_depth: 1           # 1=flat (default), 2-3 unlock nested
  delegation.orchestrator_enabled: true   # global kill switch

Salvaged from @pefontana's PR #11215. Overrides vs. the original PR:
concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only,
no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which
silently enabled one level of orchestration for every user).

Co-authored-by: pefontana <fontana.pedro93@gmail.com>
2026-04-21 14:23:45 -07:00
pefontana
7c3c7e50c5 test(delegate): make default_toolsets regression test robust to user config
The prior form of this test asserted on CLI_CONFIG["delegation"] after
importing cli, which only passed by accident of pytest-xdist worker
scheduling. cli._hermes_home is frozen at module import time (cli.py:76),
before the tests/conftest.py autouse HERMES_HOME-isolation fixture can
fire, so CLI_CONFIG ends up populated by deep-merging the contributor's
actual ~/.hermes/config.yaml over the defaults (cli.py:359-366). Any
contributor (like me) who still has the legacy key set in their own
config causes a false failure the moment another test file in the same
xdist worker imports cli at module level.

Asserting on the source of load_cli_config() instead sidesteps all of
that: the test now checks the defaults literal directly and is
independent of user config, HERMES_HOME, import order, and worker
scheduling.

Demonstrated failure mode before this fix:
  pytest tests/hermes_cli/test_config_drift.py \
         tests/hermes_cli/test_skills_hub.py -o addopts=""
  -> FAILED (CLI_CONFIG["delegation"] contained "default_toolsets"
     from the user's ~/.hermes/config.yaml)

Part of Initiative 2 / M0.5.
2026-04-21 13:44:27 -07:00
pefontana
631e8793f4 refactor(delegate): drop dead default_toolsets from CLI default config
delegation.default_toolsets was declared in cli.py's CLI_CONFIG default
dict and documented in cli-config.yaml.example, but never read: none of
tools/delegate_tool.py, _load_config(), or any call site ever looked it
up. The live fallback is the DEFAULT_TOOLSETS module constant at
tools/delegate_tool.py:101, which stays as-is.

hermes_cli/config.py's DEFAULT_CONFIG["delegation"] already omits the
key — this commit aligns cli.py with that.

Adds a regression test in tests/hermes_cli/test_config_drift.py so a
future refactor that re-adds the key without wiring it up to
_load_config() fails loudly.

Part of Initiative 2 / M0.5.
2026-04-21 13:44:27 -07:00
Teknium
5ffae9228b
feat(image-gen): add GPT Image 2 to FAL catalog (#13677)
Adds OpenAI's new GPT Image 2 model via FAL.ai, selectable through
`hermes tools` → Image Generation. SOTA text rendering (including CJK)
and world-aware photorealism.

- FAL_MODELS entry with image_size_preset style
- 4:3 presets on all aspect ratios — 16:9 (1024x576) falls below
  GPT-Image-2's 655,360 min-pixel floor and would be rejected
- quality pinned to medium (same rule as gpt-image-1.5) for
  predictable Nous Portal billing
- BYOK (openai_api_key) deliberately omitted from supports so all
  users stay on shared FAL billing
- 6 new tests covering preset mapping, quality pinning, and
  supports-whitelist integrity
- Docs table + aspect-ratio map updated

Live-tested end-to-end: 39.9s cold request, clean 1024x768 PNG
2026-04-21 13:35:31 -07:00
Teknium
e889332c99
fix(gateway): always inject reply-to pointer, not just when quoted text is absent (#13676)
The [Replying to: "..."] prefix is disambiguation, not deduplication. When
a user explicitly replies to a prior message, the agent needs a pointer to
which specific message they're referencing — even when the quoted text
already exists somewhere in history. History can contain the same or
similar text multiple times; without an explicit pointer the agent has to
guess (or answer for both subjects), and the reply signal is silently
dropped.

Example: in a conversation comparing Japan and Italy, replying to the
"Japan is great for culture..." message and asking "What's the best time
to go?" — previously the found_in_history check suppressed the prefix
because the quoted text was already in history, leaving the agent to
guess which destination the user meant. Now the pointer is always present.

Drops the found_in_history guard added in #1594. Token overhead is
minimal (snippet capped at 500 chars on the new user turn; cached prefix
unaffected). Behavior becomes deterministic: reply sent ⇒ pointer present.

Thanks to smartyi for flagging this.
2026-04-21 13:33:02 -07:00
Brooklyn Nicholson
9d9db1e910 fix(tui): @folder: only yields directories, @file: only yields files
Reported during TUI v2 blitz testing: typing `@folder:` in the composer
pulled up .dockerignore, .env, .gitignore, and every other file in the
cwd alongside the actual directories. The completion loop yielded every
entry regardless of the explicit prefix and auto-rewrote each completion
to @file: vs @folder: based on is_dir — defeating the user's choice.

Also fixed a pre-existing adjacent bug: a bare `@file:` or `@folder:`
(no path) used expanded=="." as both search_dir AND match_prefix,
filtering the list to dotfiles only. When expanded is empty or ".",
search in cwd with no prefix filter.

- want_dir = prefix == "@folder:" drives an explicit is_dir filter
- preserve the typed prefix in completion text instead of rewriting
- three regression tests cover: folder-only, file-only, and the bare-
  prefix case where completions keep the `@folder:` prefix
2026-04-21 14:31:48 -05:00
Brooklyn Nicholson
f0b763c74f fix(model-switch): drop stale provider from fallback chain and env after /model
Reported during the TUI v2 blitz test: switching from openrouter to
anthropic via `/model <name> --provider anthropic` appeared to succeed,
but the next turn kept hitting openrouter — the provider the user was
deliberately moving away from.

Two gaps caused this:

1. `Agent.switch_model` reset `_fallback_activated` / `_fallback_index`
   but left `_fallback_chain` intact. The chain was seeded from
   `fallback_providers:` at agent init for the *original* primary, so
   when the new primary returned 401 (invalid/expired Anthropic key),
   `_try_activate_fallback()` picked the old provider back up without
   informing the user. Prune entries matching either the old primary
   (user is moving away) or the new primary (redundant) whenever the
   primary provider actually changes.

2. `_apply_model_switch` persisted `HERMES_MODEL` but never updated
   `HERMES_INFERENCE_PROVIDER`. Any ambient re-resolution of the runtime
   (credential pool refresh, compressor rebuild, aux clients) falls
   through to that env var in `resolve_requested_provider`, so it kept
   reporting the original provider even after an in-memory switch.

Adds three regression tests: fallback-chain prune on primary change,
no-op on same-provider model swap, and env-var sync on explicit switch.
2026-04-21 14:31:47 -05:00
IAvecilla
54c2261214
Rename test variables 2026-04-21 16:00:34 -03:00
IAvecilla
aa61831a14
fix(cli): keep snake_case underscores intact in strip markdown mode 2026-04-21 15:32:59 -03:00
kshitijk4poor
9556fef5a1 fix(tui): improve macOS paste and shortcut parity
- support Cmd-as-super and readline-style fallback shortcuts on macOS
- add layered clipboard/OSC52 paste handling and immediate image-path attach
- add IDE terminal setup helpers, terminal parity hints, and aligned docs
2026-04-21 08:00:00 -07:00
Teknium
5e0eed470f
fix(cache): enable prompt caching for Qwen on OpenCode/OpenCode-Go/Alibaba (#13528)
Qwen models on OpenCode, OpenCode Go, and direct DashScope accept
Anthropic-style cache_control markers on OpenAI-wire chat completions,
but hermes only injected markers for Claude-named models. Result: zero
cache hits on every turn, full prompt re-billed — a community user
reported burning through their OpenCode Go subscription on Qwen3.6.

Extend _anthropic_prompt_cache_policy to return (True, False) — envelope
layout, not native — for the Alibaba provider family when the model name
contains 'qwen'. Envelope layout places markers on inner content blocks
(matching pi-mono's 'alibaba' cacheControlFormat) and correctly skips
top-level markers on tool-role messages (which OpenCode rejects).

Non-Qwen models on these providers (GLM, Kimi) keep their existing
behaviour — they have automatic server-side caching and don't need
client markers.

Upstream reference: pi-mono #3392 / #3393 documented this contract for
opencode-go Qwen models.

Adds 7 regression tests covering Qwen3.5/3.6/coder on each affected
provider plus negative cases for GLM/Kimi/OpenRouter-Qwen.
2026-04-21 06:40:58 -07:00
Teknium
244ae6db15
fix(web_server,whatsapp-bridge): validate Host header against bound interface (#13530)
DNS rebinding attack: a victim browser that has the dashboard (or the
WhatsApp bridge) open could be tricked into fetching from an
attacker-controlled hostname that TTL-flips to 127.0.0.1. Same-origin
and CORS checks don't help — the browser now treats the attacker origin
as same-origin with the local service. Validating the Host header at
the app layer rejects any request whose Host isn't one we bound for.

Changes:

hermes_cli/web_server.py:
- New host_header_middleware runs before auth_middleware. Reads
  app.state.bound_host (set by start_server) and rejects requests
  whose Host header doesn't match the bound interface with HTTP 400.
- Loopback binds accept localhost / 127.0.0.1 / ::1. Non-loopback
  binds require exact match. 0.0.0.0 binds skip the check (explicit
  --insecure opt-in; no app-layer defence possible).
- IPv6 bracket notation parsed correctly: [::1] and [::1]:9119 both
  accepted.

scripts/whatsapp-bridge/bridge.js:
- Express middleware rejects non-loopback Host headers. Bridge
  already binds 127.0.0.1-only, this adds the complementary app-layer
  check for DNS rebinding defence.

Tests: 8 new in tests/hermes_cli/test_web_server_host_header.py
covering loopback/non-loopback/zero-zero binds, IPv6 brackets, case
insensitivity, and end-to-end middleware rejection via TestClient.

Reported in GHSA-ppp5-vxwm-4cf7 by @bupt-Yy-young. Hardening — not
CVE per SECURITY.md §3. The dashboard's main trust boundary is the
loopback bind + session token; DNS rebinding defeats the bind assumption
but not the token (since the rebinding browser still sees a first-party
fetch to 127.0.0.1 with the token-gated API). Host-header validation
adds the missing belt-and-braces layer.
2026-04-21 06:26:35 -07:00
Teknium
16accd44bd
fix(telegram): require TELEGRAM_WEBHOOK_SECRET in webhook mode (#13527)
When TELEGRAM_WEBHOOK_URL was set but TELEGRAM_WEBHOOK_SECRET was not,
python-telegram-bot received secret_token=None and the webhook endpoint
accepted any HTTP POST. Anyone who could reach the listener could inject
forged updates — spoofed user IDs, spoofed chat IDs, attacker-controlled
message text — and trigger handlers as if Telegram delivered them.

The fix refuses to start the adapter in webhook mode without the secret.
Polling mode (default, no webhook URL) is unaffected — polling is
authenticated by the bot token directly.

BREAKING CHANGE for webhook-mode deployments that never set
TELEGRAM_WEBHOOK_SECRET. The error message explains remediation:

  export TELEGRAM_WEBHOOK_SECRET="$(openssl rand -hex 32)"

and instructs registering it with Telegram via setWebhook's secret_token
parameter. Release notes must call this out.

Reported in GHSA-3vpc-7q5r-276h by @bupt-Yy-young. Hardening — not CVE
per SECURITY.md §3 "Public Exposure: Deploying the gateway to the
public internet without external authentication or network protection"
covers the historical default, but shipping a fail-open webhook as the
default was the wrong choice and the guard aligns us with the SECURITY.md
threat model.
2026-04-21 06:23:09 -07:00
Teknium
62348cffbe
fix(acp): wire approval callback + make it thread-local (#13525)
Two related ACP approval issues:

GHSA-96vc-wcxf-jjff — ACP's _run_agent never set HERMES_INTERACTIVE
(or any other flag recognized by tools.approval), so check_all_command_guards
took the non-interactive auto-approve path and never consulted the
ACP-supplied approval callback (conn.request_permission). Dangerous
commands executed in ACP sessions without operator approval despite
the callback being installed. Fix: set HERMES_INTERACTIVE=1 around
the agent run so check_all_command_guards routes through
prompt_dangerous_approval(approval_callback=...) — the correct shape
for ACP's per-session request_permission call. HERMES_EXEC_ASK would
have routed through the gateway-queue path instead, which requires a
notify_cb registered in _gateway_notify_cbs (not applicable to ACP).

GHSA-qg5c-hvr5-hjgr — _approval_callback and _sudo_password_callback
were module-level globals in terminal_tool. Concurrent ACP sessions
running in ThreadPoolExecutor threads each installed their own callback
into the same slot, racing. Fix: store both callbacks in threading.local()
so each thread has its own slot. CLI mode (single thread) is unaffected;
gateway mode uses a separate queue-based approval path and was never
touched.

set_approval_callback is now called INSIDE _run_agent (the executor
thread) rather than before dispatching — so the TLS write lands on the
correct thread.

Tests: 5 new in tests/acp/test_approval_isolation.py covering
thread-local isolation of both callbacks and the HERMES_INTERACTIVE
callback routing. Existing tests/acp/ (159 tests) and tests/tools/
approval-related tests continue to pass.

Fixes GHSA-96vc-wcxf-jjff
Fixes GHSA-qg5c-hvr5-hjgr
2026-04-21 06:20:40 -07:00
Teknium
ba4357d13b
fix(env_passthrough): reject Hermes provider credentials from skill passthrough (#13523)
A skill declaring `required_environment_variables: [ANTHROPIC_TOKEN]` in
its SKILL.md frontmatter silently bypassed the `execute_code` sandbox's
credential-scrubbing guarantee. `register_env_passthrough` had no
blocklist, so any name a skill chose flipped `is_env_passthrough(name) =>
True`, which shortcircuits the sandbox's secret filter.

Fix: reject registration when the name appears in
`_HERMES_PROVIDER_ENV_BLOCKLIST` (the canonical list of Hermes-managed
credentials — provider keys, gateway tokens, etc.). Log a warning naming
GHSA-rhgp-j443-p4rf so operators see the rejection in logs.

Non-Hermes third-party API keys (TENOR_API_KEY for gif-search,
NOTION_TOKEN for notion skills, etc.) remain legitimately registerable —
they were never in the sandbox scrub list in the first place.

Tests: 16 -> 17 passing. Two old tests that documented the bypass
(`test_passthrough_allows_blocklisted_var`, `test_make_run_env_passthrough`)
are rewritten to assert the new fail-closed behavior. New
`test_non_hermes_api_key_still_registerable` locks in that legitimate
third-party keys are unaffected.

Reported in GHSA-rhgp-j443-p4rf by @q1uf3ng. Hardening; not CVE-worthy
on its own per the decision matrix (attacker must already have operator
consent to install a malicious skill).
2026-04-21 06:14:25 -07:00
Teknium
7fc1e91811
security(runtime_provider): close OLLAMA_API_KEY substring-leak sweep miss (#13522)
Two call sites still used a raw substring check to identify ollama.com:

  hermes_cli/runtime_provider.py:496:
      _is_ollama_url = "ollama.com" in base_url.lower()

  run_agent.py:6127:
      if fb_base_url_hint and "ollama.com" in fb_base_url_hint.lower() ...

Same bug class as GHSA-xf8p-v2cg-h7h5 (OpenRouter substring leak), which
was fixed in commit dbb7e00e via base_url_host_matches() across the
codebase. The earlier sweep missed these two Ollama sites. Self-discovered
during April 2026 security-advisory triage; filed as GHSA-76xc-57q6-vm5m.

Impact is narrow — requires a user with OLLAMA_API_KEY configured AND a
custom base_url whose path or look-alike host contains 'ollama.com'.
Users on default provider flows are unaffected. Filed as a draft advisory
to use the private-fork flow; not CVE-worthy on its own.

Fix is mechanical: replace substring check with base_url_host_matches
at both sites. Same helper the rest of the codebase uses.

Tests: 67 -> 71 passing. 7 new host-matcher cases in
tests/test_base_url_hostname.py (path injection, lookalike host,
localtest.me subdomain, ollama.ai TLD confusion, localhost, genuine
ollama.com, api.ollama.com subdomain) + 4 call-site tests in
tests/hermes_cli/test_runtime_provider_resolution.py verifying
OLLAMA_API_KEY is selected only when base_url actually targets
ollama.com.

Fixes GHSA-76xc-57q6-vm5m
2026-04-21 06:06:16 -07:00
Teknium
4cc5065f63 fix(acp): follow-up — named-const page size, alias kwarg, tests
- Replace kwargs.get('limit', 50) with module-level _LIST_SESSIONS_PAGE_SIZE
  constant. ListSessionsRequest schema has no 'limit' field, so the kwarg
  path was dead. Constant is the single source of truth for the page cap.
- Use next_cursor= (field name) instead of nextCursor= (alias). Both work
  under the schema's populate_by_name config, but using the declared
  Python field name is the consistent style in this file.
- Add docstring explaining cwd pass-through and cursor semantics.
- Add 4 tests: first-page with next_cursor, single-page no next_cursor,
  cursor resumes after match, unknown cursor returns empty page.
2026-04-21 06:00:41 -07:00
Aniruddha Adak
ea06104a3c fix(permissions): handle None response from ACP request_permission 2026-04-21 05:57:23 -07:00
unlinearity
155b619867 fix(agent): normalize socks:// env proxies for httpx/anthropic
WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed.

Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them:
  - run_agent._get_proxy_from_env
  - agent.auxiliary_client._validate_proxy_env_urls
  - agent.anthropic_adapter.build_anthropic_client
  - gateway.platforms.base.resolve_proxy_url

Regression coverage:
  - run_agent proxy env resolution
  - auxiliary proxy env normalization
  - gateway proxy URL resolution

Verified with:
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py

39 passed.
2026-04-21 05:52:46 -07:00
teknium1
267b2faa15 test(cron): exercise _deliver_result and _send_media_via_adapter directly for timeout-cancel
The original tests replicated the try/except/cancel/raise pattern inline with
a mocked future, which tested Python's try/except semantics rather than the
scheduler's behavior. Rewrite them to invoke _deliver_result and
_send_media_via_adapter end-to-end with a real concurrent.futures.Future
whose .result() raises TimeoutError.

Mutation-verified: both tests fail when the try/except wrappers are removed
from cron/scheduler.py, pass with them in place.
2026-04-21 05:52:16 -07:00
VTRiot
18e7fd8364 fix(cron): cancel orphan coroutine on delivery timeout before standalone fallback
When the live adapter delivery path (_deliver_result) or media send path
(_send_media_via_adapter) times out at future.result(timeout=N), the
underlying coroutine scheduled via asyncio.run_coroutine_threadsafe can
still complete on the event loop, causing a duplicate send after the
standalone fallback runs.

Cancel the future on TimeoutError before re-raising, so the standalone
fallback is the sole delivery path.

Adds TestDeliverResultTimeoutCancelsFuture and
TestSendMediaTimeoutCancelsFuture.
2026-04-21 05:52:16 -07:00
Kian Meng
063bc3c1e2 fix(kimi): send max_tokens, reasoning_effort, and thinking for Kimi/Moonshot
Kimi/Moonshot endpoints require explicit parameters that Hermes was not
sending, causing 'Response truncated due to output length limit' errors
and inconsistent reasoning behavior.

Root cause analysis against Kimi CLI source (MoonshotAI/kimi-cli,
packages/kosong/src/kosong/chat_provider/kimi.py):

1. max_tokens: Kimi's API defaults to a very low value when omitted.
   Reasoning tokens share the output budget — the model exhausts it on
   thinking alone.  Send 32000, matching Kimi CLI's generate() default.

2. reasoning_effort: Kimi CLI sends this as a top-level parameter (not
   inside extra_body).  Hermes was not sending it at all because
   _supports_reasoning_extra_body() returns False for non-OpenRouter
   endpoints.

3. extra_body.thinking: Kimi CLI uses with_thinking() which sets
   extra_body.thinking={"type":"enabled"} alongside reasoning_effort.
   This is a separate control from the OpenAI-style reasoning extra_body
   that Hermes sends for OpenRouter/GitHub.  Without it, the Kimi gateway
   may not activate reasoning mode correctly.

Covers api.kimi.com (Kimi Code) and api.moonshot.ai/cn (Moonshot).

Tests: 6 new test cases for max_tokens, reasoning_effort, and
extra_body.thinking under various configs.
2026-04-21 05:32:27 -07:00
Teknium
3f72b2fe15 fix(/model): accept provider switches when /models is unreachable
Gateway /model <name> --provider opencode-go (or any provider whose /models
endpoint is down, 404s, or doesn't exist) silently failed. validate_requested_model
returned accepted=False whenever fetch_api_models returned None, switch_model
returned success=False, and the gateway never wrote _session_model_overrides —
so the switch appeared to succeed in the error message flow but the next turn
kept calling the old provider.

The validator already had static-catalog fallbacks for MiniMax and Codex
(providers without a /models endpoint). Extended the same pattern as the
terminal fallback: when the live probe fails, consult provider_model_ids()
for the curated catalog. Known models → accepted+recognized. Close typos →
auto-corrected. Unknown models → soft-accepted with a 'Not in curated
catalog' warning. Providers with no catalog at all → soft-accepted with a
generic 'Note:' warning, finally honoring the in-code comment ('Accept and
persist, but warn') that had been lying since it was written.

Tests: 7 new tests in test_opencode_go_validation_fallback.py covering the
catalog lookup, case-insensitive match, auto-correct, unknown-with-suggestion,
unknown-without-suggestion, and no-catalog paths. TestValidateApiFallback in
test_model_validation.py updated — its four 'rejected_when_api_down' tests
were encoding exactly the bug being fixed.
2026-04-21 05:19:43 -07:00
Ben
724377c429 test(mcp): add failing tests for circuit-breaker recovery
The MCP circuit breaker in tools/mcp_tool.py has no half-open state and
no reset-on-reconnect behavior, so once it trips after 3 consecutive
failures it stays tripped for the process lifetime. These tests lock
in the intended recovery behavior:

1. test_circuit_breaker_half_opens_after_cooldown — after the cooldown
   elapses, the next call must actually probe the session; success
   closes the breaker.
2. test_circuit_breaker_reopens_on_probe_failure — a failed probe
   re-arms the cooldown instead of letting every subsequent call
   through.
3. test_circuit_breaker_cleared_on_reconnect — a successful OAuth
   recovery resets the breaker even if the post-reconnect retry
   fails (a successful reconnect is sufficient evidence the server
   is viable again).

All three currently fail, as expected.
2026-04-21 05:19:03 -07:00
Teknium
c6974043ef
refactor(acp): validate method_id against advertised provider in authenticate() (#13468)
* feat(models): hide OpenRouter models that don't advertise tool support

Port from Kilo-Org/kilocode#9068.

hermes-agent is tool-calling-first — every provider path assumes the
model can invoke tools. Models whose OpenRouter supported_parameters
doesn't include 'tools' (e.g. image-only or completion-only models)
cannot be driven by the agent loop and fail at the first tool call.

Filter them out of fetch_openrouter_models() so they never appear in
the model picker (`hermes model`, setup wizard, /model slash command).

Permissive when the field is missing — OpenRouter-compatible gateways
(Nous Portal, private mirrors, older snapshots) don't always populate
supported_parameters. Treat missing as 'unknown → allow' rather than
silently emptying the picker on those gateways. Only hide models
whose supported_parameters is an explicit list that omits tools.

Tests cover: tools present → kept, tools absent → dropped, field
missing → kept, malformed non-list → kept, non-dict item → kept,
empty list → dropped.

* refactor(acp): validate method_id against advertised provider in authenticate()

Previously authenticate() accepted any method_id whenever the server had
provider credentials configured. This was not a vulnerability under the
personal-assistant trust model (ACP is stdio-only, local-trust — anything
that can reach the transport is already code-execution-equivalent to the
user), but it was sloppy API hygiene: the advertised auth_methods list
from initialize() was effectively ignored.

Now authenticate() only returns AuthenticateResponse when method_id
matches the currently-advertised provider (case-insensitive). Mismatched
or missing method_id returns None, consistent with the no-credentials
case.

Raised by xeloxa via GHSA-g5pf-8w9m-h72x. Declined as a CVE
(ACP transport is stdio, local-trust model), but the correctness fix is
worth having on its own.
2026-04-21 03:39:55 -07:00
Teknium
c1fe6339b7 test(telegram): update /cmd@botname assertion for entity-only detection
Current main's _message_mentions_bot() uses MessageEntity-only detection
(commit e330112a), so the test for '/status@hermes_bot' needs to include
a MENTION entity. Real Telegram always emits one for /cmd@botname — the
bot menu and CommandHandler rely on this mechanism.
2026-04-21 03:06:56 -07:00
pinion05
b0939d9210 fix: slash commands now respect require_mention in Telegram groups
When require_mention is enabled, slash commands no longer bypass
mention checks. Bare /command without @mention is filtered in groups,
while /command@botname (bot menu) and @botname /command still pass.

Commands still pass unconditionally when require_mention is disabled,
preserving backward compatibility.

Closes #6033
2026-04-21 03:06:56 -07:00