Follow-up to snav's PR #25463 contribution: flip default to on, broaden
scope so backfill fires whenever require_mention gates the bot (not just
shared-session channels).
Why:
- The mention-gate creates a session-transcript gap regardless of whether
the channel is shared or per-user. In per-user sessions, Alice's session
is still missing other participants' messages and her own pre-mention
messages — backfill fills both gaps.
- Threads naturally scope to thread-only history because discord.py's
channel.history() on a thread returns only that thread's messages.
- DMs still skip — every DM triggers the bot, so the session transcript
is already complete.
Changes:
- hermes_cli/config.py: discord.history_backfill default → true
- gateway/platforms/discord.py: drop the _is_shared gate, keep _is_dm
skip and _needed_mention gate; env var DISCORD_HISTORY_BACKFILL
default → 'true'
- cli-config.yaml.example + website docs: update defaults and prose;
add the DISCORD_HISTORY_BACKFILL / _LIMIT env var rows that were
documented in the PR description but missing from the env-var table
- tests/gateway/test_discord_free_response.py:
- flip test_discord_per_user_channel_does_not_backfill →
test_discord_per_user_channel_backfills_too (new behavior)
- add test_discord_dm_does_not_backfill (DM skip is invariant)
- give FakeThread a no-op history() so existing thread tests don't hit
a fake discord.Forbidden when backfill now fires on threads too
Tests: 160/160 in target files; 400/400 across all tests/gateway/ -k discord.
Adds optional channel-context backfill for Discord shared-channel sessions
so the agent can see recent messages it missed between its own turns
(typically when require_mention=true filters out most traffic).
Previously the agent only saw the @mention message that triggered it, which
led to disorienting replies in active multi-user channels where the
conversation context was invisible. With backfill enabled, a configurable
number of recent messages are fetched per-turn and prepended to the trigger
message as a context block, kept separate from sender-prefix logic so
attribution remains clean.
This re-opens the work from #13063 (approved by @OutThisLife on 2026-04-20,
closed when I closed the branch to address the simpolism:main head-branch
issue plus an ordering bug I caught later in live use). Filing against the
freshly-rewritten problem statement in #13054 so the design is grounded in
the failure mode rather than the implementation shape.
The implementation follows the **push-mode last-self-anchored** design from
the two options laid out in #13054. See the issue for the trade-off
discussion vs pull-mode (#13120 was an earlier closed PR using that shape).
Treating this as a reference implementation — happy to rewrite as
last-trigger anchoring or as a hybrid with #13120 if maintainers prefer.
Changes:
- gateway/platforms/discord.py:
- new `_discord_history_backfill()` / `_discord_history_backfill_limit()`
helpers (config.extra > env > default), mirroring the existing
`_discord_require_mention()` shape
- new `_fetch_channel_context()` that scans `channel.history()` backwards
from the trigger to the bot's last message (or limit), formats as
`[Recent channel messages] / [name] msg / ...`, respects DISCORD_ALLOW_BOTS,
skips system messages
- per-channel `_last_self_message_id` cache to narrow the fetch window
on hot paths (avoids full history scan when the bot has spoken recently)
- **IMPORTANT**: passes `oldest_first=False` explicitly to `channel.history()`.
discord.py 2.x silently flips the default to True when `after=` is supplied,
which would select the EARLIEST N messages after our last response instead
of the LATEST N before the trigger. In high-traffic windows this would
return stale tool traces and drop the actual final answer the user is
asking about. See regression test below. Caught in live use during a
Codex tool-trace burst on May 13 2026.
- gateway/config.py: discord_history_backfill + discord_history_backfill_limit
settings + yaml→env bridge
- gateway/platforms/base.py: channel_context field on MessageEvent
- gateway/run.py: prepend channel_context after sender-prefix so the
[sender name] tag applies to the trigger message alone, not to the backfill
- hermes_cli/config.py: defaults for new discord.history_backfill and
discord.history_backfill_limit keys
- cli-config.yaml.example: documented defaults
- tests/gateway/test_discord_free_response.py: 7 new tests covering
cold-start backfill, self-message stop boundary, other-bot filtering,
cache hot-path narrowing, stale-cache fallback, shared-channel +
per-user backfill paths, and the ordering regression test
(`test_fetch_channel_context_cache_uses_latest_window_when_after_set`)
- tests/gateway/test_config.py: yaml→env bridge tests
- tests/gateway/test_session.py: prefix-order edge cases
- website/docs/user-guide/messaging/discord.md: env vars + config keys +
usage docs
Tested on Ubuntu 24.04 — empirically validated in my own multi-bot Discord
research server for the past three weeks.
Fixes#13054
Supersedes #13063 (closed)
Follow-up on the salvaged feat commit:
- Keep the constructor / config / yaml-example default at 3 so existing
gateway and CLI users see no behavioural change. PR #13754 (which this
builds on) had lowered the default to 2 to chase pre-feature parity in
the system-prompt-present case, at the cost of quietly halving the
protected head for the gateway path (which strips the system prompt
before calling compress()). With the new "system prompt is implicit"
semantics, default 3 gives every caller a stable head shape.
- agent/context_engine.py: bring the ABC's protect_first_n docstring in
line with the new semantics so plugin context engines interpret the
config key the same way the built-in compressor does.
- tests: adjust the default-value test (3, not 2) and a stale comment;
per-test protect_first_n=2/3/1 values added in PR #13754 stay as-is
since those tests fix concrete head shapes.
The number of head messages preserved verbatim across context compactions
was previously hardcoded to 3 in AIAgent.__init__. Expose it as
`compression.protect_first_n` in config, matching the existing
`protect_last_n` pattern.
Motivation: users who rely on rolling compaction for long-running sessions
had the opening user/assistant exchange pinned as head forever, which
doesn't always match how they want the session framed after many
compactions. Lowering to 1 preserves the system prompt + first non-system
message; lowering to 0 preserves only the system prompt and lets the
entire first exchange age out naturally through the summary.
Semantics: `protect_first_n` counts non-system head messages protected
**in addition to** the system prompt, which is always implicitly protected
when present. Same meaning across both code paths:
protect_first_n=0 → system prompt only (or nothing if no system message)
protect_first_n=2 → system prompt + first 2 non-system messages (default)
This unifies the CLI path (which reads messages with the system prompt at
position 0) and the gateway path (where the gateway /compress handler
strips the system prompt before calling compress() — see
gateway/run.py L9150-9154 on the parent fork). Previously these two paths
disagreed:
CLI path: protect_first_n=1 → protect system prompt only
Gateway path: protect_first_n=1 → protect first USER turn forever
In practice on long-running gateway sessions the old semantics pinned
whatever stale aside happened to be the first user message, reinserting
it into every compaction summary indefinitely.
Default chosen as 2 (not 3) so that the effective protected head count
remains 3 messages in the common case — assuming a system prompt is
present, default protection becomes system + 2 non-system = 3 total,
matching the pre-feature behaviour where `protect_first_n` was hardcoded
to protect 3 messages total. Sessions without a system prompt will see a
small behaviour change (2 protected head messages instead of 3), but this
is the rare path and the new semantics make the system-prompt-present
case the well-defined one.
Changes:
- agent/context_compressor.py: redefine protect_first_n as the count of
non-system head messages protected beyond the implicit system-prompt
guarantee; both paths converge. Constructor default updated to 2.
- hermes_cli/config.py: add `compression.protect_first_n` default (2),
matching the new semantics. `show_config` label tweaked to
'Protect first: N non-system head messages' for clarity.
- run_agent.py: read protect_first_n from config; 0 is now valid (system
prompt is always implicitly protected).
- cli-config.yaml.example: document the new key and rationale.
- tests/agent/test_context_compressor.py: cover default, override, the
end-to-end `protect_first_n=0` and `protect_first_n=1` behaviour,
the no-system-prompt (gateway) path, and the new shared-semantics
regression test.
Fixes#13751
Tested on Ubuntu 24.04.
Two independent opt-in QoL toggles, both off by default.
terminal.docker_extra_args:
- List of extra flags appended verbatim to docker run after security
defaults. Useful for adding capabilities (e.g. --cap-add SETUID) or
other docker run options not exposed by existing config keys.
- Non-string entries are logged and skipped.
- Also available via TERMINAL_DOCKER_EXTRA_ARGS='[...]' env var.
display.timestamps:
- Appends [HH:MM] to user input bullet and the assistant response box
header. Single hub in _format_submitted_user_message_preview()
covers both single-line and multi-line user previews; assistant
response label gets the timestamp at box-open time.
Closes#1569 (timestamps).
Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
Adds Google Chat as a new gateway platform, shipped under
plugins/platforms/google_chat/ following the canonical bundled-plugin
pattern (Teams, IRC). Rewired from the original PR #18425 to use the
new env_enablement_fn + cron_deliver_env_var plugin interfaces landed
in the preceding commit, so the adapter touches ZERO core files.
What it does:
- Inbound DM + group messages via Cloud Pub/Sub pull subscription (no
public URL needed), with attachments (PDFs, images, audio, video)
downloaded through an SSRF-guarded Google-host allowlist.
- Outbound text replies with the 'Hermes is thinking…' patch-in-place
pattern — no tombstones.
- Native file attachment delivery via per-user OAuth. Google Chat's
media.upload endpoint rejects service-account auth, so each user
runs /setup-files once in their own DM to grant
chat.messages.create for themselves; the adapter then uploads as
them. Tokens stored per email at
~/.hermes/google_chat_user_tokens/<email>.json.
- Thread isolation: side-threads get isolated sessions, top-level DM
messages share one continuous session. Persistent thread-count
store survives gateway restart.
- Supervisor reconnect with exponential backoff.
- Multi-user out of the box.
How it plugs in (no core edits):
- env_enablement_fn seeds PlatformConfig.extra with project_id,
subscription_name, service_account_json, and the home_channel dict
(which the core hook turns into a HomeChannel dataclass). Reads
GOOGLE_CHAT_PROJECT_ID (falls back to GOOGLE_CLOUD_PROJECT),
GOOGLE_CHAT_SUBSCRIPTION_NAME (falls back to GOOGLE_CHAT_SUBSCRIPTION),
GOOGLE_CHAT_SERVICE_ACCOUNT_JSON (falls back to
GOOGLE_APPLICATION_CREDENTIALS), GOOGLE_CHAT_HOME_CHANNEL.
- cron_deliver_env_var='GOOGLE_CHAT_HOME_CHANNEL' gets cron delivery
for free — cron/scheduler.py consults the platform registry for any
name not in its hardcoded built-in sets.
- plugin.yaml's rich requires_env / optional_env blocks auto-populate
OPTIONAL_ENV_VARS via the new hermes_cli/config.py injector, so
'hermes config' UI surfaces them with description / url / prompt /
password metadata.
- Module-level Platform('google_chat') call in adapter.py triggers the
Platform._missing_() registration so Platform.GOOGLE_CHAT attribute
access works without an enum entry.
Distribution: ships inside the existing hermes-agent package. Users
opt in via 'pip install hermes-agent[google_chat]' and follow the
8-step GCP walkthrough at
website/docs/user-guide/messaging/google_chat.md.
Test coverage: 153 tests in tests/gateway/test_google_chat.py, all
passing. Spans platform registration, env config loading, Pub/Sub
envelope routing, outbound send + chunking + typing patch-in-place,
attachment send paths, SSRF guard, thread/session model,
supervisor reconnect, authorization, per-user OAuth, and the new
plugin-registry cron delivery wiring.
Credit: adapter + OAuth + tests + docs authored by @donramon77
(PR #18425). Rewire onto the new plugin hooks + salvage commit by
Teknium.
Co-Authored-By: Ramón Fernández <112875006+donramon77@users.noreply.github.com>
When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running '⏳ Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Telegram); silently no-ops elsewhere. Off by default.
Failed runs skip cleanup so bubbles stay as breadcrumbs.
Minimal plumbing: base.py's existing post_delivery_callback slot now chains
new registrations onto any existing callback (with per-callback exception
isolation) rather than clobbering. Stale-generation registrations are
rejected so they can't step on a fresher run's callbacks. This lets the
cleanup callback coexist with the background-review release hook already
registered on the same slot.
Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>
Enable OpenRouter's response caching feature (beta) via X-OpenRouter-Cache
headers. When enabled, identical API requests return cached responses for
free (zero billing), reducing both latency and cost.
Configuration via config.yaml:
openrouter:
response_cache: true # default: on
response_cache_ttl: 300 # 1-86400 seconds
Changes:
- Add openrouter config section to DEFAULT_CONFIG (response_cache + TTL)
- Add build_or_headers() in auxiliary_client.py that builds attribution
headers plus optional cache headers based on config
- Replace inline _OR_HEADERS dicts with build_or_headers() at all 5 sites:
run_agent.py __init__, _apply_client_headers_for_base_url(), and
auxiliary_client.py _try_openrouter() + _to_async_client()
- Add _check_openrouter_cache_status() method to AIAgent that reads
X-OpenRouter-Cache-Status from streaming response headers and logs
HIT/MISS status
- Document in cli-config.yaml.example
- Add 28 tests (22 unit + 6 integration)
Ref: https://openrouter.ai/docs/guides/features/response-caching
Hello! I am the maintainer of the microsoft-teams-apps Python SDK and
I built this Teams adapter to integrate Microsoft Teams into Hermes.
Adds a `plugins/platforms/teams` platform plugin using the new
PlatformRegistry system from #17751. The adapter self-registers via
`register(ctx)` — no hardcoding in run.py, toolsets.py, or any
other core file.
Key features:
- Supports personal DMs, group chats, and channel posts
- Adaptive Card approval prompts with in-place button replacement
(Allow Once / Allow Session / Always Allow / Deny)
- aiohttp webhook server bridged from the Teams SDK to avoid
the fastapi/uvicorn dependency
- ConversationReference caching for correct proactive sends in
non-DM chats
- `interactive_setup()` for `hermes gateway setup` integration
- `platform_hint` for LLM context (Teams markdown subset)
- 34 tests covering adapter init, send, message handling, and
plugin registration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add opt-in terminal.docker_run_as_host_user config flag that passes
--user $(id -u):$(id -g) to the Docker backend so files written into
bind-mounted directories (/workspace, /root, docker_volumes entries) are
owned by the host user instead of root.
When enabled on POSIX platforms, also drops SETUID/SETGID caps since the
container no longer needs gosu/su to switch users. Falls back cleanly on
platforms without os.getuid (e.g. native Windows Docker) with a warning.
Wired through all three config.yaml -> TERMINAL_* env-var bridges:
- cli.py env_mappings (CLI + TUI startup)
- gateway/run.py _terminal_env_map (gateway / messaging platforms)
- hermes_cli/config.py _config_to_env_sync (`hermes config set`)
Also fixes docker_mount_cwd_to_workspace silently failing in gateway
mode -- it was missing from gateway/run.py's _terminal_env_map.
Adds tests/tools/test_terminal_config_env_sync.py to guard against
future drift between the three bridges (same bug class shipped twice
in one month).
Bundled Hermes image won't work with this flag since its entrypoint
expects to start as root for the usermod/gosu hermes flow; works with
the default nikolaik/python-nodejs image and plain Debian/Ubuntu.
This PR groups the TUI fixes that restore macOS Terminal usability and clean up the theme/composer regressions:
- copy transcript selections on macOS drag-release so Terminal.app users can copy while mouse tracking is enabled
- copy composer selections on macOS drag-release; composer selection is internal to TextInput and does not use the global Ink selection bus
- keep IDE Cmd+C forwarding setup macOS-only, and make keybinding conflict checks respect simple when-clause overlap/negation
- force truecolor before chalk initializes (unless NO_COLOR / FORCE_COLOR / HERMES_TUI_TRUECOLOR opt-outs apply) so the default banner keeps its gold/amber/bronze gradient in Terminal.app
- move TUI surfaces onto semantic theme tokens and preserve skin prompt symbols as bare tokens with renderer-owned spacing
- render focused placeholders as dim hint text in TTY mode instead of inverse/selected-looking synthetic cursor text
Enter while the agent is busy can now inject the typed text via /steer —
arriving at the agent after the next tool call — instead of interrupting
(current default) or queueing for the next turn.
Changes:
- cli.py: keybinding honors busy_input_mode='steer' by calling
agent.steer(text) on the UI thread (thread-safe), with automatic
fallback to 'queue' when the agent is missing, steer() is unavailable,
images are attached, or steer() rejects the payload. /busy accepts
'steer' as a fourth argument alongside queue/interrupt/status.
- gateway/run.py: busy-message handler and the PRIORITY running-agent
path both route through running_agent.steer() when the mode is 'steer',
with the same fallback-to-queue safety net. Ack wording tells users
their message was steered into the current run. Restart-drain queueing
now also activates for 'steer' so messages aren't lost across restarts.
- agent/onboarding.py: first-touch hint has a steer branch for both
CLI and gateway.
- hermes_cli/commands.py: /busy args_hint updated to include steer,
and 'steer' is registered as a subcommand (completions).
- hermes_cli/web_server.py: dashboard select widget offers steer.
- hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py:
inline docs updated.
- website/docs/user-guide/cli.md + messaging/index.md: documented.
- Tests: steer set/status path for /busy; onboarding hints;
_load_busy_input_mode accepts steer; busy-session ack exercises
steer success + two fallback-to-queue branches.
Requested on X by @CodingAcct.
Default is unchanged (interrupt).
Users who run `hermes setup` get `cli-config.yaml.example` copied verbatim
(including comments) to ~/.hermes/config.yaml. But several display settings
had thin comments that didn't enumerate the valid options, so users couldn't
tell from reading their config what values each key accepts.
- busy_input_mode: widen from 'CLI' to 'CLI and gateway platforms';
note /stop as gateway equivalent of Ctrl+C; add /busy_input_mode runtime hint
- compact, interim_assistant_messages, bell_on_complete, show_reasoning,
streaming: add true/false option lines showing effect of each value
- skin: refresh the built-in skin list (was missing daylight, warm-lightmode,
poseidon, sisyphus, charizard — 5 of 9 built-ins undocumented)
Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval
callback lives in tools/terminal_tool.py's threading.local(), which worker
threads do not inherit. When a subagent hits a dangerous-command guard,
prompt_dangerous_approval() falls back to input() from the worker thread,
deadlocking against the parent's prompt_toolkit TUI that owns stdin.
Fix: install a non-interactive callback into every subagent worker thread
via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)).
The callback is config-gated by delegation.subagent_auto_approve:
false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist)
true -> _subagent_auto_approve (opt-in YOLO for cron/batch)
Both emit a logger.warning audit line. Gateway sessions are unaffected
because they resolve approvals via tools/approval.py's per-session queue,
not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685).
- hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False
- cli-config.yaml.example: documented, commented (default)
- tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve,
_get_subagent_approval_callback, wired into the child timeout executor
- tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion,
and TLS scoping in the worker thread
Closes#11616.
The agent's API retry loop hardcoded max_retries = 3, so users with
fallback providers on flaky primaries burned through ~3 × provider
timeout (e.g. 3 × 180s = 9 minutes) before their fallback chain got a
chance to kick in.
Expose a new config key:
agent:
api_max_retries: 3 # default unchanged
Set it to 1 for fast failover when you have fallback providers, or
raise it if you prefer longer tolerance on a single provider. Values
< 1 are clamped to 1 (single attempt, no retry); non-integer values
fall back to the default.
This wraps the Hermes-level retry loop only — the OpenAI SDK's own
low-level retries (max_retries=2 default) still run beneath this for
transient network errors.
Changes:
- hermes_cli/config.py: add agent.api_max_retries default 3 with comment.
- run_agent.py: read self._api_max_retries in AIAgent.__init__; replace
hardcoded max_retries = 3 in the retry loop with self._api_max_retries.
- cli-config.yaml.example: documented example entry.
- hermes_cli/tips.py: discoverable tip line.
- tests/run_agent/test_api_max_retries_config.py: 4 tests covering
default, override, clamp-to-one, and invalid-value fallback.
Follow-up on helix4u's PR #14211:
- Flip default to true: narrowing toolsets=['web','browser'] expresses
'I want these extras', not 'silently strip MCP'. Parent MCP tools
(registered at runtime) should survive narrowing by default.
- Drop _config_version bump (22->23); additive nested key under
delegation.* is handled by _deep_merge, no migration needed.
- Update tests to reflect new default behavior.
Adds role='leaf'|'orchestrator' to delegate_task. With max_spawn_depth>=2,
an orchestrator child retains the 'delegation' toolset and can spawn its
own workers; leaf children cannot delegate further (identical to today).
Default posture is flat — max_spawn_depth=1 means a depth-0 parent's
children land at the depth-1 floor and orchestrator role silently
degrades to leaf. Users opt into nested delegation by raising
max_spawn_depth to 2 or 3 in config.yaml.
Also threads acp_command/acp_args through the main agent loop's delegate
dispatch (previously silently dropped in the schema) via a new
_dispatch_delegate_task helper, and adds a DelegateEvent enum with
legacy-string back-compat for gateway/ACP/CLI progress consumers.
Config (hermes_cli/config.py defaults):
delegation.max_concurrent_children: 3 # floor-only, no upper cap
delegation.max_spawn_depth: 1 # 1=flat (default), 2-3 unlock nested
delegation.orchestrator_enabled: true # global kill switch
Salvaged from @pefontana's PR #11215. Overrides vs. the original PR:
concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only,
no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which
silently enabled one level of orchestration for every user).
Co-authored-by: pefontana <fontana.pedro93@gmail.com>
Matches the default-config removal in the preceding commit.
default_toolsets was documented for users to set but was never actually
read at runtime, so showing it in the example config and the delegation
user guide was misleading.
No deprecation note is added: the key was always a no-op, so users who
copied it from the example continue to see no behavior change. Their
config.yaml still parses; the key is just silently unused, same as
before.
Part of Initiative 2 / M0.5.
Users can declare shell scripts in config.yaml under a hooks: block that
fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call,
subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on
stdout to block tool calls or inject context pre-LLM.
Key design:
- Registers closures on existing PluginManager._hooks dict — zero changes
to invoke_hook() call sites
- subprocess.run(shell=False) via shlex.split — no shell injection
- First-use consent per (event, command) pair, persisted to allowlist JSON
- Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept
- hermes hooks list/test/revoke/doctor CLI subcommands
- Adds subagent_stop hook event fired after delegate_task children exit
- Claude Code compatible response shapes accepted
Cherry-picked from PR #13143 by @pefontana.
Smart model routing (auto-routing short/simple turns to a cheap model
across providers) was opt-in and disabled by default. This removes the
feature wholesale: the routing module, its config keys, docs, tests, and
the orchestration scaffolding it required in cli.py / gateway/run.py /
cron/scheduler.py.
The /fast (Priority Processing / Anthropic fast mode) feature kept its
hooks into _resolve_turn_agent_config — those still build a route dict
and attach request_overrides when the model supports it; the route now
just always uses the session's primary model/provider rather than
running prompts through choose_cheap_model_route() first.
Also removed:
- DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out
example sections in hermes_cli/config.py and cli-config.yaml.example
- _load_smart_model_routing() / self._smart_model_routing on GatewayRunner
- self._smart_model_routing / self._active_agent_route_signature on
HermesCLI (signature kept; just no longer initialised through the
smart-routing pipeline)
- route_label parameter on HermesCLI._init_agent (only set by smart
routing; never read elsewhere)
- 'Smart Model Routing' section in website/docs/integrations/providers.md
- tip in hermes_cli/tips.py
- entries in hermes_cli/dump.py + hermes_cli/web_server.py
- row in skills/autonomous-ai-agents/hermes-agent/SKILL.md
Tests:
- Deleted tests/agent/test_smart_model_routing.py
- Rewrote tests/agent/test_credential_pool_routing.py to target the
simplified _resolve_turn_agent_config directly (preserves credential
pool propagation + 429 rotation coverage)
- Dropped 'cheap model' test from test_cli_provider_resolution.py
- Dropped resolve_turn_route patches from cli + gateway test_fast_command
— they now exercise the real method end-to-end
- Removed _smart_model_routing stub assignments from gateway/cron test
helpers
Targeted suites: 74/74 in the directly affected test files;
tests/agent + tests/cron + tests/cli pass except 5 failures that
already exist on main (cron silent-delivery + alias quick-command).
AWS Bedrock paths (bedrock_converse + AnthropicBedrock SDK) use boto3
with its own timeout config and are not wired to the per-provider knob.
Documented in cli-config.yaml.example and website configuration.md so
users don't expect it to take effect there.
Live test with timeout_seconds: 0.5 on claude-sonnet-4.6 proved the
initial wiring was insufficient: run_agent.py was overriding the
client-level timeout on every call via hardcoded per-request kwargs.
Root cause: run_agent.py had two sites that pass an explicit timeout=
kwarg into chat.completions.create() — api_kwargs['timeout'] at line
7075 (HERMES_API_TIMEOUT=1800s default) and the streaming path's
_httpx.Timeout(..., read=HERMES_STREAM_READ_TIMEOUT=120s, ...) at line
5760. Both override the per-provider config value the client was
constructed with, so a 0.5s config timeout would silently not enforce.
This commit:
- Adds AIAgent._resolved_api_call_timeout() — config > HERMES_API_TIMEOUT env > 1800s default.
- Uses it for the non-streaming api_kwargs['timeout'] field.
- Uses it for the streaming path's httpx.Timeout(connect, read, write, pool)
so both connect and read respect the configured value when set.
Local-provider auto-bump (Ollama/vLLM cold-start) only applies when
no explicit config value is set.
- New test: test_resolved_api_call_timeout_priority covers all three
precedence cases (config, env, default).
Live verified: 0.5s config on claude-sonnet-4.6 now triggers
APITimeoutError at ~3s per retry, exhausts 3 retries in ~15s total
(was: 29-47s success with timeout ignored). Positive case (60s config
+ gpt-4o-mini) still succeeds at 1.3s.
Follow-up on top of mvanhorn's cherry-picked commit. Original PR only
wired request_timeout_seconds into the explicit-creds OpenAI branch at
run_agent.py init; router-based implicit auth, native Anthropic, and the
fallback chain were still hardcoded to SDK defaults.
- agent/anthropic_adapter.py: build_anthropic_client() accepts an optional
timeout kwarg (default 900s preserved when unset/invalid).
- run_agent.py: resolve per-provider/per-model timeout once at init; apply
to Anthropic native init + post-refresh rebuild + stale/interrupt
rebuilds + switch_model + _restore_primary_runtime + the OpenAI
implicit-auth path + _try_activate_fallback (with immediate client
rebuild so the first fallback request carries the configured timeout).
- tests: cover anthropic adapter kwarg honoring; widen mock signatures
to accept the new timeout kwarg.
- docs/example: clarify that the knob now applies to every transport,
the fallback chain, and rebuilds after credential rotation.
Adds optional providers.<id>.request_timeout_seconds and
providers.<id>.models.<model>.timeout_seconds config, resolved via a new
hermes_cli/timeouts.py helper and applied where client_kwargs is built
in run_agent.py. Zero default behavior change: when both keys are unset,
the openai SDK default takes over.
Mirrors the existing _get_task_timeout pattern in agent/auxiliary_client.py
for auxiliary tasks - the primary turn path just never got the equivalent
knob.
Cross-project demand: openclaw/openclaw#43946 (17 reactions) asks for
exactly this config - specifically calls out Ollama cold-start hanging
the client.
Follow-up on the native NVIDIA NIM provider salvage. The original PR wired
PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints
required for full parity with other OpenAI-compatible providers (xai,
huggingface, deepseek, zai).
Gaps closed:
- hermes_cli/main.py:
- Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so
selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key
provider flow (previously fell through silently).
- Add 'nvidia' to `hermes chat --provider` argparse choices so the
documented test command (`hermes chat --provider nvidia --model ...`)
parses successfully.
- hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in
OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're
auto-added to the subprocess env blocklist.
- hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so
`hermes doctor` probes https://integrate.api.nvidia.com/v1/models.
- hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for
`hermes dump` credential masking.
- tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture
with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses.
- agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry
so all Nemotron variants get 128K context via substring match (rather
than falling back to MINIMUM_CONTEXT_LENGTH).
- hermes_cli/models.py: Fix hallucinated model ID
'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b'
(verified against live integrate.api.nvidia.com/v1/models catalog).
Expand curated list from 5 to 9 agentic models mapping to OpenRouter
defaults per provider-guide convention: add qwen3.5-397b-a17b,
deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b.
- cli-config.yaml.example: Document 'nvidia' provider option.
- scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP
for CI attribution.
E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's
endpoint (returns 401 with bogus key instead of argparse error);
`hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.
- Rename platform from 'qq' to 'qqbot' across all integration points
(Platform enum, toolset, config keys, import paths, file rename qq.py → qqbot.py)
- Add PLATFORM_HINTS for QQBot in prompt_builder (QQ supports markdown)
- Set SUPPORTS_MESSAGE_EDITING = False to skip streaming on QQ
(prevents duplicate messages from non-editable partial + final sends)
- Add _send_qqbot() standalone send function for cron/send_message tool
- Add interactive _setup_qq() wizard in hermes_cli/setup.py
- Restore missing _setup_signal/email/sms/dingtalk/feishu/wecom/wecom_callback
functions that were lost during the original merge
Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with
Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini.
Standard OpenAI-compatible provider checklist: auth.py, config.py,
models.py, main.py, providers.py, doctor.py, model_normalize.py,
model_metadata.py, setup.py, trajectory_compressor.py.
Based on PR #9274 by arthurbr11, simplified to a standard direct
provider without dual-endpoint OpenRouter routing.
Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.
What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths
Compression model/provider is now configured exclusively via:
auxiliary.compression.provider / auxiliary.compression.model
Closes#8923
The check_interval parameter on terminal_tool sent periodic output
updates to the gateway chat, but these were display-only — the agent
couldn't see or act on them. This added schema bloat and introduced
a bug where notify_on_complete=True was silently dropped when
check_interval was also set (the not-check_interval guard skipped
fast-watcher registration, and the check_interval watcher dict
was missing the notify_on_complete key).
Removing check_interval entirely:
- Eliminates the notify_on_complete interaction bug
- Reduces tool schema size (one fewer parameter for the model)
- Simplifies the watcher registration path
- notify_on_complete (agent wake-on-completion) still works
- watch_patterns (output alerting) still works
- process(action='poll') covers manual status checking
Closes#7947 (root cause eliminated rather than patched).
Add display.interim_assistant_messages config (enabled by default) that
forwards completed assistant commentary between tool calls to the user
as separate chat messages. Models already emit useful status text like
'I'll inspect the repo first.' — this surfaces it on Telegram, Discord,
and other messaging platforms instead of swallowing it.
Independent from tool_progress and gateway streaming. Disabled for
webhooks. Uses GatewayStreamConsumer when available, falls back to
direct adapter send. Tracks response_previewed to prevent double-delivery
when interim message matches the final response.
Also fixes: cursor not stripped from fallback prefix in stream consumer
(affected continuation calculation on no-edit platforms like Signal).
Cherry-picked from PR #7885 by asheriif, default changed to enabled.
Fixes#5016
Cherry-picked from PR #7702 by kshitijk4poor.
Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models:
- mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest)
Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py,
main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py,
models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example.
Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's
main model. Non-vision aux uses the user's selected model. Added
_PROVIDER_VISION_MODELS dict for provider-specific vision model overrides.
On failure, falls back to aggregators (gemini flash) via existing fallback chain.
Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000,
mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000.
36 tests covering registry, aliases, auto-detect, credentials, models.dev,
normalization, URL mapping, providers module, doctor, aux client, vision
model override, and agent init.
Legacy flat stt.model config key (from cli-config.yaml.example and older
versions) was passed as a model override to transcribe_audio() by the
gateway, bypassing provider-specific model resolution. When the provider
was 'local' (faster-whisper), this caused:
ValueError: Invalid model size 'whisper-1'
Changes:
- gateway/run.py, discord.py: stop passing model override — let
transcribe_audio() handle provider-specific model resolution internally
- get_stt_model_from_config(): now provider-aware, reads from the correct
nested section (stt.local.model, stt.openai.model, etc.); ignores
legacy flat key for local provider to prevent model name mismatch
- cli-config.yaml.example: updated STT section to show nested provider
config structure instead of legacy flat key
- config migration v13→v14: moves legacy stt.model to the correct
provider section and removes the flat key
Reported by community user on Discord.
When the API returns "max_tokens too large given prompt" (input tokens
are within the context window, but input + requested output > window),
the old code incorrectly routed through the same handler as "prompt too
long" errors, calling get_next_probe_tier() and permanently halving
context_length. This made things worse: the window was fine, only the
requested output size needed trimming for that one call.
Two distinct error classes now handled separately:
Prompt too long — input itself exceeds context window.
Fix: compress history + halve context_length (existing behaviour,
unchanged).
Output cap too large — input OK, but input + max_tokens > window.
Fix: parse available_tokens from the error message, set a one-shot
_ephemeral_max_output_tokens override for the retry, and leave
context_length completely untouched.
Changes:
- agent/model_metadata.py: add parse_available_output_tokens_from_error()
that detects Anthropic's "available_tokens: N" error format and returns
the available output budget, or None for all other error types.
- run_agent.py: call the new parser first in the is_context_length_error
block; if it fires, set _ephemeral_max_output_tokens (with a 64-token
safety margin) and break to retry without touching context_length.
_build_api_kwargs consumes the ephemeral value exactly once then clears
it so subsequent calls use self.max_tokens normally.
- agent/anthropic_adapter.py: expand build_anthropic_kwargs docstring to
clearly document the max_tokens (output cap) vs context_length (total
window) distinction, which is a persistent source of confusion due to
the OpenAI-inherited "max_tokens" name.
- cli-config.yaml.example: add inline comments explaining both keys side
by side where users are most likely to look.
- website/docs/integrations/providers.md: add a callout box at the top
of "Context Length Detection" and clarify the troubleshooting entry.
- tests/test_ctx_halving_fix.py: 24 tests across four classes covering
the parser, build_anthropic_kwargs clamping, ephemeral one-shot
consumption, and the invariant that context_length is never mutated
on output-cap errors.