Gateway executor work now inherits the active session contextvars via
copy_context() so background process watchers retain the correct
platform/chat/user/session metadata for routing completion events back
to the originating chat.
Cherry-picked from #10647 by @helix4u with:
- Use asyncio.get_running_loop() instead of deprecated get_event_loop()
- Strip trailing whitespace
- Add *args forwarding test
- Add exception propagation test
Recomputes GitHub Copilot api_mode from the selected model in the
shared /model switch path. Before this change, Copilot could carry a
stale codex_responses mode forward from a GPT-5 selection into a later
Claude model switch, causing unsupported_api_for_model errors.
Cherry-picked from #10533 by @helix4u with:
- Comment specificity (Provider-specific → Copilot api_mode override)
- Fix pre-existing duplicate opencode-go in set literal
- Extract test mock helper to reduce duplication
- Add GPT-5 → GPT-5 regression test (keeps codex_responses)
In Telegram forum-enabled groups, the General topic does not include
message_thread_id in incoming messages (it is None). This caused:
1. Messages in General losing thread context — replies went to wrong place
2. Typing indicator failing because thread_id=1 was rejected by Telegram
Fix: synthesize thread_id="1" for forum groups when message_thread_id
is None, then handle it correctly per operation:
- send: omit message_thread_id (Telegram rejects thread_id=1 for sends)
- typing: pass thread_id=1, retry without it on "thread not found"
Also centralizes thread_id extraction into _metadata_thread_id() across
all send methods (send, send_voice, send_image, send_document, send_video,
send_animation, send_photo), replacing ~10 duplicate patterns.
Salvaged from PR #7892 by @corazzione.
Closes#7877, closes#7519.
When an MCP server returns errors consistently (crashed, disconnected,
auth expired), the model sees each error and retries the tool call.
With no circuit breaker, this burned through all 90 iterations — each
one a full LLM API call plus failed MCP call — producing 15-45 minutes
of zero useful output while the gateway inactivity timeout never fired
(because the agent WAS active, just uselessly).
Fix: track consecutive error counts per MCP server. After 3 consecutive
failures (connection errors, MCP-level errors, or transport exceptions),
the handler short-circuits with a message telling the model to stop
retrying and use alternative approaches. The counter resets to 0 on
any successful call.
Closes#10447
Expands the plugin interface so slash command handlers can dispatch tool
calls through the registry with parent agent context wired up automatically.
This is the public API for plugins that need to orchestrate tools like
delegate_task — they call ctx.dispatch_tool() instead of reaching into
framework internals. The parent agent is resolved lazily from _cli_ref
when available (CLI mode) and omitted in gateway mode (tools degrade
gracefully).
Enables the hermes-deliver-plugin pattern where /deliver and /fanout
slash commands spawn subagents via delegate_task without touching the
agent conversation loop.
7 new tests covering: registry delegation, parent_agent injection from
cli_ref, gateway mode (no cli_ref), uninitialized agent, explicit
parent_agent override, kwargs forwarding, return value passthrough.
* feat: implement register_command() on plugin context
Complete the half-built plugin slash command system. The dispatch
code in cli.py and gateway/run.py already called
get_plugin_command_handler() but the registration side was never
implemented.
Changes:
- Add register_command() to PluginContext — stores handler,
description, and plugin name; normalizes names; rejects conflicts
with built-in commands
- Add _plugin_commands dict to PluginManager
- Add commands_registered tracking on LoadedPlugin
- Add get_plugin_command_handler() and get_plugin_commands()
module-level convenience functions
- Fix commands.py to use actual plugin description in Telegram
bot menu (was hardcoded 'Plugin command')
- Add plugin commands to SlashCommandCompleter autocomplete
- Show command count in /plugins display
- 12 new tests covering registration, conflict detection,
normalization, handler dispatch, and introspection
Closes#10495
* docs: add register_command() to plugin guides
- Build a Plugin guide: new 'Register slash commands' section with
full API reference, comparison table vs register_cli_command(),
sync/async examples, and conflict protection docs
- Features/Plugins page: add slash commands to capabilities table
and plugin types summary
* docs: add missing pages to sidebar navigation
- guides/aws-bedrock → Guides & Tutorials
- user-guide/features/credential-pools → Integrations
When no provider was set in config.yaml and auto-detection found no
credentials, the agent silently fell back to bare OPENROUTER_API_KEY
from the environment and sent the configured model name to OpenRouter.
This produced undefined behavior -- wrong provider, wrong model routing,
and auxiliary tasks (compression, vision) hitting the wrong endpoint.
Fix: replace the silent fallback with a hard RuntimeError telling
the user to run hermes model or hermes setup. The provider must
be explicitly configured -- env vars are for secrets, not config.
Pass platform_env_var="TELEGRAM_PROXY" to resolve_proxy_url() in both
telegram.py (main connect) and telegram_network.py (fallback transport),
so a Telegram-specific proxy takes priority over the generic HTTPS_PROXY.
Also bridge telegram.proxy_url from config.yaml to the TELEGRAM_PROXY
env var (env var takes precedence if both are set), add OPTIONAL_ENV_VARS
entry, docs, and tests.
Composite salvage of four community PRs:
- Core approach (both call sites): #9414 by @leeyang1990
- config.yaml bridging + docs: #6530 by @WhiteWorld
- Naming convention: #9074 by @brantzh6
- Earlier proxy work: #7786 by @ten-ltw
Closes#9414, closes#9074, closes#7786, closes#6530
Co-authored-by: WhiteWorld <WhiteWorld@users.noreply.github.com>
Co-authored-by: brantzh6 <brantzh6@users.noreply.github.com>
Co-authored-by: ten-ltw <ten-ltw@users.noreply.github.com>
Salvaged from PR #10643 by kshitijk4poor, updated for current main.
Root causes fixed:
1. Telegram xdist mock pollution — new tests/gateway/conftest.py with shared
mock that runs at collection time (prevents ChatType=None caching)
2. VIRTUAL_ENV env var leak — monkeypatch.delenv in _detect_venv_dir tests
3. Copilot base_url missing — add fallback in _resolve_runtime_from_pool_entry
4. Stale vision model assertion — zai now uses glm-5v-turbo
5. Reasoning item id intentionally stripped — assert 'id' not in (store=False)
6. Context length warning unreachable — pass base_url to AIAgent in test
7. Kimi provider label updated — 'Kimi / Kimi Coding Plan' matches models.py
8. Google Workspace calendar tests — rewritten for current production code,
properly mock subprocess on api_module, removed stale +agenda assertions
9. Credential pool auto-seeding — mock _select_pool_entry / _resolve_auto /
_import_codex_cli_tokens to prevent real credentials from leaking into tests
* feat: implement register_command() on plugin context
Complete the half-built plugin slash command system. The dispatch
code in cli.py and gateway/run.py already called
get_plugin_command_handler() but the registration side was never
implemented.
Changes:
- Add register_command() to PluginContext — stores handler,
description, and plugin name; normalizes names; rejects conflicts
with built-in commands
- Add _plugin_commands dict to PluginManager
- Add commands_registered tracking on LoadedPlugin
- Add get_plugin_command_handler() and get_plugin_commands()
module-level convenience functions
- Fix commands.py to use actual plugin description in Telegram
bot menu (was hardcoded 'Plugin command')
- Add plugin commands to SlashCommandCompleter autocomplete
- Show command count in /plugins display
- 12 new tests covering registration, conflict detection,
normalization, handler dispatch, and introspection
Closes#10495
* docs: add register_command() to plugin guides
- Build a Plugin guide: new 'Register slash commands' section with
full API reference, comparison table vs register_cli_command(),
sync/async examples, and conflict protection docs
- Features/Plugins page: add slash commands to capabilities table
and plugin types summary
Complete the half-built plugin slash command system. The dispatch
code in cli.py and gateway/run.py already called
get_plugin_command_handler() but the registration side was never
implemented.
Changes:
- Add register_command() to PluginContext — stores handler,
description, and plugin name; normalizes names; rejects conflicts
with built-in commands
- Add _plugin_commands dict to PluginManager
- Add commands_registered tracking on LoadedPlugin
- Add get_plugin_command_handler() and get_plugin_commands()
module-level convenience functions
- Fix commands.py to use actual plugin description in Telegram
bot menu (was hardcoded 'Plugin command')
- Add plugin commands to SlashCommandCompleter autocomplete
- Show command count in /plugins display
- 12 new tests covering registration, conflict detection,
normalization, handler dispatch, and introspection
Closes#10495
atomic_yaml_write() and atomic_json_write() used tempfile.mkstemp()
which creates files with 0o600 (owner-only). After os.replace(), the
original file's permissions were destroyed. Combined with _secure_file()
forcing 0o600, this broke Docker/NAS setups where volume-mounted config
files need broader permissions (e.g. 0o666).
Changes:
- atomic_yaml_write/atomic_json_write: capture original permissions
before write, restore after os.replace()
- _secure_file: skip permission tightening in container environments
(detected via /.dockerenv, /proc/1/cgroup, or HERMES_SKIP_CHMOD env)
- save_env_value: preserve original .env permissions, remove redundant
third os.chmod call
- remove_env_value: same permission preservation
On desktop installs, _secure_file() still tightens to 0o600 as before.
In containers, the user's original permissions are respected.
Reported by Cedric Weber (Docker/Portainer on NAS).
The command preview and description were wrapped in Markdown v1 inline
code (backticks) without escaping, causing Telegram API parse errors
when the command itself contained backticks or asterisks.
Fixes: 'Can't parse entities: can't find end of the entity'
Wrap the TelegramAdapter import in _send_to_platform() with a try/except
ImportError guard, matching the existing Feishu pattern in the same function.
When python-telegram-bot is not installed, the import no longer crashes the
cron scheduler. Instead, MAX_MESSAGE_LENGTH falls back to a hardcoded 4096.
The _send_telegram() function already had its own ImportError guard for the
telegram package; this fixes the remaining bare import of TelegramAdapter
in the platform-routing function.
The setup wizard accepted any string as a Telegram bot token without
validation. Invalid tokens were only caught at runtime when the gateway
failed to connect, with no clear error message.
Add regex validation for the expected format (<numeric_id>:<hash>) and
loop until a valid token is entered or the user cancels.
Telegram on iOS auto-converts double hyphens (--) to em dashes (—)
or en dashes (–) via autocorrect. This breaks /model flag parsing
since parse_model_flags() only recognizes literal '--provider' and
'--global'.
When the flag isn't parsed, the entire string (e.g. 'glm-5.1 —provider zai')
gets treated as the model name and fails with 'Model names cannot
contain spaces.'
Fix: normalize Unicode dashes (U+2012-U+2015) to '--' when they
appear before flag keywords (provider, global), before flag extraction.
The existing test suite in test_model_switch_provider_routing.py
already covers all four dash variants — this commit adds the code
that makes them pass.
Replace inline Path.home() / '.hermes' / 'profiles' detection in both CLI
and gateway /profile handlers with the existing get_active_profile_name()
from hermes_cli.profiles — which already handles custom-root deployments,
standard profiles, and Docker layouts.
Fixes /profile incorrectly reporting 'default' when HERMES_HOME points to
a custom-root profile path like /opt/data/profiles/coder.
Based on PR #10484 by Xowiek.
Text-only Matrix sends should continue using the lightweight _send_matrix()
HTTP helper (~100ms). Only route through the heavy MatrixAdapter (full sync +
E2EE setup) when media files are present. Adds test verifying text-only
messages don't take the adapter path.
Matrix media delivery was silently dropped by send_message because Matrix
wasn't wired into the native adapter-backed media path. Only Telegram,
Discord, and Weixin had native media support.
Adds _send_matrix_via_adapter() which creates a MatrixAdapter instance,
connects, sends text + media via the adapter's native upload methods
(send_document, send_image_file, send_video, send_voice), then disconnects.
Also fixes a stale URL-encoding assertion in test_send_message_missing_platforms
that broke after PR #10151 added quote() to room IDs.
Cherry-picked from PR #10486 by helix4u.
Three independent fixes batched together:
1. hermes auth add crashes on non-interactive stdin (#10468)
input() for the label prompt was called without checking isatty().
In scripted/CI environments this raised EOFError. Fix: check
sys.stdin.isatty() and fall back to the computed default label.
2. Subcommand help prints twice (#10230)
'hermes dashboard -h' printed help text twice because the
SystemExit(0) from argparse was caught by the fallback retry
logic, which re-parsed and printed help again. Fix: re-raise
SystemExit with code 0 (help/version) immediately.
3. Duplicate entries in /model picker (#10526, #9545)
- Kimi showed 2x because kimi-coding and kimi-coding-cn both
mapped to the same models.dev ID. Fix: track seen mdev_ids
and skip aliases.
- Providers could show 2-3x from case-variant slugs across the
four loading paths. Fix: normalize all seen_slugs membership
checks and insertions to lowercase.
Closes#10468, #10230, #10526, #9545
bash -lic with a PTY enables job control (set -m), which waits for all
background jobs before the shell exits. A command like
`python3 -m http.server &>/dev/null &` hangs forever because the shell
never completes.
Prefix `set +m;` to disable job control while keeping -i for .bashrc
sourcing and PTY for interactive tools.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Background review notifications ("💾 Skill created", "💾 Memory updated")
could race ahead of the main assistant reply in chat, making it look like
the agent stopped after creating a skill.
Gate bg-review notifications behind a threading.Event + pending queue.
Register a release callback on the adapter's _post_delivery_callbacks dict
so base.py's finally block fires it after the main response is delivered.
The queued-message path in _run_agent pops and calls the callback directly
to prevent double-fire.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Closes#10541
WecomCallbackAdapter declared a _seen_messages dict and
MESSAGE_DEDUP_TTL_SECONDS constant but never actually checked
them in _handle_callback(). WeCom retries callback deliveries
on timeout, and each retry with the same MsgId was treated as
a fresh message and queued for processing.
Fix: check _seen_messages before enqueuing. Uses the same TTL-
based pattern as MessageDeduplicator (fixed in #10306) — check
age before returning duplicate, prune on overflow.
Closes#10305
_load_skill_payload() reconstructed skill_dir as SKILLS_DIR / relative_path,
which is wrong for external skills from skills.external_dirs — they live
outside SKILLS_DIR entirely. Scripts and linked files failed to load.
Fix: skill_view() now includes the absolute skill_dir in its result dict.
_load_skill_payload() uses that directly when available, falling back to
the SKILLS_DIR-relative reconstruction only for legacy responses.
Closes#10313
Add "HERMES_" to _SAFE_ENV_PREFIXES in code_execution_tool.py so HERMES_HOME and other Hermes env vars pass through to execute_code subprocesses. Fixes vision_analyze and other tools that rely on get_hermes_home() failing in Docker environments with non-default HERMES_HOME.
Authored by @shin4.
* fix: show correct env var name in provider API key error (#9506)
The error message for missing provider API keys dynamically built
the env var name as PROVIDER_API_KEY (e.g. ALIBABA_API_KEY), but
some providers use different names (alibaba uses DASHSCOPE_API_KEY).
Users following the error message set the wrong variable.
Fix: look up the actual env var from PROVIDER_REGISTRY before
building the error. Falls back to the dynamic name if the registry
lookup fails.
Closes#9506
* fix: five HERMES_HOME profile-isolation leaks (#5947)
Bug A: Thread session_title from session_db to memory provider init kwargs
so honcho can derive chat-scoped session keys instead of falling back to
cwd-based naming that merges all gateway users into one session.
Bug B: Replace 14 hardcoded ~/.hermes/skills/ paths across 10 skill files
with HERMES_HOME-aware alternatives (${HERMES_HOME:-$HOME/.hermes} in
shell, os.environ.get('HERMES_HOME', ...) in Python).
Bug C: install.sh now respects HERMES_HOME env var and adds --hermes-home
flag. Previously --dir only set INSTALL_DIR while HERMES_HOME was always
hardcoded to $HOME/.hermes.
Bug D: Remove hardcoded ~/.hermes/honcho.json fallback in resolve_config_path().
Non-default profiles no longer silently inherit the default profile's honcho
config. Falls through to ~/.honcho/config.json (global) instead.
Bug E: Guard _edit_skill, _patch_skill, _delete_skill, _write_file, and
_remove_file against writing to skills found in external_dirs. Skills
outside the local SKILLS_DIR are now read-only from the agent's perspective.
Closes#5947
procps-ng 4.0.4 in Docker rejects BSD-style 'ps eww -ax' with a
'must set personality' error, causing find_gateway_pids() to return
empty and falsely report the gateway as not running.
Fix: replace 'ps eww -ax' with 'ps -A eww'. -A is the POSIX
equivalent of BSD -ax (select all processes), and the eww modifiers
(show environment + wide output) still work as BSD flags alongside
the POSIX -A flag. This preserves the HERMES_HOME= environment
visibility needed for profile-aware PID matching.
Closes#9723
When Nous returns a 429, the retry amplification chain burns up to 9
API requests per conversation turn (3 SDK retries × 3 Hermes retries),
each counting against RPH and deepening the rate limit. With multiple
concurrent sessions (cron + gateway + auxiliary), this creates a spiral
where retries keep the limit tapped indefinitely.
New module: agent/nous_rate_guard.py
- Shared file-based rate limit state (~/.hermes/rate_limits/nous.json)
- Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit-
reset-requests, retry-after headers, or error context
- Falls back to 5-minute default cooldown if no header data
- Atomic writes (tempfile + rename) for cross-process safety
- Auto-cleanup of expired state files
run_agent.py changes:
- Top-of-retry-loop guard: when another session already recorded Nous
as rate-limited, skip the API call entirely. Try fallback provider
first, then return a clear message with the reset time.
- On 429 from Nous: record rate limit state and skip further retries
(sets retry_count = max_retries to trigger fallback path)
- On success from Nous: clear the rate limit state so other sessions
know they can resume
auxiliary_client.py changes:
- _try_nous() checks rate guard before attempting Nous in the auxiliary
fallback chain. When rate-limited, returns (None, None) so the chain
skips to the next provider instead of piling more requests onto Nous.
This eliminates three sources of amplification:
1. Hermes-level retries (saves 6 of 9 calls per turn)
2. Cross-session retries (cron + gateway all skip Nous)
3. Auxiliary fallback to Nous (compression/session_search skip too)
Includes 24 tests covering the rate guard module, header parsing,
state lifecycle, and auxiliary client integration.
Extract resolve_channel_prompt() shared helper into
gateway/platforms/base.py. Refactor Discord to use it.
Wire channel_prompts into Telegram (groups + forum topics),
Slack (channels), and Mattermost (channels).
Config bridging now applies to all platforms (not just Discord).
Added channel_prompts defaults to telegram/slack/mattermost
config sections.
Docs added to all four platform pages with platform-specific
examples (topic inheritance for Telegram, channel IDs for Slack,
etc.).
Move _ensure_discord_mock() from module level to _make_adapter() so it
doesn't poison sys.modules for other discord test files. Use
types.ModuleType instead of MagicMock for the mock module to avoid
auto-generated __file__ attribute confusing hasattr checks.
Add BrennerSpear to AUTHOR_MAP.
- Remove double str() normalization in _resolve_channel_prompt since
config bridging already handles numeric YAML key conversion
- Remove dead prompts.get(str(key)) fallback that could never match
after keys were already normalized to strings
- Replace getattr(event, "channel_prompt", None) with direct attribute
access since channel_prompt is a declared dataclass field
- Update test to verify normalization responsibility lives in config bridging
The error message for missing provider API keys dynamically built
the env var name as PROVIDER_API_KEY (e.g. ALIBABA_API_KEY), but
some providers use different names (alibaba uses DASHSCOPE_API_KEY).
Users following the error message set the wrong variable.
Fix: look up the actual env var from PROVIDER_REGISTRY before
building the error. Falls back to the dynamic name if the registry
lookup fails.
Closes#9506
The GPT-5 auto-upgrade logic unconditionally overrode api_mode to
codex_responses for any model starting with gpt-5, even when the
user explicitly set api_mode=chat_completions. Custom proxies that
serve GPT-5 via /chat/completions became unusable.
Fix: check api_mode is None before the override fires. If the caller
passed any explicit api_mode, it is final -- no auto-upgrade.
Closes#10473