Commit graph

435 commits

Author SHA1 Message Date
Teknium
0f597dd127
fix: STT provider-model mismatch — whisper-1 fed to faster-whisper (#7113)
Legacy flat stt.model config key (from cli-config.yaml.example and older
versions) was passed as a model override to transcribe_audio() by the
gateway, bypassing provider-specific model resolution. When the provider
was 'local' (faster-whisper), this caused:
  ValueError: Invalid model size 'whisper-1'

Changes:
- gateway/run.py, discord.py: stop passing model override — let
  transcribe_audio() handle provider-specific model resolution internally
- get_stt_model_from_config(): now provider-aware, reads from the correct
  nested section (stt.local.model, stt.openai.model, etc.); ignores
  legacy flat key for local provider to prevent model name mismatch
- cli-config.yaml.example: updated STT section to show nested provider
  config structure instead of legacy flat key
- config migration v13→v14: moves legacy stt.model to the correct
  provider section and removes the flat key

Reported by community user on Discord.
2026-04-10 03:27:30 -07:00
donrhmexe
a2f46e4665 fix: include custom_providers in /model command listings and resolution
Custom providers defined in config.yaml under  were
completely invisible to the /model command in both gateway (Telegram,
Discord, etc.) and CLI. The provider listing skipped them and explicit
switching via --provider failed with "Unknown provider".

Root cause: gateway/run.py, cli.py, and model_switch.py only read the
 dict from config, ignoring  entirely.

Changes:
- providers.py: add resolve_custom_provider() and extend
  resolve_provider_full() to check custom_providers after user_providers
- model_switch.py: propagate custom_providers through switch_model(),
  list_authenticated_providers(), and get_authenticated_provider_slugs();
  add custom provider section to provider listings
- gateway/run.py: read custom_providers from config, pass to all
  model-switch calls
- cli.py: hoist config loading, pass custom_providers to listing and
  switch calls

Tests: 4 new regression tests covering listing, resolution, and gateway
command handler. All 71 tests pass.
2026-04-10 03:07:00 -07:00
kshitijk4poor
51d826f889 fix(gateway): apply /model session overrides so switch persists across messages
The gateway /model command stored session overrides in
_session_model_overrides but run_sync() never consulted them when
resolving the model and runtime for the next message.  It always read
from config.yaml, so the switch was lost as soon as a new agent was
created.

Two fixes:

1. In run_sync(), apply _session_model_overrides after resolving from
   config.yaml/env — the override takes precedence for model, provider,
   api_key, base_url, and api_mode.

2. In post-run fallback detection, check whether the model mismatch
   (agent.model != config_model) is due to an intentional /model switch
   before evicting the cached agent.  Without this, the first message
   after /model would work (cached agent reused) but the fallback
   detector would evict it, causing the next message to revert.

Affects all gateway platforms (Telegram, Discord, Slack, WhatsApp,
Signal, Matrix, BlueBubbles, HomeAssistant) since they all share
GatewayRunner._run_agent().

Fixes #6213
2026-04-10 02:58:42 -07:00
Teknium
6da952bc50
fix(gateway): /usage now shows rate limits, cost, and token details between turns (#7038)
The gateway /usage handler only looked in _running_agents for the agent
object, which is only populated while the agent is actively processing a
message. Between turns (when users actually type /usage), the dict is
empty and the handler fell through to a rough message-count estimate.

The agent object actually lives in _agent_cache between turns (kept for
prompt caching). This fix checks both dicts, with _running_agents taking
priority (mid-turn) and _agent_cache as the between-turns fallback.

Also brings the gateway output to parity with the CLI /usage:
- Model name
- Detailed token breakdown (input, output, cache read, cache write)
- Cost estimation (estimated amount or 'included' for subscriptions)
- Cache token lines hidden when zero (cleaner output)

This fixes Nous Portal rate limit headers not showing up for gateway
users — the data was being captured correctly but the handler could
never see it.
2026-04-10 02:33:01 -07:00
Teknium
1780ad24b1 fix: normalize remaining reasoning effort orderings and add missing 'minimal'
Follow-up to cherry-picked PR #6698. Fixes spots the original PR missed:
- hermes_constants.py: VALID_REASONING_EFFORTS tuple ordering
- gateway/run.py: _load_reasoning_config docstring + validation tuple
- configuration.md and batch-processing.md: docs ordering
- hermes-agent skill: /reasoning usage hint was missing 'minimal'
2026-04-09 14:20:16 -07:00
Greer Guthrie
775a46ce75 fix: normalize reasoning effort ordering in UI 2026-04-09 14:20:16 -07:00
Teknium
8dfc96dbbb
feat: capture provider rate limit headers and show in /usage (#6541)
Parse x-ratelimit-* headers from inference API responses (Nous Portal,
OpenRouter, OpenAI-compatible) and display them in the /usage command.

- New agent/rate_limit_tracker.py: parse 12 rate limit headers (RPM/RPH/
  TPM/TPH limits, remaining, reset timers), format as progress bars (CLI)
  or compact one-liner (gateway)
- Hook into streaming path in run_agent.py: stream.response.headers is
  available on the OpenAI SDK Stream object before chunks are consumed
- CLI /usage: appends rate limit section with progress bars + warnings
  when any bucket exceeds 80%
- Gateway /usage: appends compact rate limit summary
- 24 unit tests covering parsing, formatting, edge cases

Headers captured per response:
  x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h}

Example CLI display:
  Nous Rate Limits (captured just now):
    Requests/min [░░░░░░░░░░░░░░░░░░░░]  0.1%  1/800 used  (799 left, resets in 59s)
    Tokens/hr    [░░░░░░░░░░░░░░░░░░░░]  0.0%  49/336.0M   (336.0M left, resets in 52m)
2026-04-09 03:43:14 -07:00
Teknium
d97f6cec7f
feat(gateway): add BlueBubbles iMessage platform adapter (#6437)
Adds Apple iMessage as a gateway platform via BlueBubbles macOS server.

Architecture:
- Webhook-based inbound (event-driven, no polling/dedup needed)
- Email/phone → chat GUID resolution for user-friendly addressing
- Private API safety (checks helper_connected before tapback/typing)
- Inbound attachment downloading (images, audio, documents cached locally)
- Markdown stripping for clean iMessage delivery
- Smart progress suppression for platforms without message editing

Based on PR #5869 by @benjaminsehl (webhook architecture, GUID resolution,
Private API safety, progress suppression) with inbound attachment downloading
from PR #4588 by @1960697431 (attachment cache routing).

Integration points: Platform enum, env config, adapter factory, auth maps,
cron delivery, send_message routing, channel directory, platform hints,
toolset definition, setup wizard, status display.

27 tests covering config, adapter, webhook parsing, GUID resolution,
attachment download routing, toolset consistency, and prompt hints.
2026-04-08 23:54:03 -07:00
xingkongliang
1d8d4f28ae fix(gateway): prevent background process notifications from triggering false pairing requests
When a background process with notify_on_complete=True finishes, the
gateway injects a synthetic MessageEvent to notify the session. This
event was constructed without user_id, causing _is_user_authorized()
to reject it and — for DM-origin sessions — trigger the pairing flow,
sending "Hi~ I don't recognize you yet!" with a pairing code to the
chat owner.

Add an `internal` flag to MessageEvent that bypasses authorization
checks for system-generated synthetic events. Only the process watcher
sets this flag; no external/adapter code path can produce it.

Includes 4 regression tests covering the fix and the normal pairing path.
2026-04-08 23:01:04 -07:00
Teknium
af4abd2f22 fix: correct unbound exception variable and remaining-time math in warning
- Bind exception in warning send handler (was using stale _ne from outer scope)
- Calculate remaining time until timeout correctly: (timeout - warning) // 60
  instead of warning // 60 (which equals elapsed time, not remaining)
2026-04-08 20:01:06 -07:00
Helmi
092061711e fix(gateway): add staged inactivity warning before timeout escalation
Introduce gateway_timeout_warning (default 900s) as a pre-timeout alert
layer.  When inactivity reaches the warning threshold, a single
notification is sent to the user offering to wait or reset.  If
inactivity continues to the gateway_timeout (default 1800s), the full
timeout fires as before.

This gives users a chance to intervene before work is lost on slow
API providers without disabling the safety timeout entirely.

Config: agent.gateway_timeout_warning in config.yaml, or
HERMES_AGENT_TIMEOUT_WARNING env var (0 = disable warning).
2026-04-08 20:01:06 -07:00
Teknium
e26393ffc2
fix: Signal duplicate replies with streaming + per-platform tool_progress (#6348)
Fixes #4647 — Signal replies duplicated when gateway streaming is enabled.

Root cause: stream_consumer.py did not handle the case where send() returns
success=True but no message_id (Signal behavior). Every stream delta produced
a separate send() call (7+ messages instead of 2), plus the gateway sent
another full duplicate since already_sent was never set.

Changes:
- stream_consumer.py: Add elif branch for success-without-message_id — enters
  fallback mode (sets already_sent, disables editing, sends only continuation)
- signal.py send(): Extract timestamp from signal-cli RPC result as message_id
  so stream consumer follows normal edit→fallback path
- signal.py: Add public stop_typing() delegating to _stop_typing_indicator()
  so base adapter's _keep_typing finally block can clean up typing tasks
- gateway/run.py: Per-platform tool_progress_overrides (#6164) — lets users
  set e.g. signal: off while keeping telegram: all
- hermes_cli/config.py: Add tool_progress_overrides to DEFAULT_CONFIG

Refs: #4647, #6164
2026-04-08 17:39:45 -07:00
Teknium
ab21fbfd89 fix: add gateway coverage for session boundary hooks, move test to tests/cli/
- Fire on_session_finalize and on_session_reset in gateway _handle_reset_command()
- Fire on_session_finalize during gateway stop() for each active agent
- Move CLI test from tests/ root to tests/cli/ (matches recent restructure)
- Add 5 gateway tests covering reset hooks, ordering, shutdown, and error handling
- Place on_session_reset after new session is guaranteed to exist (covers
  the get_or_create_session fallback path)
2026-04-08 04:27:34 -07:00
Teknium
30ea423ce8
fix: unify reasoning_effort to config.yaml only, remove HERMES_REASONING_EFFORT env var
Gateway and cron had inconsistent reasoning_effort resolution:
- CLI: config.yaml only (correct)
- Gateway: config.yaml first, env var fallback
- Cron: env var first, config.yaml fallback

All three now read exclusively from agent.reasoning_effort in config.yaml.
Removed HERMES_REASONING_EFFORT env var support entirely — .env is for
secrets only, not behavioral config.
2026-04-08 03:36:44 -07:00
Marc Bickel
25080986a0 fix(gateway): discard empty placeholder when voice transcription succeeds
When a Discord voice message arrives, the adapter sets event.text to
"(The user sent a message with no text content)" since voice messages
have no text content. The transcription enrichment in
_enrich_message_with_transcription() then prepends the transcript but
leaves the placeholder intact, causing the agent to receive both:

    [The user sent a voice message~ Here's what they said: "..."]

    (The user sent a message with no text content)

The agent sees this as two separate user turns — one transcribed
and one empty — creating confusing duplicate messages.

Fix: when the transcription succeeds and user_text is only the empty
placeholder, return just the transcript without the redundant placeholder.
2026-04-07 17:59:16 -07:00
Zainan Victor Zhou
0d41fb0827 fix(gateway): show full session id and title in /status 2026-04-07 17:27:09 -07:00
Teknium
99ff375f7a
fix(gateway): respect tool_preview_length in all/new progress modes (#5937)
Previously, all/new tool progress modes always hard-truncated previews
to 40 chars, ignoring the display.tool_preview_length config. This made
it impossible for gateway users to see meaningful command/path info
without switching to verbose mode (which shows too much detail).

Now all/new modes read tool_preview_length from config:
- tool_preview_length: 0 (default/unset) → 40 chars (no regression)
- tool_preview_length: 120 → 120-char previews in all/new mode
- verbose mode: unchanged (already respected the config)

Users who want longer previews can set:
  display:
    tool_preview_length: 120

Reported by demontut_ on Discord.
2026-04-07 14:10:56 -07:00
Teknium
125e5ef089 fix: extend caption substring fix to all platforms
Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter
so all adapters inherit it. Fix the same substring-containment bug in:
- gateway/platforms/base.py (photo burst merging)
- gateway/run.py (priority photo follow-up merging)
- gateway/platforms/feishu.py (media batch merging)

The original fix only covered telegram.py. The same bug existed in base.py
and run.py (pure substring check) and feishu.py (list membership without
whitespace normalization).
2026-04-07 14:08:59 -07:00
Teknium
69c753c19b
fix: thread gateway user_id to memory plugins for per-user scoping (#5895)
Memory plugins (Mem0, Honcho) used static identifiers ('hermes-user',
config peerName) meaning all gateway users shared the same memory bucket.

Changes:
- AIAgent.__init__: add user_id parameter, store as self._user_id
- run_agent.py: include user_id in _init_kwargs passed to memory providers
- gateway/run.py: pass source.user_id to AIAgent in primary + background paths
- Mem0 plugin: prefer kwargs user_id over config default
- Honcho plugin: override cfg.peer_name with gateway user_id when present

CLI sessions (user_id=None) preserve existing defaults. Only gateway
sessions with a real platform user_id get per-user memory scoping.

Reported by plev333.
2026-04-07 11:14:12 -07:00
Teknium
ab0c1e58f1
fix: pause typing indicator during approval waits (#5893)
When the agent waits for dangerous-command approval, the typing
indicator (_keep_typing loop) kept refreshing. On Slack's Assistant
API this is critical: assistant_threads_setStatus disables the
compose box, preventing users from typing /approve or /deny.

- Add _typing_paused set + pause/resume methods to BasePlatformAdapter
- _keep_typing skips send_typing when chat_id is paused
- _approval_notify_sync pauses typing before sending approval prompt
- _handle_approve_command / _handle_deny_command resume typing after

Benefits all platforms — no reason to show 'is thinking...' while
the agent is idle waiting for human input.
2026-04-07 11:04:50 -07:00
Teknium
d0ffb111c2
refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.

Changes by category:

Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
  (vs availability checks which were left alone)

Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
  source, new_server_names, verify, pconfig, default_terminal,
  result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
  (12 instances of add_parser() where result was never used)

Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
  surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
  module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction

Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports

Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
  they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values

Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
  x.startswith('a') or x.startswith('b') consolidated to
  x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
  truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
  send_message_tool.py

Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls

Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
2026-04-07 10:25:31 -07:00
Teknium
caded0a5e7
fix: repair 57 failing CI tests across 14 files (#5823)
* fix: repair 57 failing CI tests across 14 files

Categories of fixes:

**Test isolation under xdist (-n auto):**
- test_hermes_logging: Strip ALL RotatingFileHandlers before each test
  to prevent handlers leaked from other xdist workers from polluting counts
- test_code_execution: Force TERMINAL_ENV=local in setUp — prevents Modal
  AuthError when another test leaks TERMINAL_ENV=modal
- test_timezone: Same TERMINAL_ENV fix for execute_code timezone tests
- test_codex_execution_paths: Mock _resolve_turn_agent_config to ensure
  model resolution works regardless of xdist worker state

**Matrix adapter tests (nio not installed in CI):**
- Add _make_fake_nio() helper with real response classes for isinstance()
  checks in production code
- Replace MagicMock(spec=nio.XxxResponse) with fake_nio instances
- Wrap production method calls with patch.dict('sys.modules', {'nio': ...})
  so import nio succeeds in method bodies
- Use try/except instead of pytest.importorskip for nio.crypto imports
  (importorskip can be fooled by MagicMock in sys.modules)
- test_matrix_voice: Skip entire file if nio is a mock, not just missing

**Stale test expectations:**
- test_cli_provider_resolution: _prompt_provider_choice now takes **kwargs
  (default param added); mock getpass.getpass alongside input
- test_anthropic_oauth_flow: Mock getpass.getpass (code switched from input)
- test_gemini_provider: Mock models.dev + OpenRouter API lookups to test
  hardcoded defaults without external API variance
- test_code_execution: Add notify_on_complete to blocked terminal params
- test_setup_openclaw_migration: Mock prompt_choice to select 'Full setup'
  (new quick-setup path leads to _require_tty → sys.exit in CI)
- test_skill_manager_tool: Patch get_all_skills_dirs alongside SKILLS_DIR
  so _find_skill searches tmp_path, not real ~/.hermes/skills/

**Missing attributes in object.__new__ test runners:**
- test_platform_reconnect: Add session_store to _make_runner()
- test_session_race_guard: Add hooks, _running_agents_ts, session_store,
  delivery_router to _make_runner()

**Production bug fix (gateway/run.py):**
- Fix sentinel eviction race: _AGENT_PENDING_SENTINEL was immediately
  evicted by the stale-detection logic because sentinels have no
  get_activity_summary() method, causing _stale_idle=inf >= timeout.
  Guard _should_evict with 'is not _AGENT_PENDING_SENTINEL'.

* fix: address remaining CI failures

- test_setup_openclaw_migration: Also mock _offer_launch_chat (called at
  end of both quick and full setup paths)
- test_code_execution: Move TERMINAL_ENV=local to module level to protect
  ALL test classes (TestEnvVarFiltering, TestExecuteCodeEdgeCases,
  TestInterruptHandling, TestHeadTailTruncation) from xdist env leaks
- test_matrix: Use try/except for nio.crypto imports (importorskip can be
  fooled by MagicMock in sys.modules under xdist)
2026-04-07 09:58:45 -07:00
Teknium
e120d2afac
feat: notify_on_complete for background processes (#5779)
* feat: notify_on_complete for background processes

When terminal(background=true, notify_on_complete=true), the system
auto-triggers a new agent turn when the process exits — no polling needed.

Changes:
- ProcessSession: add notify_on_complete field
- ProcessRegistry: add completion_queue, populate on _move_to_finished()
- Terminal tool: add notify_on_complete parameter to schema + handler
- CLI: drain completion_queue after agent turn AND during idle loop
- Gateway: enhanced _run_process_watcher injects synthetic MessageEvent
  on completion, triggering a full agent turn
- Checkpoint persistence includes notify_on_complete for crash recovery
- code_execution_tool: block notify_on_complete in sandbox scripts
- 15 new tests covering queue mechanics, checkpoint round-trip, schema

* docs: update terminal tool descriptions for notify_on_complete

- background: remove 'ONLY for servers' language, describe both patterns
  (long-lived processes AND long-running tasks with notify_on_complete)
- notify_on_complete: more prescriptive about when to use it
- TERMINAL_TOOL_DESCRIPTION: remove 'Do NOT use background for builds'
  guidance that contradicted the new feature
2026-04-07 02:40:16 -07:00
Teknium
eb7c408445
fix(gateway): /stop and /new bypass Level 1 active-session guard (#5765)
* fix(gateway): /stop and /new bypass Level 1 active-session guard

The base adapter's Level 1 guard intercepted ALL messages while an
agent was running, including /stop and /new. These commands were queued
as pending messages instead of being dispatched to the gateway runner's
Level 2 handler. When the agent eventually stopped (via the interrupt
mechanism), the command text leaked into the conversation as a user
message — the model would receive '/stop' as input and respond to it.

Fix: Add /stop, /new, and /reset to the bypass set in base.py alongside
/approve, /deny, and /status. Consolidate the three separate bypass
blocks into one. Commands in the bypass set are dispatched inline to the
gateway runner, where Level 2 handles them correctly (hard-kill for
/stop, session reset for /new).

Also add a safety net in _run_agent's pending-message processing: if the
pending text resolves to a known slash command, discard it instead of
passing it to the agent. This catches edge cases where command text
leaks through the interrupt_message fallback.

Refs: #5244

* test: regression tests for command bypass of active-session guard

17 tests covering:
- /stop, /new, /reset bypass the Level 1 guard when agent is running
- /approve, /deny, /status bypass (existing behavior, now tested)
- Regular text and unknown commands still queued (not bypassed)
- File paths like '/path/to/file' not treated as commands
- Telegram @botname suffix handled correctly
- Safety net command resolution (resolve_command detects known commands)
2026-04-07 00:53:45 -07:00
Teknium
3bc2fe802e
feat(telegram): paginated model picker with Next/Prev navigation
- Raise max_models from 8 to 50 so all curated models come through
- Add _build_model_keyboard() helper with 8-per-page pagination
- Next ▶ / ◀ Prev buttons with page counter (e.g. 2/4)
- mg:<page> callback data for page navigation
- Catch-all query.answer() for noop buttons
2026-04-06 23:10:40 -07:00
Teknium
5a2cf280a3
feat: interactive model picker for Telegram and Discord (#5742)
/model with no args now shows an interactive UI on Telegram and Discord
instead of a text list:

Telegram: Inline keyboard buttons — two-step drill-down.
  Step 1: Provider buttons with model counts (e.g. 'OpenRouter (15)')
  Step 2: Model buttons within the selected provider
  Edits the same message in-place as the user navigates.
  Back/Cancel buttons for navigation.

Discord: Embed + Select dropdown menus via discord.ui.View.
  Step 1: Provider dropdown with model counts
  Step 2: Model dropdown within the selected provider
  Back/Cancel buttons. Auth-gated to allowed users.

Platforms without picker support (Slack, WhatsApp, Signal, etc.)
fall back to the existing text list.

/model <name> continues to work as a direct text switch on all
platforms — the interactive picker is only for bare /model.

Implementation:
- TelegramAdapter.send_model_picker() + _handle_model_picker_callback()
  with compact callback_data (mp:/mm:/mb/mx, all within 64-byte limit)
- DiscordAdapter.send_model_picker() + ModelPickerView (discord.ui.View)
  with Select menus (up to 25 options per dropdown)
- GatewayRunner._handle_model_command() detects adapter capability via
  getattr(type(adapter), 'send_model_picker', None) (safe with mocks)
  and sends picker with async callback closure for the switch logic
- Callback performs full switch: switch_model(), cached agent update,
  session override, pending model note — same as /model <name>
2026-04-06 23:00:04 -07:00
eizus
4ec615b0c2 feat(gateway): Enable Slack thread replies without explicit @mentions
When a user replies in a Slack thread where the bot has an active
conversation session, the bot now processes the message even without
an explicit @mention. This improves UX for ongoing threaded
discussions.

Changes:
- Added set_session_store() to BasePlatformAdapter for adapters to
  check active sessions
- Modified SlackAdapter to detect thread replies and check if a
  session exists for that thread before requiring @mentions
- Updated GatewayRunner to inject the session store into adapters
- Added comprehensive tests for the new behavior

Fixes: Thread replies without @jarvis are now processed if there is
an active session, matching user expectations for conversation flow
2026-04-06 21:27:16 -07:00
ryanautomated
0f9aa57069 fix: silent memory flush failure on /new and /resume commands
The _async_flush_memories() helper accepts (session_id) but both the
/new and /resume handlers passed two arguments (session_id, session_key).
The TypeError was silently swallowed at DEBUG level, so memory extraction
never ran when users typed /new or /resume.

One call site (the session expiry watcher) was already fixed in 9c96f669,
but /new and /resume were missed.

- gateway/run.py:3247 — remove stray session_key from /new handler
- gateway/run.py:4989 — remove stray session_key from /resume handler
- tests/gateway/test_resume_command.py:222 — update test assertion
2026-04-06 16:49:42 -07:00
Mikita Lisavets
29b5ec2555 fix: clear session-scoped model after session reset 2026-04-06 13:20:01 -07:00
Mikita Lisavets
9afb9a6cb2 fix: clear session-scoped model overrides during session reset 2026-04-06 13:20:01 -07:00
donrhmexe
2c814d7b5d fix: /model --global writes model.name instead of model.default
The canonical config key for model name is model.default (used by setup,
auth, runtime_provider, profile list, and CLI startup). But /model --global
wrote to model.name in both gateway and CLI paths.

This caused:
- hermes profile list showing the old model (reads model.default)
- Gateway restart reverting to the old model (_resolve_gateway_model reads model.default)
- CLI startup using the old model (main.py reads model.default)

The only reason it appeared to work in Telegram was the cached agent
staying alive with the in-place switch.

Fix: change all 3 write/read sites to use model.default.
2026-04-06 13:20:01 -07:00
Dusk1e
7d0953d6ff security(gateway): isolate env/credential registries using ContextVars 2026-04-06 12:42:16 -07:00
Teknium
9c96f669a1
feat: centralized logging, instrumentation, hermes logs CLI, gateway noise fix (#5430)
Adds comprehensive logging infrastructure to Hermes Agent across 4 phases:

**Phase 1 — Centralized logging**
- New hermes_logging.py with idempotent setup_logging() used by CLI, gateway, and cron
- agent.log (INFO+) and errors.log (WARNING+) with RotatingFileHandler + RedactingFormatter
- config.yaml logging: section (level, max_size_mb, backup_count)
- All entry points wired (cli.py, main.py, gateway/run.py, run_agent.py)
- Fixed debug_helpers.py writing to ./logs/ instead of ~/.hermes/logs/

**Phase 2 — Event instrumentation**
- API calls: model, provider, tokens, latency, cache hit %
- Tool execution: name, duration, result size (both sequential + concurrent)
- Session lifecycle: turn start (session/model/provider/platform), compression (before/after)
- Credential pool: rotation events, exhaustion tracking

**Phase 3 — hermes logs CLI command**
- hermes logs / hermes logs -f / hermes logs errors / hermes logs gateway
- --level, --session, --since filters
- hermes logs list (file sizes + ages)

**Phase 4 — Gateway bug fix + noise reduction**
- fix: _async_flush_memories() called with wrong arg count — sessions never flushed
- Batched session expiry logs: 6 lines/cycle → 2 summary lines
- Added inbound message + response time logging

75 new tests, zero regressions on the full suite.
2026-04-06 00:08:20 -07:00
Teknium
dce5f51c7c
feat: config structure validation — detect malformed YAML at startup (#5426)
Add validate_config_structure() that catches common config.yaml mistakes:
- custom_providers as dict instead of list (missing '-' in YAML)
- fallback_model accidentally nested inside another section
- custom_providers entries missing required fields (name, base_url)
- Missing model section when custom_providers is configured
- Root-level keys that look like misplaced custom_providers fields

Surface these diagnostics at three levels:
1. Startup: print_config_warnings() runs at CLI and gateway module load,
   so users see issues before hitting cryptic errors
2. Error time: 'Unknown provider' errors in auth.py and model_switch.py
   now include config diagnostics with fix suggestions
3. Doctor: 'hermes doctor' shows a Config Structure section with all
   issues and fix hints

Also adds a warning log in runtime_provider.py when custom_providers
is a dict (previously returned None silently).

Motivated by a Discord user who had malformed custom_providers YAML
and got only 'Unknown Provider' with no guidance on what was wrong.

17 new tests covering all validation paths.
2026-04-05 23:31:20 -07:00
Teknium
89c812d1d2
feat: shared thread sessions by default — multi-user thread support (#5391)
Threads (Telegram forum topics, Discord threads, Slack threads) now default
to shared sessions where all participants see the same conversation. This is
the expected UX for threaded conversations where multiple users @mention the
bot and interact collaboratively.

Changes:
- build_session_key(): when thread_id is present, user_id is no longer
  appended to the session key (threads are shared by default)
- New config: thread_sessions_per_user (default: false) — opt-in to restore
  per-user isolation in threads if needed
- Sender attribution: messages in shared threads are prefixed with
  [sender name] so the agent can tell participants apart
- System prompt: shared threads show 'Multi-user thread' note instead of
  a per-turn User line (avoids busting prompt cache)
- Wired through all callers: gateway/run.py, base.py, telegram.py, feishu.py
- Regular group messages (no thread) remain per-user isolated (unchanged)
- DM threads are unaffected (they have their own keying logic)

Closes community request from demontut_ re: thread-based shared sessions.
2026-04-05 19:46:58 -07:00
Teknium
fec58ad99e
fix(gateway): replace wall-clock agent timeout with inactivity-based timeout (#5389)
The gateway previously used a hard wall-clock asyncio.wait_for timeout
that killed agents after a fixed duration regardless of activity. This
punished legitimate long-running tasks (subagent delegation, reasoning
models, multi-step research).

Now uses an inactivity-based polling loop that checks the agent's
built-in activity tracker (get_activity_summary) every 5 seconds. The
agent can run indefinitely as long as it's actively calling tools or
receiving API responses. Only fires when the agent has been completely
idle for the configured duration.

Changes:
- Replace asyncio.wait_for with asyncio.wait poll loop checking
  agent idle time via get_activity_summary()
- Add agent.gateway_timeout config.yaml key (default 1800s, 0=unlimited)
- Update stale session eviction to use agent idle time instead of
  pure wall-clock (prevents evicting active long-running tasks)
- Preserve all existing diagnostic logging and user-facing context

Inspired by PR #4864 (Mibayy) and issue #4815 (BongSuCHOI).
Reimplemented on current main using existing _touch_activity()
infrastructure rather than a parallel tracker.
2026-04-05 19:38:21 -07:00
Teknium
2563493466
fix: improve timeout debug logging and user-facing diagnostics (#5370)
Agent activity tracking:
- Add _last_activity_ts, _last_activity_desc, _current_tool to AIAgent
- Touch activity on: API call start/complete, tool start/complete,
  first stream chunk, streaming request start
- Public get_activity_summary() method for external consumers

Gateway timeout diagnostics:
- Timeout message now includes what the agent was doing when killed:
  actively working vs stuck on a tool vs waiting on API response
- Includes iteration count, last activity description, seconds since
  last activity — users can distinguish legitimate long tasks from
  genuine hangs
- 'Still working' notifications now show iteration count and current
  tool instead of just elapsed time
- Stale lock eviction logs include agent activity state for debugging

Stream stale timeout:
- _emit_status when stale stream is detected (was log-only) — gateway
  users now see 'No response from provider for Ns' with model and
  context size
- Improved logger.warning with model name and estimated context size

Error path notifications (gateway-visible via _emit_status):
- Context compression attempts now use _emit_status (was _vprint only)
- Non-retryable client errors emit summary before aborting
- Max retry exhaustion emits error summary (was _vprint only)
- Rate limit exhaustion emits specific rate-limit message

These were all CLI-visible but silent to gateway users, which is why
people on Telegram/Discord saw generic 'request failed' messages
without explanation.
2026-04-05 18:33:33 -07:00
Mibayy
cc2b56b26a feat(api): structured run events via /v1/runs SSE endpoint
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).

Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.

Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.

Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-04-05 12:05:13 -07:00
Teknium
0c95e91059 fix: follow-up fixes for salvaged PRs
- Fix GatewayApp → GatewayRunner import in api_server.py (PR #4976)
- Update launchd test assertions for new bootstrap/bootout/kickstart commands (PR #4892)
- Add nonlocal message declaration in run_sync() to fix UnboundLocalError (pre-existing scoping bug)
2026-04-05 11:59:28 -07:00
analista
6a6ae9a5c3 fix(gateway): correct misleading log text for unknown /commands
The warning said 'forwarding as plain text' but the code returns a
user-facing error reply instead of forwarding. Describe what actually
happens.
2026-04-05 11:59:28 -07:00
analista
e8053e8b93 fix(gateway): surface unknown /commands instead of leaking them to the LLM
Previously, typing a /command that isn't a built-in, plugin, or skill
would silently fall through to the LLM as plain text. The model often
interprets it as a loose instruction and invents unrelated tool calls —
e.g. a stray /claude_code slipped through and the model fabricated a
delegate_task invocation that got stuck in an OAuth loop.

Now we check GATEWAY_KNOWN_COMMANDS after the skill / plugin /
unavailable-skill lookups and return an actionable message pointing the
user at /commands. The user gets feedback, and the agent doesn't waste
a round-trip guessing what /foo-bar was supposed to mean.
2026-04-05 11:59:28 -07:00
analista
4a75aec433 fix(gateway): resolve Telegram's underscored /commands to skill/plugin keys
Telegram's Bot API disallows hyphens in command names, so
_build_telegram_menu registers /claude-code as /claude_code. When the
user taps it from autocomplete, the gateway dispatch did a direct
lookup against skill_cmds (keyed on the hyphenated form) and missed,
silently falling through to the LLM as plain text. The model would
then typically call delegate_task, spawning a Hermes subagent instead
of invoking the intended skill.

Normalize underscores to hyphens in skill and plugin command lookup,
matching the existing pattern in _check_unavailable_skill.
2026-04-05 11:59:28 -07:00
nibzard
4df2fca2f0 fix(gateway): cap memory flush retries at 3 to prevent infinite loop
The _session_expiry_watcher retried failed memory flushes forever
because exceptions were caught at debug level without setting
memory_flushed=True. Expired sessions with transient failures
(rate limits, network errors) would retry every 5 minutes
indefinitely, burning API quota and blocking gateway message
processing via 429 rate limit cascades.

Observed case: a March 19 session retried 28+ times over ~17 days,
causing repeated 429 errors that made Telegram unresponsive.

Add a per-session failure counter (_flush_failures) that gives up
after 3 consecutive attempts and marks the session as flushed to
break the loop.
2026-04-05 11:59:28 -07:00
Teknium
c100ad874c fix(matrix): E2EE cron delivery via live adapter + HTML formatting + origin fallback
Salvaged from PRs #3767 (chalkers), #5236 (ygd58), #2641 (buntingszn).

Three improvements to Matrix cron delivery:

1. Live adapter path: when the gateway is running, cron delivery now uses
   the connected MatrixAdapter via run_coroutine_threadsafe instead of
   the standalone HTTP PUT. This enables delivery to E2EE rooms where
   the raw HTTP path cannot encrypt. Falls back to standalone on failure.
   Threads adapters + event loop from gateway -> cron ticker -> tick() ->
   _deliver_result(). (from #3767)

2. HTML formatted_body: _send_matrix() now converts markdown to HTML
   using the optional markdown library, with h1-h6 to bold conversion
   for Element X compatibility. Falls back to plain text if markdown
   is not installed. Also adds random bytes to txn_id to prevent
   collisions. (from #5236)

3. Origin fallback: when deliver="origin" but origin is null (jobs
   created via API/scripts), falls back to HOME_CHANNEL env vars
   in order: matrix -> telegram -> discord -> slack. (from #2641)
2026-04-05 11:07:47 -07:00
dlkakbs
36e046e843 fix(gateway): MIME type fallback for Matrix document uploads
Cherry-picked run.py portion from PR #3495 by dlkakbs.
When Matrix sends non-image files (text, YAML, JSON, etc.), the MIME
type may be empty or application/octet-stream. Falls back to
extension-based detection so text files are properly injected into
agent context.
2026-04-05 11:07:47 -07:00
LucidPaths
70f798043b fix: Ollama Cloud auth, /model switch persistence, and alias tab completion
- Add OLLAMA_API_KEY to credential resolution chain for ollama.com endpoints
- Update requested_provider/_explicit_api_key/_explicit_base_url after /model
  switch so _ensure_runtime_credentials() doesn't revert the switch
- Pass base_url/api_key from fallback config to resolve_provider_client()
- Add DirectAlias system: user-configurable model_aliases in config.yaml
  checked before catalog resolution, with reverse lookup by model ID
- Add /model tab completion showing aliases with provider metadata

Co-authored-by: LucidPaths <LucidPaths@users.noreply.github.com>
2026-04-05 11:06:06 -07:00
Teknium
e899d6a05d
fix: increase default HERMES_AGENT_TIMEOUT from 10min to 30min
Users hitting the 10-minute default during complex tool chains.
Bumps both the execution cap and stale-lock eviction timeout.
Still overridable via HERMES_AGENT_TIMEOUT env var (0 = unlimited).
2026-04-05 10:32:59 -07:00
Teknium
4976a8b066
feat: /model command — models.dev primary database + --provider flag (#5181)
Full overhaul of the model/provider system.

## What changed
- models.dev (109 providers, 4000+ models) as primary database for provider identity AND model metadata
- --provider flag replaces colon syntax for explicit provider switching
- Full ModelInfo/ProviderInfo dataclasses with context, cost, capabilities, modalities
- HermesOverlay system merges models.dev + Hermes-specific transport/auth/aggregator flags
- User-defined endpoints via config.yaml providers: section
- /model (no args) lists authenticated providers with curated model catalog
- Rich metadata display: context window, max output, cost/M tokens, capabilities
- Config migration: custom_providers list → providers dict (v11→v12)
- AIAgent.switch_model() for in-place model swap preserving conversation

## Files
agent/models_dev.py, hermes_cli/providers.py, hermes_cli/model_switch.py,
hermes_cli/model_normalize.py, cli.py, gateway/run.py, run_agent.py,
hermes_cli/config.py, hermes_cli/commands.py
2026-04-05 01:04:44 -07:00
Teknium
0c54da8aaf
feat(gateway): live-stream /update output + interactive prompt buttons (#5180)
* feat(gateway): live-stream /update output + forward interactive prompts

Adds real-time output streaming and interactive prompt forwarding for
the gateway /update command, so users on Telegram/Discord/etc see the
full update progress and can respond to prompts (stash restore, config
migration) without needing terminal access.

Changes:

hermes_cli/main.py:
- Add --gateway flag to 'hermes update' argparse
- Add _gateway_prompt() file-based IPC function that writes
  .update_prompt.json and polls for .update_response
- Modify _restore_stashed_changes() to accept optional input_fn
  parameter for gateway mode prompt forwarding
- cmd_update() uses _gateway_prompt when --gateway is set, enabling
  interactive stash restore and config migration prompts

gateway/run.py:
- _handle_update_command: spawn with --gateway flag and
  PYTHONUNBUFFERED=1 for real-time output flushing
- Store session_key in .update_pending.json for cross-restart
  session matching
- Add _update_prompt_pending dict to track sessions awaiting
  update prompt responses
- Replace _watch_for_update_completion with _watch_update_progress:
  streams output chunks every ~4s, detects .update_prompt.json and
  forwards prompts to the user, handles completion/failure/timeout
- Add update prompt interception in _handle_message: when a prompt
  is pending, the user's next message is written to .update_response
  instead of being processed normally
- Preserve _send_update_notification as legacy fallback for
  post-restart cases where adapter isn't available yet

File-based IPC protocol:
- .update_prompt.json: written by update process with prompt text,
  default value, and unique ID
- .update_response: written by gateway with user's answer
- .update_output.txt: existing, now streamed in real-time
- .update_exit_code: existing completion marker

Tests: 16 new tests covering _gateway_prompt IPC, output streaming,
prompt detection/forwarding, message interception, and cleanup.

* feat: interactive buttons for update prompts (Telegram + Discord)

Telegram: Inline keyboard with ✓ Yes / ✗ No buttons. Clicking a button
answers the callback query, edits the message to show the choice, and
writes .update_response directly. CallbackQueryHandler registered on
the update_prompt: prefix.

Discord: UpdatePromptView (discord.ui.View) with green Yes / red No
buttons. Follows the ExecApprovalView pattern — auth check, embed color
update, disabled-after-click. Writes .update_response on click.

All platforms: /approve and /deny (and /yes, /no) now work as shorthand
for yes/no when an update prompt is pending. The text fallback message
instructs users to use these commands. Raw message interception still
works as a fallback for non-command responses.

Gateway watcher checks adapter for send_update_prompt method (class-level
check to avoid MagicMock false positives) and falls back to text prompt
with /approve instructions when unavailable.

* fix: block /update on non-messaging platforms (API, webhooks, ACP)

Add _UPDATE_ALLOWED_PLATFORMS frozenset that explicitly lists messaging
platforms where /update is permitted. API server, webhook, and ACP
platforms get a clear error directing them to run hermes update from
the terminal instead.

ACP and API server already don't reach _handle_message (separate
codepaths), and webhooks have distinct session keys that can't collide
with messaging sessions. This guard is belt-and-suspenders.
2026-04-05 00:28:58 -07:00
Teknium
b93fa234df
fix: clear ghost status-bar lines on terminal resize (#4960)
* feat: add /branch (/fork) command for session branching

Inspired by Claude Code's /branch command. Creates a copy of the current
session's conversation history in a new session, allowing the user to
explore a different approach without losing the original.

Works like 'git checkout -b' for conversations:
- /branch            — auto-generates a title from the parent session
- /branch my-idea    — uses a custom title
- /fork              — alias for /branch

Implementation:
- CLI: _handle_branch_command() in cli.py
- Gateway: _handle_branch_command() in gateway/run.py
- CommandDef with 'fork' alias in commands.py
- Uses existing parent_session_id field in session DB
- Uses get_next_title_in_lineage() for auto-numbered branches
- 14 tests covering session creation, history copy, parent links,
  title generation, edge cases, and agent sync

* fix: clear ghost status-bar lines on terminal resize

When the terminal shrinks (e.g. un-maximize), the emulator reflows
previously full-width rows (status bar, input rules) into multiple
narrower rows. prompt_toolkit's _on_resize only cursor_up()s by the
stored layout height, missing the extra rows from reflow — leaving
ghost duplicates of the status bar visible.

Fix: monkey-patch Application._on_resize to detect width shrinks,
calculate the extra rows created by reflow, and inflate the renderer's
cursor_pos.y so the erase moves up far enough to clear ghosts.
2026-04-03 22:43:45 -07:00