Parse x-ratelimit-* headers from inference API responses (Nous Portal,
OpenRouter, OpenAI-compatible) and display them in the /usage command.
- New agent/rate_limit_tracker.py: parse 12 rate limit headers (RPM/RPH/
TPM/TPH limits, remaining, reset timers), format as progress bars (CLI)
or compact one-liner (gateway)
- Hook into streaming path in run_agent.py: stream.response.headers is
available on the OpenAI SDK Stream object before chunks are consumed
- CLI /usage: appends rate limit section with progress bars + warnings
when any bucket exceeds 80%
- Gateway /usage: appends compact rate limit summary
- 24 unit tests covering parsing, formatting, edge cases
Headers captured per response:
x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h}
Example CLI display:
Nous Rate Limits (captured just now):
Requests/min [░░░░░░░░░░░░░░░░░░░░] 0.1% 1/800 used (799 left, resets in 59s)
Tokens/hr [░░░░░░░░░░░░░░░░░░░░] 0.0% 49/336.0M (336.0M left, resets in 52m)
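A minimal sketch of the parsing and bar formatting described above (illustrative only — the real code lives in agent/rate_limit_tracker.py; the names here and the assumption that reset values arrive as numeric seconds are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    limit: int
    remaining: int
    reset_seconds: float  # assumed numeric seconds-until-reset

def parse_rate_limit_headers(headers: dict) -> dict:
    """Collect x-ratelimit-{limit,remaining,reset}-{requests,tokens}{,-1h}."""
    buckets = {}
    for window in ("requests", "tokens", "requests-1h", "tokens-1h"):
        try:
            buckets[window] = Bucket(
                limit=int(headers[f"x-ratelimit-limit-{window}"]),
                remaining=int(headers[f"x-ratelimit-remaining-{window}"]),
                reset_seconds=float(headers[f"x-ratelimit-reset-{window}"]),
            )
        except (KeyError, ValueError):
            continue  # provider didn't report this bucket
    return buckets

def format_bucket(name: str, b: Bucket, width: int = 20) -> str:
    used = b.limit - b.remaining
    pct = 100.0 * used / b.limit if b.limit else 0.0
    filled = int(width * pct / 100)
    bar = "█" * filled + "░" * (width - filled)
    return f"{name:<12} [{bar}] {pct:.1f}% {used}/{b.limit} ({b.remaining} left)"
```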
When a model returns no content, no structured reasoning, and no tool
calls (common with open models), the agent now silently retries up to
3 times before falling through to (empty).
Silent retry (no synthetic messages) keeps the conversation history
clean, preserves prompt caching, and respects the no-synthetic-user-
injection invariant. Most empty responses from open models are
transient (provider hiccups, rate limits, sampling flukes) so a
simple retry is sufficient.
This fills the last gap in the empty-response recovery chain:
1. _last_content_with_tools fallback (prior tool turn had content)
2. Thinking-only prefill continuation (#5931 — structured reasoning)
3. Empty response silent retry (NEW — truly empty, no reasoning)
4. (empty) terminal (last resort after all retries exhausted)
Inline <think> blocks are excluded — the model chose to reason, it
just produced no visible text. That differs from truly empty.
Tests:
- Updated test_truly_empty to expect 4 API calls (1 + 3 retries)
- Added test_truly_empty_response_succeeds_on_nudge
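Roughly, the retry loop has this shape (hedged sketch; call_llm stands in for the real request path in run_agent.py):

```python
def call_with_empty_retry(call_llm, messages, max_retries=3):
    """Retry the identical request when the reply is truly empty.

    Nothing synthetic is appended between attempts, so conversation
    history and the prompt-cache prefix stay intact.
    """
    resp = None
    for _ in range(1 + max_retries):
        resp = call_llm(messages)
        msg = resp.choices[0].message
        if msg.content or getattr(msg, "reasoning", None) or msg.tool_calls:
            return resp  # something usable arrived
    return resp  # retries exhausted; caller falls through to "(empty)"
```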
`_cleanup_task_resources` was unconditionally calling `cleanup_vm()` at
the end of every `run_conversation` (i.e. every user turn), tearing down
the docker/daytona/modal sandbox container regardless of its
`persistent_filesystem` setting. This contradicted the documented intent
of `terminal.lifetime_seconds` (idle reaper) and `container_persistent`,
and caused per-turn loss of `/workspace`, `~/.config`, agent CLI auth
state, and any other content living inside the sandbox.
The unconditional teardown was introduced in fbd3a2fd ("prevent leakage
of morph instances between tasks", 2025-11-04) to plug a Morph backend
leak, two days after `lifetime_seconds` shipped in faecbddd. It was
later refactored into `_cleanup_task_resources` in 70dd3a16 without
changing semantics. Code and docs have disagreed since.
Fix: introduce `terminal_tool.is_persistent_env(task_id)` and skip the
per-turn `cleanup_vm` when the active env is persistent. The idle reaper
(`_cleanup_inactive_envs`) still tears persistent envs down once
`terminal.lifetime_seconds` is exceeded. Non-persistent backends (Morph)
are unchanged — still torn down per turn, preserving the original
leak-prevention intent.
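Sketch of the new guard (illustrative; terminal_tool is passed in here only to keep the example self-contained):

```python
def cleanup_task_resources(task_id: str, terminal_tool) -> None:
    """Per-turn cleanup; terminal_tool is the repo module named above."""
    if terminal_tool.is_persistent_env(task_id):
        # Persistent sandbox: leave it alone. The idle reaper
        # (_cleanup_inactive_envs) reclaims it once terminal.lifetime_seconds
        # of inactivity is exceeded.
        return
    # Non-persistent backends (Morph) keep the per-turn teardown that
    # plugged the original instance leak.
    terminal_tool.cleanup_vm(task_id)
```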
Combines the approaches from PR #6309 (duan78) and PR #5963 (KUSH42):
Tiered warnings (from #5963):
- Replaces boolean _context_pressure_warned with float _context_pressure_warned_at
- Fires at 85% (orange) and re-fires at 95% (red/critical)
- Adds 'compacting context...' status message before compression
Gateway dedup (from #6309):
- Class-level dict _context_pressure_last_warned survives across AIAgent
instances (gateway creates a new instance per message)
- 5-minute cooldown per session prevents warning spam
- Higher-tier warnings bypass the cooldown (85% → 95% always fires)
- Compression reset clears the dedup entry for the session
- Stale entries evicted (older than 2x cooldown) to prevent memory leak
Does NOT inject into messages — purely user-facing via _safe_print (CLI)
and status_callback (gateway). Zero prompt cache impact.
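A condensed sketch of the tiered warning plus dedup (constants and state shape approximate the real code):

```python
import time

class AIAgent:
    # Class-level so it survives per-message instance churn in the gateway.
    _context_pressure_last_warned: dict = {}  # session_id -> (tier, ts)
    _WARN_COOLDOWN = 300.0  # 5 minutes

    def _maybe_warn(self, session_id: str, usage: float):
        tier = 0.95 if usage >= 0.95 else 0.85 if usage >= 0.85 else None
        if tier is None:
            return None
        prev_tier, prev_ts = self._context_pressure_last_warned.get(
            session_id, (0.0, 0.0))
        if time.time() - prev_ts < self._WARN_COOLDOWN and tier <= prev_tier:
            return None  # suppressed; a higher tier bypasses the cooldown
        self._context_pressure_last_warned[session_id] = (tier, time.time())
        return "critical (95%)" if tier >= 0.95 else "warning (85%)"
```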
Fixes #6309. Fixes #5963.
Three targeted improvements to the compression system:
1. Replace hardcoded truncation limits with named class constants
(_CONTENT_MAX=6000, _CONTENT_HEAD=4000, _CONTENT_TAIL=1500,
_TOOL_ARGS_MAX=1500, _TOOL_ARGS_HEAD=1200). Previous limits
(3000/500) heavily truncated the summarizer's input — a 200-line
edit got cut to 3000 chars before the summarizer ever saw it.
2. Add '## Tools & Patterns' section to both compression prompt
templates (first-pass and iterative). Preserves working tool
invocations, preferred flags, and tool-specific discoveries
across compaction boundaries.
3. Warn users on 2nd+ compression: 'Session compressed N times —
accuracy may degrade. Consider /new to start fresh.'
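For item 1, the head/tail split amounts to this (sketch; the helper name is illustrative, the constants are the ones listed above):

```python
_CONTENT_MAX, _CONTENT_HEAD, _CONTENT_TAIL = 6000, 4000, 1500

def truncate_for_summarizer(text: str) -> str:
    if len(text) <= _CONTENT_MAX:
        return text
    omitted = len(text) - _CONTENT_HEAD - _CONTENT_TAIL
    return (text[:_CONTENT_HEAD]
            + f"\n... [{omitted} chars omitted] ...\n"
            + text[-_CONTENT_TAIL:])
```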
Ref #499
Local inference providers (Ollama, oMLX, llama-cpp) can take 300+ seconds
for prefill on large contexts. The 180s stale stream detector was killing
these connections while the provider was still processing.
Uses the existing is_local_endpoint() (proper URL parsing with RFC-1918,
localhost, WSL detection) instead of ad-hoc substring matching. The stale
timeout is only disabled when the user hasn't explicitly set
HERMES_STREAM_STALE_TIMEOUT — explicit user config is always honored.
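The gating reduces to something like this (sketch; the stand-in is_local_endpoint below is far simpler than the real RFC-1918/WSL-aware helper):

```python
import os
from urllib.parse import urlparse

def is_local_endpoint(base_url: str) -> bool:
    # Stand-in only: the real helper also covers RFC-1918 ranges and WSL.
    host = urlparse(base_url).hostname or ""
    return host in ("localhost", "127.0.0.1", "::1")

def effective_stale_timeout(base_url: str, default: float = 180.0):
    explicit = os.environ.get("HERMES_STREAM_STALE_TIMEOUT")
    if explicit is not None:
        return float(explicit)       # explicit user config always wins
    if is_local_endpoint(base_url):
        return None                  # disabled: local prefill can take 300+ s
    return default
```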
Fixes #5889
The _call_llm() and direct OpenAI fallback paths in flush_memories() both
hardcoded timeout=30.0, ignoring the user-configurable value at
auxiliary.flush_memories.timeout in config.yaml.
Remove the explicit timeout from the auxiliary _call_llm() call so that
_get_task_timeout('flush_memories') reads from config. For the direct
OpenAI fallback, import and use _get_task_timeout() instead of the
hardcoded value.
Add two regression tests verifying both code paths respect the config.
Fixes #6154
Two linked fixes for MiniMax Anthropic-compatible fallback:
1. Normalize httpx.URL to str before calling .rstrip() in auth/provider
detection helpers. Some client objects expose base_url as httpx.URL,
not str — crashed with AttributeError in _requires_bearer_auth() and
_is_third_party_anthropic_endpoint(). Also fixes _try_activate_fallback()
to use the already-stringified fb_base_url instead of raw httpx.URL.
2. Strip Anthropic-proprietary thinking block signatures when targeting
third-party Anthropic-compatible endpoints (MiniMax, Azure AI Foundry,
self-hosted proxies). These endpoints cannot validate Anthropic's
signatures and reject them with HTTP 400 'Invalid signature in
thinking block'. Now threads base_url through convert_messages_to_anthropic()
→ build_anthropic_kwargs() so signature management is endpoint-aware.
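The normalization in fix 1 is essentially (sketch with a generic URL):

```python
import httpx

def normalize_base_url(base_url) -> str:
    # client.base_url may be an httpx.URL, which has no .rstrip()
    return str(base_url).rstrip("/")

assert normalize_base_url(httpx.URL("https://example.com/anthropic/")) == \
    "https://example.com/anthropic"
```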
Based on PR #4945 by kshitijk4poor (rstrip fix).
Fixes #4944.
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.
Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
(DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
5 inline 'portal.qwen.ai' string checks and share headers between init
and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion
New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
max_tokens suppression)
Anthropic signs thinking blocks against the full turn content. Any
upstream mutation (context compression, session truncation, orphan
stripping, message merging) invalidates the signature, causing HTTP 400
'Invalid signature in thinking block' — especially in long-lived
gateway sessions.
Strategy (following clawdbot/OpenClaw pattern):
1. Strip thinking/redacted_thinking from all assistant messages EXCEPT
the last one — preserves reasoning continuity on the current
tool-use chain while avoiding stale signature errors on older turns.
2. Downgrade unsigned thinking blocks to plain text — Anthropic can't
validate them, but the reasoning content is preserved.
3. Strip cache_control from thinking/redacted_thinking blocks to
prevent cache markers from interfering with signature validation.
4. Drop thinking blocks from the second message when merging
consecutive assistant messages (role alternation enforcement).
5. Error recovery: on HTTP 400 mentioning 'signature' and 'thinking',
strip all reasoning_details from the conversation and retry once.
This is the safety net for edge cases the proactive stripping
misses.
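Step 1 in isolation looks roughly like this (sketch over Anthropic-format message dicts; the real code also handles steps 2-4):

```python
def strip_stale_thinking(messages: list[dict]) -> list[dict]:
    """Drop thinking blocks from every assistant message except the last."""
    last = max((i for i, m in enumerate(messages) if m["role"] == "assistant"),
               default=None)
    out = []
    for i, m in enumerate(messages):
        if (m["role"] == "assistant" and i != last
                and isinstance(m.get("content"), list)):
            keep = [b for b in m["content"]
                    if b.get("type") not in ("thinking", "redacted_thinking")]
            m = {**m, "content": keep}
        out.append(m)
    return out
```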
Addresses the issue reported in PR #6086 by @mingginwan while
preserving reasoning continuity (their PR stripped ALL thinking
blocks unconditionally).
Files changed:
- agent/anthropic_adapter.py: thinking block management in
convert_messages_to_anthropic (strip old turns, downgrade unsigned,
strip cache_control, merge-time strip)
- run_agent.py: one-shot signature error recovery in retry loop
- tests/test_anthropic_adapter.py: 10 new tests covering all cases
Salvaged fixes from community PRs:
- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
key lookup (was checking top-level dict instead of store['providers']).
OAuth providers now correctly detected in /model picker.
Cherry-picked from PR #5911 by Xule Lin (linxule).
- fix(ollama): pass num_ctx to override 2048 default context window.
Ollama defaults to 2048 context regardless of model capabilities. Now
auto-detects from /api/show metadata and injects num_ctx into every
request. Config override via model.ollama_num_ctx. Fixes #2708.
Cherry-picked from PR #5929 by kshitij (kshitijk4poor).
- fix(aux): normalize provider aliases for vision/auxiliary routing.
Adds _normalize_aux_provider() with 17 aliases (google→gemini,
claude→anthropic, glm→zai, etc). Fixes vision routing failure when
provider is set to 'google' instead of 'gemini'.
Cherry-picked from PR #5793 by e11i (Elizabeth1979).
- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
but auxiliary client uses OpenAI SDK which appends /chat/completions →
404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
Inspired by PR #5786 by Lempkey.
Added debug logging to silent exception blocks across all fixes.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
The response validation stage unconditionally marked Codex Responses API
replies as invalid when response.output was empty, triggering unnecessary
retries and fallback chains. However, _normalize_codex_response can
recover from this state by synthesizing output from response.output_text.
Now the validation stage checks for output_text before marking the
response invalid, matching the normalization logic. Also fixes
logging.warning → logger.warning for consistency with the rest of the
file.
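The added check is roughly (helper name illustrative):

```python
def codex_response_is_salvageable(response) -> bool:
    if getattr(response, "output", None):
        return True
    # Empty output but non-empty output_text: _normalize_codex_response can
    # synthesize an output item from it, so don't mark the reply invalid.
    return bool(getattr(response, "output_text", ""))
```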
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When the model produces structured reasoning (via API fields like .reasoning,
.reasoning_content, .reasoning_details) but no visible text content, append
the assistant message as prefill and continue the loop. The model sees its own
reasoning context on the next turn and produces the text portion.
Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill
attempts before falling through to the existing '(empty)' terminal.
Key design decisions:
- Only triggers for structured reasoning (API fields), NOT inline <think> tags
- Prefill messages are popped on success to maintain strict role alternation
- _thinking_prefill marker stripped from all API message building paths
- Works across all providers: OpenAI (continuation), Anthropic (native prefill)
Verified with E2E tests: simulated thinking-only → real OpenRouter continuation
produces correct content. Also confirmed Qwen models consistently produce
structured-reasoning-only responses under token pressure.
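Hedged sketch of the continuation loop (helper names are illustrative; the real logic is threaded through the main agent loop):

```python
def continue_after_thinking(call_llm, messages, assistant, max_attempts=2):
    attempts = 0
    while assistant.get("reasoning") and not assistant.get("content"):
        if attempts >= max_attempts:
            return None  # fall through to the existing "(empty)" terminal
        # Append the reasoning-only turn as prefill; the model continues
        # from its own thinking on the next call.
        messages.append({**assistant, "_thinking_prefill": True})
        attempts += 1
        assistant = call_llm(messages)
    # Pop prefill markers on success to restore strict role alternation.
    while messages and messages[-1].get("_thinking_prefill"):
        messages.pop()
    return assistant
```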
Memory plugins (Mem0, Honcho) used static identifiers ('hermes-user',
config peerName) meaning all gateway users shared the same memory bucket.
Changes:
- AIAgent.__init__: add user_id parameter, store as self._user_id
- run_agent.py: include user_id in _init_kwargs passed to memory providers
- gateway/run.py: pass source.user_id to AIAgent in primary + background paths
- Mem0 plugin: prefer kwargs user_id over config default
- Honcho plugin: override cfg.peer_name with gateway user_id when present
CLI sessions (user_id=None) preserve existing defaults. Only gateway
sessions with a real platform user_id get per-user memory scoping.
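The precedence rule both plugins now share, in miniature:

```python
def resolve_memory_scope(user_id, config_default="hermes-user"):
    # Gateway sessions carry a platform user_id; CLI sessions pass None.
    return user_id if user_id else config_default
```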
Reported by plev333.
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
(vs availability checks which were left alone)
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
(12 instances of add_parser() where result was never used)
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
Two gaps in the codex empty-output handling:
1. _run_codex_create_stream_fallback() skipped all non-terminal events,
so when the fallback path was used and the terminal response had
empty output, there was no recovery. Now collects output_item.done
and text deltas during the fallback stream, backfills on empty output.
2. _normalize_codex_response() hard-crashed with RuntimeError when
output was empty, even when the response had output_text set. The
function already had fallback logic at line 3562 to use output_text,
but the guard at line 3446 killed it first. Now checks output_text
before raising and synthesizes a minimal output item.
Salvages the core fix from PR #5673 (egerev) onto current main.
The chatgpt.com/backend-api/codex endpoint streams valid output items
via response.output_item.done events, but the OpenAI SDK's
get_final_response() returns an empty output list. This caused every
Codex response to be rejected as invalid.
Fix: collect output_item.done events during streaming and backfill
response.output when get_final_response() returns empty. Falls back
to synthesizing from text deltas when no done events were received.
Also moves the synthesis logic from the validation loop (too late, from
#5681) into _run_codex_stream() (before the response leaves the
streaming function), and simplifies the validation to just log
diagnostics since recovery now happens upstream.
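Sketch of the collect-and-backfill shape (duck-typed; stream stands in for the SDK's Responses streaming wrapper, and synthesize_text_item is a hypothetical helper):

```python
def collect_and_backfill(stream, synthesize_text_item):
    done_items, deltas = [], []
    for event in stream:
        if event.type == "response.output_item.done":
            done_items.append(event.item)
        elif event.type == "response.output_text.delta":
            deltas.append(event.delta)
    response = stream.get_final_response()
    if not response.output:
        # Backfill from done events; fall back to accumulated text deltas.
        response.output = done_items or [synthesize_text_item("".join(deltas))]
    return response
```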
Co-authored-by: Egor <egerev@users.noreply.github.com>
Three bugs causing OpenAI Codex sessions to fail silently:
1. Credential pool vs legacy store disconnect: hermes auth and hermes
model store device_code tokens in the credential pool, but
get_codex_auth_status(), resolve_codex_runtime_credentials(), and
_model_flow_openai_codex() only read from the legacy provider state.
Fresh pool tokens were invisible to the auth status checks and model
selection flow.
2. _import_codex_cli_tokens() imported expired tokens from ~/.codex/
without checking JWT expiry. Combined with _login_openai_codex()
saying 'Login successful!' for expired credentials, users got stuck
in a loop of dead tokens being recycled.
3. _login_openai_codex() accepted expired tokens from
resolve_codex_runtime_credentials() without validating expiry before
telling the user login succeeded.
Fixes:
- get_codex_auth_status() now checks credential pool first, falls back
to legacy provider state
- _model_flow_openai_codex() uses pool-aware auth status for token
retrieval when fetching model lists
- _import_codex_cli_tokens() validates JWT exp claim, rejects expired
- _login_openai_codex() verifies resolved token isn't expiring before
accepting existing credentials
- _run_codex_stream() logs response.incomplete/failed terminal events
with status and incomplete_details for diagnostics
- Codex empty output recovery: captures streamed text during streaming
and synthesizes a response when get_final_response() returns empty
output (handles chatgpt.com backend-api edge cases)
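The expiry validation amounts to decoding the JWT's exp claim (self-contained sketch):

```python
import base64, json, time

def jwt_is_expired(token: str, skew: float = 60.0) -> bool:
    try:
        payload = token.split(".")[1]
        payload += "=" * (-len(payload) % 4)  # restore base64url padding
        claims = json.loads(base64.urlsafe_b64decode(payload))
        return float(claims["exp"]) <= time.time() + skew
    except Exception:
        return True  # unparseable tokens are treated as expired
```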
Two fixes:
1. Replace all stale 'hermes login' references with 'hermes auth' across
auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py,
and documentation. The 'hermes login' command was deprecated; 'hermes auth'
now handles OAuth credential management.
2. Fix credential removal not persisting for singleton-sourced credentials
(device_code for openai-codex/nous, hermes_pkce for anthropic).
auth_remove_command already cleared env vars for env-sourced credentials,
but singleton credentials stored in the auth store were re-seeded by
_seed_from_singletons() on the next load_pool() call. Now clears the
underlying auth store entry when removing singleton-sourced credentials.
When the Codex CLI (or VS Code extension) consumes a refresh token before
Hermes can use it, Hermes previously surfaced a generic 401 error with no
actionable guidance.
- In `refresh_codex_oauth_pure`: detect `refresh_token_reused` from the
OAuth endpoint and raise an AuthError explaining the cause and the exact
steps to recover (run `codex` to refresh, then `hermes login`).
- In `run_agent.py`: when provider is `openai-codex` and HTTP 401 is
received, show Codex-specific recovery steps instead of the generic
"check your API key" message.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
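Sketch of the detection in refresh_codex_oauth_pure (AuthError stands in for the repo's own exception type):

```python
class AuthError(Exception):
    """Stand-in for the repo's auth exception type."""

def check_refresh_response(resp_json: dict) -> None:
    if resp_json.get("error") == "refresh_token_reused":
        raise AuthError(
            "Codex refresh token already consumed (likely by the Codex CLI "
            "or VS Code extension). Run `codex` to refresh its tokens, then "
            "`hermes login` to re-import them."
        )
```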
When using xAI's API directly (base_url contains x.ai), send the
x-grok-conv-id header set to the Hermes session_id. This routes
consecutive requests to the same server, maximizing automatic
prompt cache hits.
Ref: https://docs.x.ai/developers/advanced-api-usage/prompt-caching
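In sketch form (the substring check is illustrative; the real code decides per provider):

```python
def extra_headers_for(base_url: str, session_id: str) -> dict:
    # The same conversation id on every request routes to the same server,
    # which maximizes automatic prompt-cache hits.
    if "x.ai" in (base_url or ""):
        return {"x-grok-conv-id": session_id}
    return {}
```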
OpenCode Go model names with dots (minimax-m2.7, glm-4.5, kimi-k2.5)
were being mangled to hyphens (minimax-m2-7), causing HTTP 401 errors.
Two code paths were affected:
1. model_normalize.py: opencode-go was incorrectly in DOT_TO_HYPHEN_PROVIDERS
2. run_agent.py: _anthropic_preserve_dots() did not check for opencode-go
Fix:
- Remove opencode-go from _DOT_TO_HYPHEN_PROVIDERS (dots are correct for Go)
- Add opencode-go to _anthropic_preserve_dots() provider check
- Add opencode.ai/zen/go to base_url fallback check
- Add regression tests in tests/test_model_normalize.py
Co-authored-by: jacob3712 <jacob3712@users.noreply.github.com>
- Rename the per-LLM-call pre_llm_request/post_llm_request hooks to avoid confusion with pre_llm_call
- Emit summary kwargs only (counts, usage dict from normalize_usage); keep env_var_enabled for HERMES_DUMP_REQUESTS
- Add is_truthy_value/env_var_enabled to utils; wire hermes_cli.plugins._env_enabled through it
- Update Langfuse local setup doc; add scripts/langfuse_smoketest.py and optional ~/.hermes plugin tests
Made-with: Cursor
Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0),
#5058 and #5098 (maymuneth).
Mem0 API v2 compatibility (#5301):
- All reads use filters={user_id: ...} instead of bare user_id= kwarg
- All writes use filters with user_id + agent_id for attribution
- Response unwrapping for v2 dict format {results: [...]}
- Split _read_filters() vs _write_filters() — reads are user-scoped
only for cross-session recall, writes include agent_id
- Preserved 'hermes-user' default (no breaking change for existing users)
- Omitted run_id scoping from #5301 — cross-session memory is Mem0's
core value, session-scoping reads would defeat that purpose
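The read/write filter split, sketched:

```python
def _read_filters(user_id: str) -> dict:
    # User-scoped only, so recall spans sessions and agents.
    return {"user_id": user_id}

def _write_filters(user_id: str, agent_id: str) -> dict:
    # Writes also record which agent produced the memory.
    return {"user_id": user_id, "agent_id": agent_id}

def unwrap_v2(resp):
    # v2 returns {"results": [...]}; v1 returned a bare list.
    return resp.get("results", resp) if isinstance(resp, dict) else resp
```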
Memory prefetch context fencing (#5339):
- Wraps prefetched memory in <memory-context> fenced blocks with system
note marking content as recalled context, NOT user input
- Sanitizes provider output to strip fence-escape sequences, preventing
injection where memory content breaks out of the fence
- API-call-time only — never persisted to session history
Secret redaction (#5058, #5098):
- Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB
(retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)
Adds OPENAI_MODEL_EXECUTION_GUIDANCE — XML-tagged behavioral guidance
injected for GPT and Codex models alongside the existing tool-use
enforcement. Targets four specific failure modes:
- <tool_persistence>: retry on empty/partial results instead of giving up
- <prerequisite_checks>: do discovery/lookup before jumping to final action
- <verification>: check correctness/grounding/formatting before finalizing
- <missing_context>: use lookup tools instead of hallucinating
Follows the same injection pattern as GOOGLE_MODEL_OPERATIONAL_GUIDANCE
for Gemini/Gemma models. Inspired by OpenClaw PR #38953 and OpenAI's
GPT-5.4 prompting guide patterns.
Agent activity tracking:
- Add _last_activity_ts, _last_activity_desc, _current_tool to AIAgent
- Touch activity on: API call start/complete, tool start/complete,
first stream chunk, streaming request start
- Public get_activity_summary() method for external consumers
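Sketch of the bookkeeping (field names follow the list above; touch points are the events listed):

```python
import time

class AIAgent:
    def __init__(self):
        self._last_activity_ts = time.time()
        self._last_activity_desc = "idle"
        self._current_tool = None

    def _touch_activity(self, desc: str, tool: str | None = None) -> None:
        self._last_activity_ts = time.time()
        self._last_activity_desc = desc
        self._current_tool = tool

    def get_activity_summary(self) -> dict:
        return {
            "desc": self._last_activity_desc,
            "tool": self._current_tool,
            "seconds_since": time.time() - self._last_activity_ts,
        }
```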
Gateway timeout diagnostics:
- Timeout message now includes what the agent was doing when killed:
actively working vs stuck on a tool vs waiting on API response
- Includes iteration count, last activity description, seconds since
last activity — users can distinguish legitimate long tasks from
genuine hangs
- 'Still working' notifications now show iteration count and current
tool instead of just elapsed time
- Stale lock eviction logs include agent activity state for debugging
Stream stale timeout:
- _emit_status when stale stream is detected (was log-only) — gateway
users now see 'No response from provider for Ns' with model and
context size
- Improved logger.warning with model name and estimated context size
Error path notifications (gateway-visible via _emit_status):
- Context compression attempts now use _emit_status (was _vprint only)
- Non-retryable client errors emit summary before aborting
- Max retry exhaustion emits error summary (was _vprint only)
- Rate limit exhaustion emits specific rate-limit message
These were all CLI-visible but silent to gateway users, which is why
people on Telegram/Discord saw generic 'request failed' messages
without explanation.
Subagent sessions spawned by delegate_task were created with
parent_session_id=NULL and source=cli, making them indistinguishable
from user sessions in hermes sessions list and /resume.
Changes:
- delegate_tool.py: pass parent_agent.session_id to child agent
- run_agent.py: accept parent_session_id param, pass to create_session
- hermes_state.py list_sessions_rich: filter parent_session_id IS NULL
by default (opt-in include_children=True for callers that need them)
- hermes_state.py delete_session: delete child sessions first (FK)
- hermes_state.py prune_sessions: delete children before parents (FK)
session_search already handles parent_session_id correctly — child
sessions are filtered from recent list and resolved to parent root
in full-text search results.
Fixes #5122
As the agent navigates into subdirectories via tool calls (read_file,
terminal, search_files, etc.), automatically discover and load project
context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories.
Previously, context files were only loaded from the CWD at session start.
If the agent moved into backend/, frontend/, or any subdirectory with its
own AGENTS.md, those instructions were never seen.
Now, SubdirectoryHintTracker watches tool call arguments for file paths
and shell commands, resolves directories, and loads hint files on first
access. Discovered hints are appended to the tool result so the model
gets relevant context at the moment it starts working in a new area —
without modifying the system prompt (preserving prompt caching).
Features:
- Extracts paths from tool args (path, workdir) and shell commands
- Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory)
- Deduplicates — each directory loaded at most once per session
- Ignores paths outside the working directory
- Truncates large hint files at 8K chars
- Works on both sequential and concurrent tool execution paths
Inspired by Block/goose SubdirectoryHintTracker.
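Hedged sketch of the tracker's core behavior (the real implementation also parses shell commands and hooks both execution paths):

```python
from pathlib import Path

HINT_FILES = ("AGENTS.md", "CLAUDE.md", ".cursorrules")
MAX_HINT_CHARS = 8_000

class SubdirectoryHintTracker:
    def __init__(self, root: Path):
        self.root = root.resolve()
        self.seen: set[Path] = set()

    def hint_for(self, path_arg: str):
        directory = (self.root / path_arg).resolve()
        if directory.is_file():
            directory = directory.parent
        outside = directory != self.root and self.root not in directory.parents
        if outside or directory in self.seen:
            return None  # out of scope, or already loaded this session
        self.seen.add(directory)
        for name in HINT_FILES:  # first match per directory wins
            candidate = directory / name
            if candidate.is_file():
                return candidate.read_text(errors="replace")[:MAX_HINT_CHARS]
        return None
```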
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Route AIAgent print output to stderr via _print_fn for ACP stdio sessions.
Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC
on stdout isn't corrupted. Child agents inherit the redirect.
Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
* feat: coerce tool call arguments to match JSON Schema types
LLMs frequently return numbers as strings ("42" instead of 42) and
booleans as strings ("true" instead of true). This causes silent
failures with MCP tools and any tool with strictly-typed parameters.
Added coerce_tool_args() in model_tools.py that runs before every tool
dispatch. For each argument, it checks the tool registry schema and
attempts safe coercion:
- "42" → 42 when schema says "type": "integer"
- "3.14" → 3.14 when schema says "type": "number"
- "true"/"false" → True/False when schema says "type": "boolean"
- Union types tried in order
- Original values preserved when coercion fails or is not applicable
Inspired by Block/goose tool argument coercion system.
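Condensed sketch of the coercion pass (the real coerce_tool_args() also walks union types and nested schemas):

```python
def coerce_value(value, schema_type: str):
    try:
        if schema_type == "integer" and isinstance(value, str):
            return int(value)
        if schema_type == "number" and isinstance(value, str):
            return float(value)
        if schema_type == "boolean" and isinstance(value, str):
            if value.lower() in ("true", "false"):
                return value.lower() == "true"
    except ValueError:
        pass
    return value  # untouched when coercion fails or doesn't apply

def coerce_tool_args(args: dict, properties: dict) -> dict:
    return {
        k: coerce_value(v, properties.get(k, {}).get("type", ""))
        for k, v in args.items()
    }
```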
* fix: accept reasoning-only responses without retries — set content to "(empty)"
Previously, when a model returned reasoning/thinking but no visible
content, we entered a 120-line retry/classify/compress/salvage cascade
that wasted 3+ API calls trying to "fix" the response. The model was
done thinking — retrying with the same input just burned money.
Now reasoning-only responses are accepted immediately:
- Reasoning stays in the `reasoning` field (semantically correct)
- Content set to "(empty)" — valid non-empty string every provider accepts
- No retries, no compression triggers, no salvage logic
- Session history contains "(empty)" not "" — prevents #2128 session
poisoning where empty assistant content caused prefill rejections
Removes ~120 lines, adds ~15. Saves 2-3 API calls per reasoning-only
response. Fixes #2128.
- Add OLLAMA_API_KEY to credential resolution chain for ollama.com endpoints
- Update requested_provider/_explicit_api_key/_explicit_base_url after /model
switch so _ensure_runtime_credentials() doesn't revert the switch
- Pass base_url/api_key from fallback config to resolve_provider_client()
- Add DirectAlias system: user-configurable model_aliases in config.yaml
checked before catalog resolution, with reverse lookup by model ID
- Add /model tab completion showing aliases with provider metadata
Co-authored-by: LucidPaths <LucidPaths@users.noreply.github.com>
Previously, tool results exceeding 100K characters were silently chopped
with only a '[Truncated]' notice — the rest of the content was lost
permanently. The model had no way to access the truncated portion.
Now, oversized results are written to HERMES_HOME/cache/tool_responses/
and the model receives:
- A 1,500-char head preview for immediate context
- The file path so it can use read_file/search_files on the full output
This preserves the context window protection (inline content stays small)
while making the full data recoverable. Falls back to the old destructive
truncation if the file write fails.
Inspired by Block/goose's large response handler pattern.
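Sketch of the spill-to-file path (constants and cache location follow the description above; file naming is illustrative):

```python
import uuid
from pathlib import Path

MAX_INLINE = 100_000
HEAD_PREVIEW = 1_500

def handle_large_result(text: str, hermes_home: str) -> str:
    if len(text) <= MAX_INLINE:
        return text
    cache = Path(hermes_home) / "cache" / "tool_responses"
    try:
        cache.mkdir(parents=True, exist_ok=True)
        path = cache / f"{uuid.uuid4().hex}.txt"
        path.write_text(text)
    except OSError:
        return text[:MAX_INLINE] + "\n[Truncated]"  # old destructive fallback
    return (text[:HEAD_PREVIEW]
            + f"\n[Truncated: full {len(text):,}-char output saved to {path}; "
              "use read_file/search_files to access it]")
```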
Persist structured exhaustion metadata from provider errors, use explicit reset timestamps when available, and expose label-based credential targeting in the auth CLI. This keeps long-lived Codex cooldowns from being misreported as one-hour waits and avoids forcing operators to manage entries by list position alone.
Constraint: Existing credential pool JSON needs to remain backward compatible with stored entries that only record status code and timestamp
Constraint: Runtime recovery must keep the existing retry-then-rotate semantics for 429s while enriching pool state with provider metadata
Rejected: Add a separate credential scheduler subsystem | too large for the Hermes pool architecture and unnecessary for this fix
Rejected: Only change CLI formatting | would leave runtime rotation blind to resets_at and preserve the serial-failure behavior
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Preserve structured rate-limit metadata when new providers expose reset hints; do not collapse back to status-code-only exhaustion tracking
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (272 passed); py_compile on changed Python files; hermes -w auth list/remove smoke test with temporary HERMES_HOME
Not-tested: Full repository pytest suite, broader gateway/integration flows outside the touched auth and pool paths
Updates _sanitize_tool_calls_for_strict_api docstring to explicitly
mention Fireworks alongside Mistral as strict APIs requiring sanitization.
Also documents the specific fields that are stripped (call_id, response_item_id).
Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls()
method. Updates comment to mention Fireworks alongside Mistral as strict
APIs requiring tool_call field sanitization.
Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls()
method. Ensures summary generation works correctly with Fireworks and
other strict APIs that reject unknown tool_call fields.
Replaces hardcoded Mistral check with the new _should_sanitize_tool_calls()
method. This ensures tool_calls are sanitized for all strict APIs, not
just Mistral. Prevents 400 errors from Fireworks and other providers.
Adds a centralized method to determine when tool_calls need sanitization
for strict APIs. Returns True for all APIs except codex_responses mode.
This prevents 400 errors from providers like Fireworks that reject unknown
fields (call_id, response_item_id) in tool_calls.
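The predicate and the strip, sketched (the method lives on the agent class in the real code):

```python
_NONSTANDARD_FIELDS = ("call_id", "response_item_id")

def _should_sanitize_tool_calls(api_mode: str) -> bool:
    # Any OpenAI-compatible endpoint may be strict (Mistral, Fireworks, ...);
    # only the Codex Responses API needs the extra fields.
    return api_mode != "codex_responses"

def sanitize_tool_call(tc: dict) -> dict:
    return {k: v for k, v in tc.items() if k not in _NONSTANDARD_FIELDS}
```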
Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.
The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.
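The placement change in miniature (sketch):

```python
def build_user_message(user_input: str, plugin_context: list[str]) -> dict:
    # Plugin context rides on the current turn; the system prompt is
    # untouched, so the cached prefix is identical across turns.
    content = user_input
    if plugin_context:
        content += "\n\n" + "\n\n".join(plugin_context)
    return {"role": "user", "content": content}
```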
Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.
Supersedes #5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.
Co-identified-by: OutThisLife (PR #5138)