hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-25 00:51:20 +00:00

Author	SHA1	Message	Date
Teknium	a37a095980	fix: detect qwen-oauth provider via CLI tokens in /model picker Seed qwen-oauth credentials from resolve_qwen_runtime_credentials() in _seed_from_singletons(). Users who authenticate via 'qwen auth qwen-oauth' store tokens in ~/.qwen/oauth_creds.json which the runtime resolver reads but the credential pool couldn't detect — same gap pattern as copilot. Uses refresh_if_expiring=False to avoid network calls during discovery.	2026-04-14 11:16:26 -07:00
Marvae	0bd3f521ae	fix: detect copilot provider via gh auth token in /model picker Seed copilot credentials from resolve_copilot_token() in the credential pool's _seed_from_singletons(), alongside the existing anthropic and openai-codex seeding logic. This makes copilot appear in the /model provider picker when the user authenticates solely through gh auth token. Cherry-picked from PR #9767 by Marvae.	2026-04-14 11:16:26 -07:00
N0nb0at	b21b3bfd68	feat(plugins): namespaced skill registration for plugin skill bundles Add ctx.register_skill() API so plugins can ship SKILL.md files under a 'plugin:skill' namespace, preventing name collisions with built-in Hermes skills. skill_view() detects the ':' separator and routes to the plugin registry while bare names continue through the existing flat-tree scan unchanged. Key additions: - agent/skill_utils: parse_qualified_name(), is_valid_namespace() - hermes_cli/plugins: PluginContext.register_skill(), PluginManager skill registry (find/list/remove) - tools/skills_tool: qualified name dispatch in skill_view(), _serve_plugin_skill() with full guards (disabled, platform, injection scan), bundle context banner with sibling listing, stale registry self-heal - Hoisted _INJECTION_PATTERNS to module level (dedup) - Updated skill_view schema description Based on PR #9334 by N0nb0at. Lean P1 salvage — omits autogen shim (P2) for a simpler first merge. Closes #8422	2026-04-14 10:42:58 -07:00
walli	884cd920d4	feat(gateway): unify QQBot branding, add PLATFORM_HINTS, fix streaming, restore missing setup functions - Rename platform from 'qq' to 'qqbot' across all integration points (Platform enum, toolset, config keys, import paths, file rename qq.py → qqbot.py) - Add PLATFORM_HINTS for QQBot in prompt_builder (QQ supports markdown) - Set SUPPORTS_MESSAGE_EDITING = False to skip streaming on QQ (prevents duplicate messages from non-editable partial + final sends) - Add _send_qqbot() standalone send function for cron/send_message tool - Add interactive _setup_qq() wizard in hermes_cli/setup.py - Restore missing _setup_signal/email/sms/dingtalk/feishu/wecom/wecom_callback functions that were lost during the original merge	2026-04-14 00:11:49 -07:00
Kenny Xie	cdd44817f2	fix(anthropic): send fast mode speed via extra_body	2026-04-13 22:32:39 -07:00
Teknium	943c01536f	feat: add openrouter/elephant-alpha to curated model lists (#9378 ) * Add hermes debug share instructions to all issue templates - bug_report.yml: Add required Debug Report section with hermes debug share and /debug instructions, make OS/Python/Hermes version optional (covered by debug report), demote old logs field to optional supplementary - setup_help.yml: Replace hermes doctor reference with hermes debug share, add Debug Report section with fallback chain (debug share -> --local -> doctor) - feature_request.yml: Add optional Debug Report section for environment context All templates now guide users to run hermes debug share (or /debug in chat) and paste the resulting paste.rs links, giving maintainers system info, config, and recent logs in one step. * feat: add openrouter/elephant-alpha to curated model lists - Add to OPENROUTER_MODELS (free, positioned above GPT models) - Add to _PROVIDER_MODELS["nous"] mirror list - Add 256K context window fallback in model_metadata.py	2026-04-13 21:16:14 -07:00
Teknium	d15efc9c1b	fix: correct GPT-5 family context lengths in fallback defaults (#9309 ) The generic 'gpt-5' fallback was set to 128,000 — which is the max OUTPUT tokens, not the context window. GPT-5 base and most variants (codex, mini) have 400,000 context. This caused /model to report 128k for models like gpt-5.3-codex when models.dev was unavailable. Added specific entries for GPT-5 variants with different context sizes: - gpt-5.4, gpt-5.4-pro: 1,050,000 (1.05M) - gpt-5.4-mini, gpt-5.4-nano: 400,000 - gpt-5.3-codex-spark: 128,000 (reduced) - gpt-5.1-chat: 128,000 (chat variant) - gpt-5 (catch-all): 400,000 Sources: https://developers.openai.com/api/docs/models	2026-04-13 19:22:23 -07:00
Teknium	f324222b79	fix: add vLLM/local server error patterns + MCP initial connection retry (#9281) Port two improvements inspired by Kilo-Org/kilocode analysis: 1. Error classifier: add context overflow patterns for vLLM, Ollama, and llama.cpp/llama-server. These local inference servers return different error formats than cloud providers (e.g., 'exceeds the max_model_len', 'context length exceeded', 'slot context'). Without these patterns, context overflow errors from local servers are misclassified as format errors, causing infinite retries instead of triggering compression. 2. MCP initial connection retry: previously, if the very first connection attempt to an MCP server failed (e.g., transient DNS blip at startup), the server was permanently marked as failed with no retry. Post-connect reconnection had 5 retries with exponential backoff, but initial connection had zero. Now initial connections retry up to 3 times with backoff before giving up, matching the resilience of post-connect reconnection. (Inspired by Kilo Code's MCP server disappearing fix in v1.3.3) Tests: 6 new error classifier tests, 4 new MCP retry tests, 1 updated existing test. All 276 affected tests pass.	2026-04-13 18:46:14 -07:00
arthurbr11	0a4cf5b3e1	feat(providers): add Arcee AI as direct API provider Adds Arcee AI as a standard direct provider (ARCEEAI_API_KEY) with Trinity models: trinity-large-thinking, trinity-large-preview, trinity-mini. Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py, main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py, setup.py, trajectory_compressor.py. Based on PR #9274 by arthurbr11, simplified to a standard direct provider without dual-endpoint OpenRouter routing.	2026-04-13 18:40:06 -07:00
Teknium	8d023e43ed	refactor: remove dead code — 1,784 lines across 77 files (#9180 ) Deep scan with vulture, pyflakes, and manual cross-referencing identified: - 41 dead functions/methods (zero callers in production) - 7 production-dead functions (only test callers, tests deleted) - 5 dead constants/variables - ~35 unused imports across agent/, hermes_cli/, tools/, gateway/ Categories of dead code removed: - Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection, rebuild_lookups, clear_session_context, get_logs_dir, clear_session - Unused API surface: search_models_dev, get_pricing, skills_categories, get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list - Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob - Stale debug helpers: get_debug_session_info copies in 4 tool files (centralized version in debug_helpers.py already exists) - Dead gateway methods: send_emote, send_notice (matrix), send_reaction (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history (matrix), _start_typing_indicator (signal), parse_feishu_post_content - Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION, FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR - Unused UI code: _interactive_provider_selection, _interactive_model_selection (superseded by prompt_toolkit picker) Test suite verified: 609 tests covering affected files all pass. Tests for removed functions deleted. Tests using removed utilities (clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.	2026-04-13 16:32:04 -07:00
Teknium	b27eaaa4db	fix: improve ACP type check and restore comment accuracy - Use isinstance() with try/except import for CopilotACPClient check in _to_async_client instead of fragile __class__.__name__ string check - Restore accurate comment: GPT-5.x models require (not 'often require') the Responses API on OpenAI/OpenRouter; ACP is the exception, not a softening of the requirement - Add inline comment explaining the ACP exclusion rationale	2026-04-13 16:17:43 -07:00
helix4u	8680f61f8b	fix(copilot-acp): keep acp runtime off responses path	2026-04-13 16:17:43 -07:00
Teknium	0e60a9dc25	fix: add kimi-coding-cn to remaining provider touchpoints Follow-up for salvaged PR #7637. Adds kimi-coding-cn to: - model_normalize.py (prefix strip) - providers.py (models.dev mapping) - runtime_provider.py (credential resolution) - setup.py (model list + setup label) - doctor.py (health check) - trajectory_compressor.py (URL detection) - models_dev.py (registry mapping) - integrations/providers.md (docs)	2026-04-13 11:20:37 -07:00
hcshen0111	2b3aa36242	feat(providers): add kimi-coding-cn provider for mainland China users Cherry-picked from PR #7637 by hcshen0111. Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var and api.moonshot.cn/v1 endpoint for China-region Moonshot users.	2026-04-13 11:20:37 -07:00
墨綠BG	c449cd1af5	fix(config): restore custom providers after v11→v12 migration The v11→v12 migration converts custom_providers (list) into providers (dict), then deletes the list. But all runtime resolvers read from custom_providers — after migration, named custom endpoints silently stop resolving and fallback chains fail with AuthError. Add get_compatible_custom_providers() that reads from both config schemas (legacy custom_providers list + v12+ providers dict), normalizes entries, deduplicates, and returns a unified list. Update ALL consumers: - hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env - hermes_cli/auth_commands.py: credential pool provider names - hermes_cli/main.py: model picker + _model_flow_named_custom() - agent/auxiliary_client.py: key_env + custom_entry model fallback - agent/credential_pool.py: _iter_custom_providers() - cli.py + gateway/run.py: /model switch custom_providers passthrough - run_agent.py + gateway/run.py: per-model context_length lookup Also: use config.pop() instead of del for safer migration, fix stale _config_version assertions in tests, add pool mock to codex test. Co-authored-by: 墨綠BG <s5460703@gmail.com> Closes #8776, salvaged from PR #8814	2026-04-13 10:50:52 -07:00
luyao618	8ec1608642	fix(agent): propagate api_mode to vision provider resolution resolve_vision_provider_client() computed resolved_api_mode from config but never passed it to downstream resolve_provider_client() or _get_cached_client() calls, causing custom providers with api_mode: anthropic_messages to crash when used for vision tasks. Also remove the for_vision special case in _normalize_aux_provider() that incorrectly discarded named custom provider identifiers. Fixes #8857 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 05:02:54 -07:00
Teknium	e3ffe5b75f	fix: remove legacy compression.summary_* config and env var fallbacks (#8992 ) Remove the backward-compat code paths that read compression provider/model settings from legacy config keys and env vars, which caused silent failures when auto-detection resolved to incompatible backends. What changed: - Remove compression.summary_model, summary_provider, summary_base_url from DEFAULT_CONFIG and cli.py defaults - Remove backward-compat block in _resolve_task_provider_model() that read from the legacy compression section - Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper functions (AUXILIARY_/CONTEXT_ env var readers) - Remove env var fallback chain for per-task overrides - Update hermes config show to read from auxiliary.compression - Add config migration (v16→17) that moves non-empty legacy values to auxiliary.compression and strips the old keys - Update example config and openclaw migration script - Remove/update tests for deleted code paths Compression model/provider is now configured exclusively via: auxiliary.compression.provider / auxiliary.compression.model Closes #8923	2026-04-13 04:59:26 -07:00
Richard Li	82901695ff	feat(wecom): add platform hint for native media sending	2026-04-13 04:46:04 -07:00
ismell0992-afk	3e99964789	fix(agent): prefer Ollama Modelfile num_ctx over GGUF training max _query_local_context_length was checking model_info.context_length (the GGUF training max) before num_ctx (the Modelfile runtime override), inverse to query_ollama_num_ctx. The two helpers therefore disagreed on the same model: hermes-brain:qwen3-14b-ctx32k # Modelfile: num_ctx 32768 underlying qwen3:14b GGUF # qwen3.context_length: 40960 query_ollama_num_ctx correctly returned 32768 (the value Ollama will actually allocate KV cache for). _query_local_context_length returned 40960, which let ContextCompressor grow conversations past 32768 before triggering compression — at which point Ollama silently truncated the prefix, corrupting context. Swap the order so num_ctx is checked first, matching query_ollama_num_ctx. Adds a parametrized test that seeds both values and asserts num_ctx wins. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-13 04:24:07 -07:00
Teknium	400fe9b2a1	fix: add <thought> stripping to auxiliary_client + tests auxiliary_client.py had its own regex mirroring _strip_think_blocks but was missing the <thought> variant. Also adds test coverage for <thought> paired and orphaned tags.	2026-04-12 12:44:49 -07:00
Teknium	7a67b13506	fix: title_generator no longer logs as 'compression' task Changed task='compression' to task='title_generation' so auto-title calls don't pollute logs with false compression alarms.	2026-04-12 04:17:18 -07:00
Teknium	17c72f176d	fix: make skill loading instructions more aggressive in system prompt (#8286) The previous wording ('If one clearly matches') set too high a threshold, and 'If none match, proceed normally' was an easy escape hatch for lazy models. Now: - Lowered threshold: 'matches or is even partially relevant' - Added MUST directive and 'err on the side of loading' guidance - Replaced permissive closer with 'only proceed without if genuinely none are relevant' This should reduce cases where the agent skips loading relevant skills unless explicitly forced.	2026-04-12 03:03:16 -07:00
Teknium	b321330362	feat: add WSL environment hint to system prompt (#8285 ) When running inside WSL (Windows Subsystem for Linux), inject a hint into the system prompt explaining that the Windows host filesystem is mounted at /mnt/c/, /mnt/d/, etc. This lets the agent naturally translate Windows paths (Desktop, Documents) to their /mnt/ equivalents without the user needing to configure anything. Uses the existing is_wsl() detection from hermes_constants (cached, checks /proc/version for 'microsoft'). Adds build_environment_hints() in prompt_builder.py — extensible for Termux, Docker, etc. later. Closes the UX gap where WSL users had to manually explain path translation to the agent every session.	2026-04-12 02:26:28 -07:00
Teknium	95fa78eb6c	fix: write refreshed Codex tokens back to ~/.codex/auth.json (#8277) OpenAI OAuth refresh tokens are single-use and rotate on every refresh. When Hermes refreshes a Codex token, it consumed the old refresh_token but never wrote the new pair back to ~/.codex/auth.json. This caused Codex CLI and VS Code to fail with 'refresh_token_reused' on their next refresh attempt. This mirrors the existing Anthropic write-back pattern where refreshed tokens are written to ~/.claude/.credentials.json via _write_claude_code_credentials(). Changes: - Add _write_codex_cli_tokens() in hermes_cli/auth.py (parallel to _write_claude_code_credentials in anthropic_adapter.py) - Call it from _refresh_codex_auth_tokens() (non-pool refresh path) - Call it from credential_pool._refresh_entry() (pool happy path + retry) - Add tests for the new write-back behavior - Update existing test docstring to clarify _save_codex_tokens vs _write_codex_cli_tokens separation Fixes refresh token conflict reported by @ec12edfae2cb221	2026-04-12 02:05:20 -07:00
Teknium	a1220977d3	fix: make skill loading instructions more aggressive in system prompt (#8209) The previous wording ('If one clearly matches') set too high a threshold, and 'If none match, proceed normally' was an easy escape hatch for lazy models. Now: - Lowered threshold: 'matches or is even partially relevant' - Added MUST directive and 'err on the side of loading' guidance - Replaced permissive closer with 'only proceed without if genuinely none are relevant' This should reduce cases where the agent skips loading relevant skills unless explicitly forced.	2026-04-12 01:46:34 -07:00
Teknium	078dba015d	fix: three provider-related bugs (#8161, #8181, #8147) (#8243) - Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV so context-length lookups use models.dev data instead of 128k fallback. Fixes #8161. - Set api_mode from custom_providers entry when switching via hermes model, and clear stale api_mode when the entry has none. Also extract api_mode in _named_custom_provider_map(). Fixes #8181. - Convert OpenAI image_url content blocks to Anthropic image blocks when the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL containing /anthropic). Fixes #8147.	2026-04-12 01:44:18 -07:00
Harish Kukreja	b1f13a8c5f	fix(agent): route compression aux through live session runtime	2026-04-12 01:34:52 -07:00
Teknium	eb2a49f95a	fix: openai-codex and anthropic not appearing in /model picker for external credentials (#8224 ) Users whose credentials exist only in external files — OpenAI Codex OAuth tokens in ~/.codex/auth.json or Anthropic Claude Code credentials in ~/.claude/.credentials.json — would not see those providers in the /model picker, even though hermes auth and hermes model detected them. Root cause: list_authenticated_providers() only checked the raw Hermes auth store and env vars. External credential file fallbacks (Codex CLI import, Claude Code file discovery) were never triggered. Fix (three parts): 1. _seed_from_singletons() in credential_pool.py: openai-codex now imports from ~/.codex/auth.json when the Hermes auth store is empty, mirroring resolve_codex_runtime_credentials(). 2. list_authenticated_providers() in model_switch.py: auth store + pool checks now run for ALL providers (not just OAuth auth_type), catching providers like anthropic that support both API key and OAuth. 3. list_authenticated_providers(): direct check for anthropic external credential files (Claude Code, Hermes PKCE). The credential pool intentionally gates anthropic behind is_provider_explicitly_configured() to prevent auxiliary tasks from silently consuming tokens. The /model picker bypasses this gate since it is discovery-oriented.	2026-04-12 00:33:42 -07:00
Teknium	1cec910b6a	fix: improve context compaction to prevent model answering stale questions (#8107 ) After compression, models (especially Kimi 2.5) would sometimes respond to questions from the summary instead of the latest user message. This happened ~30% of the time on Telegram. Root cause: the summary's 'Next Steps' section read as active instructions, and the SUMMARY_PREFIX didn't explicitly tell the model to ignore questions in the summary. When the summary merged into the first tail message, there was no clear separator between historical context and the actual user message. Changes inspired by competitor analysis (Claude Code, OpenCode, Codex): 1. SUMMARY_PREFIX rewritten with explicit 'Do NOT answer questions from this summary — respond ONLY to the latest user message AFTER it' 2. Summarizer preamble (shared by both prompts) adds: - 'Do NOT respond to any questions' (from OpenCode's approach) - 'Different assistant' framing (from Codex) to create psychological distance between summary content and active conversation 3. New summary sections: - '## Resolved Questions' — tracks already-answered questions with their answers, preventing re-answering (from Claude Code's 'Pending user asks' pattern) - '## Pending User Asks' — explicitly marks unanswered questions - '## Remaining Work' replaces '## Next Steps' — passive framing avoids reading as active instructions 4. merge-summary-into-tail path now inserts a clear separator: '--- END OF CONTEXT SUMMARY — respond to the message below ---' 5. Iterative update prompt now instructs: 'Move answered questions to Resolved Questions' to maintain the resolved/pending distinction across multiple compactions.	2026-04-11 19:43:58 -07:00
Teknium	a0a02c1bc0	feat: /compress <focus> — guided compression with focus topic (#8017 ) Adds an optional focus topic to /compress: `/compress database schema` guides the summariser to preserve information related to the focus topic (60-70% of summary budget) while compressing everything else more aggressively. Inspired by Claude Code's /compact <focus>. Changes: - context_compressor.py: focus_topic parameter on _generate_summary() and compress(); appends FOCUS TOPIC guidance block to the LLM prompt - run_agent.py: focus_topic parameter on _compress_context(), passed through to the compressor - cli.py: _manual_compress() extracts focus topic from command string, preserves existing manual_compression_feedback integration (no regression) - gateway/run.py: _handle_compress_command() extracts focus from event args and passes through — full gateway parity - commands.py: args_hint="[focus topic]" on /compress CommandDef Salvaged from PR #7459 (CLI /compress focus only — /context command deferred). 15 new tests across CLI, compressor, and gateway.	2026-04-11 19:23:29 -07:00
Teknium	5c2ecdec49	fix: use ceiling division for token estimation, deduplicate inline formula Switch estimate_tokens_rough(), estimate_messages_tokens_rough(), and estimate_request_tokens_rough() from floor division (len // 4) to ceiling division ((len + 3) // 4). Short texts (1-3 chars) previously estimated as 0 tokens, causing the compressor and pre-flight checks to systematically undercount when many short tool results are present. Also replaced the inline duplicate formula in run_conversation() (total_chars // 4) with a call to the shared estimate_messages_tokens_rough() function. Updated 4 tests that hardcoded floor-division expected values. Related: issue #6217, PR #6629	2026-04-11 16:33:40 -07:00
Teknium	c8aff74632	fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking Three root causes of the 'agent stops mid-task' gateway bug: 1. Compression threshold floor (64K tokens minimum) - The 50% threshold on a 100K-context model fired at 50K tokens, causing premature compression that made models lose track of multi-step plans. Now threshold_tokens = max(50% * context, 64K). - Models with <64K context are rejected at startup with a clear error. 2. Budget warning removal — grace call instead - Removed the 70%/90% iteration budget warnings entirely. These injected '[BUDGET WARNING: Provide your final response NOW]' into tool results, causing models to abandon complex tasks prematurely. - Now: no warnings during normal execution. When the budget is actually exhausted (90/90), inject a user message asking the model to summarise, allow one grace API call, and only then fall back to _handle_max_iterations. 3. Activity touches during long terminal execution - _wait_for_process polls every 0.2s but never reported activity. The gateway's inactivity timeout (default 1800s) would fire during long-running commands that appeared 'idle.' - Now: thread-local activity callback fires every 10s during the poll loop, keeping the gateway's activity tracker alive. - Agent wires _touch_activity into the callback before each tool call. Also: docs update noting 64K minimum context requirement. Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).	2026-04-11 16:18:57 -07:00
Teknium	8c3935ebe8	fix: is_local_endpoint misses Docker/Podman DNS names (#7950 ) * fix(tools): neutralize shell injection in _write_to_sandbox via path quoting _write_to_sandbox interpolated storage_dir and remote_path directly into a shell command passed to env.execute(). Paths containing shell metacharacters (spaces, semicolons, $(), backticks) could trigger arbitrary command execution inside the sandbox. Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric + slashes/hyphens/dots) are left unmodified by shlex.quote, so existing behavior is unchanged. Paths with unsafe characters get single-quoted. Tests added for spaces, $(command) substitution, and semicolon injection. * fix: is_local_endpoint misses Docker/Podman DNS names host.docker.internal, host.containers.internal, gateway.docker.internal, and host.lima.internal are well-known DNS names that container runtimes use to resolve the host machine. Users running Ollama on the host with the agent in Docker/Podman hit the default 120s stream timeout instead of the bumped 1800s because these hostnames weren't recognized as local. Add _CONTAINER_LOCAL_SUFFIXES tuple and suffix check in is_local_endpoint(). Tests cover all three runtime families plus a negative case for domains that merely contain the suffix as a substring.	2026-04-11 14:46:18 -07:00
Teknium	04c1c5d53f	refactor: extract shared helpers to deduplicate repeated code patterns (#7917) * refactor: add shared helper modules for code deduplication New modules: - gateway/platforms/helpers.py: MessageDeduplicator, TextBatchAggregator, strip_markdown, ThreadParticipationTracker, redact_phone - hermes_cli/cli_output.py: print_info/success/warning/error, prompt helpers - tools/path_security.py: validate_within_dir, has_traversal_component - utils.py additions: safe_json_loads, read_json_file, read_jsonl, append_jsonl, env_str/lower/int/bool helpers - hermes_constants.py additions: get_config_path, get_skills_dir, get_logs_dir, get_env_path * refactor: migrate gateway adapters to shared helpers - MessageDeduplicator: discord, slack, dingtalk, wecom, weixin, mattermost - strip_markdown: bluebubbles, feishu, sms - redact_phone: sms, signal - ThreadParticipationTracker: discord, matrix - _acquire/_release_platform_lock: telegram, discord, slack, whatsapp, signal, weixin Net -316 lines across 19 files. * refactor: migrate CLI modules to shared helpers - tools_config.py: use cli_output print/prompt + curses_radiolist (-117 lines) - setup.py: use cli_output print helpers + curses_radiolist (-101 lines) - mcp_config.py: use cli_output prompt (-15 lines) - memory_setup.py: use curses_radiolist (-86 lines) Net -263 lines across 5 files. * refactor: migrate to shared utility helpers - safe_json_loads: agent/display.py (4 sites) - get_config_path: skill_utils.py, hermes_logging.py, hermes_time.py - get_skills_dir: skill_utils.py, prompt_builder.py - Token estimation dedup: skills_tool.py imports from model_metadata - Path security: skills_tool, cronjob_tools, skill_manager_tool, credential_files - Non-atomic YAML writes: doctor.py, config.py now use atomic_yaml_write - Platform dict: new platforms.py, skills_config + tools_config derive from it - Anthropic key: new get_anthropic_key() in auth.py, used by doctor/status/config/main * test: update tests for shared helper migrations - test_dingtalk: use _dedup.is_duplicate() instead of _is_duplicate() - test_mattermost: use _dedup instead of _seen_posts/_prune_seen - test_signal: import redact_phone from helpers instead of signal - test_discord_connect: _platform_lock_identity instead of _token_lock_identity - test_telegram_conflict: updated lock error message format - test_skill_manager_tool: 'escapes' instead of 'boundary' in error msgs	2026-04-11 13:59:52 -07:00
Teknium	976bad5bde	refactor(auxiliary): config.yaml takes priority over env vars for aux task settings (#7889 ) The auxiliary client previously checked env vars (AUXILIARY_{TASK}_PROVIDER, AUXILIARY_{TASK}_MODEL, etc.) before config.yaml's auxiliary.{task}.* section. This violated the project's '.env is for secrets only' policy — these are behavioral settings, not API keys. Flipped the resolution order in _resolve_task_provider_model(): 1. Explicit args (always win) 2. config.yaml auxiliary.{task}.* (PRIMARY) 3. Env var overrides (backward-compat fallback only) 4. 'auto' (full auto-detection chain) Env var reading code is kept for backward compatibility but config.yaml now takes precedence. Updated module docstring and function docstring. Also removed AUXILIARY_VISION_MODEL from _EXTRA_ENV_KEYS in config.py.	2026-04-11 11:21:59 -07:00
Teknium	d4bb44d4b9	docs: add Xiaomi MiMo to all provider docs + fix MiMo-V2-Flash ctx len - environment-variables.md: XIAOMI_API_KEY, XIAOMI_BASE_URL, provider list - cli-commands.md: --provider choices - integrations/providers.md: provider table, Chinese providers section, config example, base URL list, choosing table, fallback providers list - fallback-providers.md: supported providers table, auto-detection chain - Fix XiaomiMiMo/MiMo-V2-Flash context length 32768 → 256000 (OpenRouter entry)	2026-04-11 11:17:52 -07:00
kshitijk4poor	6693e2a497	feat(xiaomi): add Xiaomi MiMo as first-class provider Cherry-picked from PR #7702 by kshitijk4poor. Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models: - mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest) Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py, main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py, models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example. Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's main model. Non-vision aux uses the user's selected model. Added _PROVIDER_VISION_MODELS dict for provider-specific vision model overrides. On failure, falls back to aggregators (gemini flash) via existing fallback chain. Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000, mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000. 36 tests covering registry, aliases, auto-detect, credentials, models.dev, normalization, URL mapping, providers module, doctor, aux client, vision model override, and agent init.	2026-04-11 11:17:52 -07:00
kshitijk4poor	50bb4fe010	fix(vision): auto-resize oversized images, increase default timeout, fix vision capability detection Cherry-picked from PR #7749 by kshitijk4poor with modifications: - Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider) - Send images at full resolution first; only auto-resize to 5 MB on API failure - Add _is_image_size_error() helper to detect size-related API rejections - Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction - Fix get_model_capabilities() to check modalities.input for vision support - Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent) - Applied retry-with-resize to both vision_analyze_tool and browser_vision Closes #7740	2026-04-11 11:12:50 -07:00
kshitijk4poor	af9caec44f	fix(qwen): correct context lengths for qwen3-coder models and send max_tokens to portal Based on PR #7285 by @kshitijk4poor. Two bugs affecting Qwen OAuth users: 1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M. Added specific entries before the generic qwen catch-all: - qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per official Alibaba Cloud docs and OpenRouter) - qwen3-coder: 262,144 2. Random stopping — max_tokens was suppressed for Qwen Portal, so the server applied its own low default. Reasoning models exhaust that on thinking tokens. Now: honor explicit max_tokens, default to 65536 when unset. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-04-11 03:29:31 -07:00
aaronagent	307697688e	fix: prevent zombie processes, redact cron stderr, skip symlinks in skill enumeration process_registry.py: _reader_loop() has process.wait() after the try-except block (line 380). If the reader thread crashes with an unexpected exception (e.g. MemoryError, KeyboardInterrupt), control exits the except handler but skips wait() — leaving the child as a zombie process. Move wait() and the cleanup into a finally block so the child is always reaped. cron/scheduler.py: _run_job_script() only redacts secrets in stdout on the SUCCESS path (line 417-421). When a cron script fails (non-zero exit), both stdout and stderr are returned WITHOUT redaction (lines 407-413). A script that accidentally prints an API key to stderr during a failure would leak it into the LLM context. Move redaction before the success/failure branch so both paths benefit. skill_commands.py: _build_skill_message() enumerates supporting files using rglob("*") but only checks is_file() (line 171) without filtering symlinks. PR #6693 added symlink protection to scan_skill_commands() but missed this function. A malicious skill can create symlinks in references/ pointing to arbitrary files, exposing their paths (and potentially content via skill_view) to the LLM. Add is_symlink() check to match the guard in scan_skill_commands. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 02:03:20 -07:00
kshitijk4poor	c89719ad9c	fix: warn and clear stale OPENAI_BASE_URL on provider switch (#5161)	2026-04-11 01:52:58 -07:00
kshitijk4poor	d3c5d65563	fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264) async_call_llm (and call_llm) can return non-OpenAI objects from custom providers or adapter shims, crashing downstream consumers with misleading AttributeError ('str' has no attribute 'choices'). Add _validate_llm_response() that checks the response has the expected .choices[0].message shape before returning. Wraps all return paths in call_llm, async_call_llm, and fallback paths. Fails fast with a clear RuntimeError identifying the task, response type, and a preview of the malformed payload. Closes #7264	2026-04-11 01:52:58 -07:00
ran	4f5e8b22a7	fix: drop incompatible model slugs on auxiliary client cache hit `resolve_provider_client()` already drops OpenRouter-format model slugs (containing "/") when the resolved provider is not OpenRouter (line 1097). However, `_get_cached_client()` returns `model or cached_default` directly on cache hits, bypassing this check entirely. When the main provider is openai-codex, the auto-detection chain (Step 1 of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary calls for different tasks (e.g. compression with `summary_model: google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-format model slug straight to the Codex Responses API, which does not understand it and returns an empty `response.output`. This causes two user-visible failures: - "Invalid API response shape" (empty output after 3 retries) - "Context length exceeded, cannot compress further" (compression itself fails through the same path) Add `_compat_model()` helper that mirrors the "/" check from `resolve_provider_client()` and call it on the cache-hit return path.	2026-04-11 01:52:58 -07:00
kshitijk4poor	eeb8b4b00f	fix(auxiliary): harden fallback behavior for non-OpenRouter users Four fixes to auxiliary_client.py: 1. Respect explicit provider as hard constraint (#7559) When auxiliary.{task}.provider is explicitly set (not 'auto'), connection/payment errors no longer silently fallback to cloud providers. Local-only users (Ollama, vLLM) will no longer get unexpected OpenRouter billing from auxiliary tasks. 2. Eliminate model='default' sentinel (#7512) _resolve_api_key_provider() no longer sends literal 'default' as model name to APIs. Providers without a known aux model in _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing model_not_supported errors. 3. Add payment/connection fallback to async_call_llm (#7512) async_call_llm now mirrors sync call_llm's fallback logic for payment (402) and connection errors. Previously, async consumers (session_search, web_tools, vision) got hard failures with no recovery. Also fixes hardcoded 'openrouter' fallback to use the full auto-detection chain. 4. Use accurate error reason in fallback logs (#7512) _try_payment_fallback() now accepts a reason parameter and uses it in log messages. Connection timeouts are no longer misleadingly logged as 'payment error'. Closes #7559 Closes #7512	2026-04-11 01:52:58 -07:00
kshitijk4poor	ffbd80f5fc	fix(auxiliary): honor api_mode in auxiliary client (#6800) The auxiliary client always calls client.chat.completions.create(), ignoring the api_mode config flag. This breaks codex-family models (e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the /v1/responses endpoint. Changes: - Expand _resolve_task_provider_model to return api_mode (5-tuple) - Read api_mode from auxiliary.{task}.api_mode config and env vars (AUXILIARY_{TASK}_API_MODE) - Pass api_mode through _get_cached_client to resolve_provider_client - Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI clients in CodexAuxiliaryClient when api_mode=codex_responses or when auto-detection finds api.openai.com + codex model pattern - Apply wrapping at all custom endpoint, named custom provider, and API-key provider return paths - Update test mocks for the new 5-tuple return format Users can now set: auxiliary: compression: model: gpt-5.3-codex base_url: https://api.openai.com/v1 api_mode: codex_responses Closes #6800	2026-04-11 01:52:58 -07:00
Long Hao	58b62e3e43	feat(skin): make all CLI colors skin-aware Refactor hardcoded color constants throughout the CLI to resolve from the active skin engine, so custom themes fully control the visual appearance. cli.py: - Replace _GOLD constant with _ACCENT (_SkinAwareAnsi class) that lazily resolves response_border from the active skin - Rename _GOLD_DEFAULT to _ACCENT_ANSI_DEFAULT - Make _build_compact_banner() read banner_title/accent/dim from skin - Make session resume notifications use _accent_hex() - Make status line use skin colors (accent_color, separator_color, label_color instead of cryptic _dim_c/_dim_c2/_accent_c/_label_c) - Reset _ACCENT cache on /skin switch agent/display.py: - Replace hardcoded diff ANSI escapes with skin-aware functions: _diff_dim(), _diff_file(), _diff_hunk(), _diff_minus(), _diff_plus() (renamed from SCREAMING_CASE _ANSI_* to snake_case) - Add reset_diff_colors() for cache invalidation on skin switch	2026-04-11 01:47:48 -07:00
kshitijk4poor	d442f25a2f	fix: align MiniMax provider with official API docs Aligns MiniMax provider with official API documentation. Fixes 6 bugs: transport mismatch (openai_chat -> anthropic_messages), credential leak in switch_model(), prompt caching sent to non-Anthropic endpoints, dot-to-hyphen model name corruption, trajectory compressor URL routing, and stale doctor health check. Also corrects context window (204,800), thinking support (manual mode), max output (131,072), and model catalog (M2 family only on /anthropic). Source: https://platform.minimax.io/docs/api-reference/text-anthropic-api Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-04-11 01:04:41 -07:00
Teknium	caf371da18	fix: MiniMax/Alibaba incorrectly detected as Anthropic OAuth, causing mcp_ tool prefix (#7509) _is_oauth_token() returned True for any key not starting with 'sk-ant-api', which means MiniMax and Alibaba API keys were falsely treated as Anthropic OAuth tokens. This triggered the Claude Code compatibility path: - All tool names prefixed with mcp_ (e.g. mcp_terminal, mcp_web_search) - System prompt injected with 'You are Claude Code' identity - 'Hermes Agent' replaced with 'Claude Code' throughout Fix: Make _is_oauth_token() positively identify Anthropic OAuth tokens by their key format instead of using a broad catch-all: - sk-ant-* (but not sk-ant-api-) -> setup tokens, managed keys - eyJ -> JWTs from Anthropic OAuth flow - Everything else -> False (MiniMax, Alibaba, etc.) Reported by stefan171.	2026-04-11 00:43:01 -07:00
Kenny Xie	1ffd92cc94	fix(gateway): make manual compression feedback truthful	2026-04-10 21:16:53 -07:00
hermes-agent-dhabibi	c1af614289	fix: wrap copilot Responses-API models in CodexAuxiliaryClient for auxiliary tasks GPT-5+ models (except gpt-5-mini) are only accessible via the Responses API on Copilot. When these models were configured as the compression summary_model (or any auxiliary task), the plain OpenAI client sent them to /chat/completions which returned a 400 error: model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint resolve_provider_client() now checks _should_use_copilot_responses_api() for the copilot provider and wraps the client in CodexAuxiliaryClient when needed, routing calls through responses.stream() transparently. Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping (gpt-4.1-mini) paths.	2026-04-10 21:16:53 -07:00
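
Note on caf371da18: the positive-identification rule described in that commit message can be sketched roughly as below. This is an illustration only, not the repository's code: the function name is_oauth_token (no leading underscore) and the sample keys are hypothetical, and the exact prefix used to exclude regular API keys ('sk-ant-api') is an assumption taken from the wording of the message.

```python
def is_oauth_token(key: str) -> bool:
    """Hypothetical sketch of the rule described in commit caf371da18.

    Returns True only for keys that positively look like Anthropic
    OAuth/setup tokens, instead of treating every non-API key as OAuth.
    """
    if not key:
        return False
    # Regular Anthropic API keys are never OAuth tokens (assumed prefix).
    if key.startswith("sk-ant-api"):
        return False
    # Other sk-ant-* keys: setup tokens, managed keys.
    if key.startswith("sk-ant-"):
        return True
    # JWTs from the OAuth flow start with base64-encoded '{"', i.e. 'eyJ'.
    if key.startswith("eyJ"):
        return True
    # Everything else (MiniMax, Alibaba, etc.) is not Anthropic OAuth.
    return False
```

Under this sketch a third-party key no longer falls through to True, which is the condition the commit says triggered the mcp_ tool prefix and the Claude Code identity injection.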


436 commits