hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-28 01:21:43 +00:00

Author	SHA1	Message	Date
Teknium	8104f400f8	test: disable text batching in existing adapter tests Set _text_batch_delay_seconds = 0 on test adapter fixtures so messages dispatch immediately (bypassing async batching). This preserves the existing synchronous assertion patterns while the batching logic is tested separately in test_text_batching.py.	2026-04-09 23:25:27 -07:00
Teknium	1ed00496f2	test: add text batching tests for Discord, Matrix, WeCom, Telegram, Feishu 22 tests covering: - Single message dispatch after delay - Split message aggregation (2-way and 3-way) - Different chats/rooms not merged - Adaptive delay for near-limit chunks - State cleanup after flush - Split continuation merging All 5 platform adapters tested.	2026-04-09 23:25:27 -07:00
Teknium	f92a0b8596	fix(feishu): add adaptive batch delay for split long messages Feishu already had text batching with a static 0.6s delay. This adds adaptive delay: waits 2.0s when a chunk is near the ~4096-char split point since a continuation is almost certain. Tracks _last_chunk_len on each queued event to determine the delay. Configurable via HERMES_FEISHU_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 2.0). Ref #6892	2026-04-09 23:25:27 -07:00
Teknium	1723e8e998	fix(wecom): add text batching to merge split long messages Ports the adaptive batching pattern from the Telegram adapter. WeCom clients split messages around 4000 chars. Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. Only text messages are batched; commands/media dispatch immediately. Ref #6892	2026-04-09 23:25:27 -07:00
Teknium	07148cac9a	fix(matrix): add text batching to merge split long messages Ports the adaptive batching pattern from the Telegram adapter. Matrix clients split messages around 4000 chars. Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise. Only text messages are batched; commands dispatch immediately. Ref #6892	2026-04-09 23:25:27 -07:00
Teknium	0fc0c1c83b	fix(discord): add text batching to merge split long messages Cherry-picked from PR #6894 by SHL0MS with fixes: - Only batch TEXT messages; commands/media dispatch immediately - Use build_session_key() for proper session-scoped batch keys - Consistent naming (_text_batch_delay_seconds) - Proper Dict[str, MessageEvent] typing Discord splits at 2000 chars (lowest of all platforms). Adaptive delay waits 2.0s when a chunk is near the limit, 0.6s otherwise.	2026-04-09 23:25:27 -07:00
Teknium	5075717949	fix(telegram): adaptive batch delay for split long messages Cherry-picked from PR #6891 by SHL0MS. When a chunk is near the 4096-char split point, wait 2.0s instead of 0.6s since a continuation is almost certain.	2026-04-09 23:25:27 -07:00
Ari Lotter	660379637a	one more nix fix	2026-04-10 01:41:29 -04:00
Teknium	f783986f5a	fix: increase stream read timeout default to 120s, auto-raise for local LLMs (#6967) Raise the default httpx stream read timeout from 60s to 120s for all providers. Additionally, auto-detect local LLM endpoints (Ollama, llama.cpp, vLLM) and raise the read timeout to HERMES_API_TIMEOUT (1800s) since local models can take minutes for prefill on large contexts before producing the first token. The stale stream timeout already had this local auto-detection pattern; the httpx read timeout was missing it, causing a hard 60s wall that users couldn't find (HERMES_STREAM_READ_TIMEOUT was undocumented). Changes: - Default HERMES_STREAM_READ_TIMEOUT: 60s -> 120s - Auto-detect local endpoints -> raise to 1800s (user override respected) - Document HERMES_STREAM_READ_TIMEOUT and HERMES_STREAM_STALE_TIMEOUT - Add 10 parametrized tests Reported-by: Pavan Srinivas (@pavanandums)	2026-04-09 22:35:30 -07:00
emozilla	bda9aa17cb	fix(streaming): prevent <think> in prose from suppressing response output When the model mentions <think> as literal text in its response (e.g. "(/think not producing <think> tags)"), the streaming display treated it as a reasoning block opener and suppressed everything after it. The response box would close with truncated content and no error — the API response was complete but the display ate it. Root cause: _stream_delta() matched <think> anywhere in the text stream regardless of position. Real reasoning blocks always start at the beginning of a line; mentions in prose appear mid-sentence. Fix: track line position across streaming deltas with a _stream_last_was_newline flag. Only enter reasoning suppression when the tag appears at a block boundary (start of stream, after a newline, or after only whitespace on the current line). Add a _flush_stream() safety net that recovers buffered content if no closing tag is found by end-of-stream. Also fixes three related issues discovered during investigation: - anthropic_adapter: _get_anthropic_max_output() now normalizes dots to hyphens so 'claude-opus-4.6' matches the 'claude-opus-4-6' table key (was returning 32K instead of 128K) - run_agent: send explicit max_tokens for Claude models on Nous Portal, same as OpenRouter — both proxy to Anthropic's API which requires it. Without it the backend defaults to a low limit that truncates responses. - run_agent: reset truncated_tool_call_retries after successful tool execution so a single truncation doesn't poison the entire conversation.	2026-04-09 22:16:36 -07:00
Teknium	8394b5ddd2	feat: expand /fast to all OpenAI Priority Processing models (#6960) Previously /fast only supported gpt-5.4 and forced a provider switch to openai-codex. Now supports all 13 models from OpenAI's Priority Processing pricing table (gpt-5.4, gpt-5.4-mini, gpt-5.2, gpt-5.1, gpt-5, gpt-5-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o4-mini). Key changes: - Replaced _FAST_MODE_BACKEND_CONFIG with _PRIORITY_PROCESSING_MODELS frozenset - Removed provider-forcing logic: service_tier is now injected into whatever API path the user is already on (Codex Responses, Chat Completions, or OpenRouter passthrough) - Added request_overrides support to chat_completions path in run_agent.py - Updated messaging from 'Codex inference tier' to 'Priority Processing' - Expanded test coverage for all supported models	2026-04-09 22:06:30 -07:00
g-guthrie	d416a69288	feat: add Codex fast mode toggle (/fast command) Add /fast slash command to toggle OpenAI Codex service_tier between normal and priority ('fast') inference. Only exposed for models registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4). - Registry-based backend config for extensibility - Dynamic command visibility (hidden from help/autocomplete for non-supported models) via command_filter on SlashCommandCompleter - service_tier flows through request_overrides from route resolution - Omit max_output_tokens for Codex backend (rejects it) - Persists to config.yaml under agent.service_tier Salvage cleanup: removed simple_term_menu/input() menu (banned), bare /fast now shows status like /reasoning. Removed redundant override resolution in _build_api_kwargs — single source of truth via request_overrides from route. Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-04-09 21:54:32 -07:00
Ari Lotter	bc80848e49	update lockfile	2026-04-10 00:50:39 -04:00
Teknium	4caa635803	fix: add auth.json write-back for Codex retry and valid-token early-return paths The Codex retry block and valid-token short-circuit in _refresh_entry() both return early, bypassing the auth.json sync at the end of the method. This adds _sync_device_code_entry_to_auth_store() calls on both paths so refreshed/synced tokens are written back to auth.json regardless of which code path succeeds.	2026-04-09 21:48:50 -07:00
Ben Barclay	a64d8a83e1	fix: proactive Codex CLI sync before refresh + retry on failure	2026-04-09 21:48:50 -07:00
Ben Barclay	dfde4058cf	fix: sync refreshed OAuth tokens from pool back to auth.json providers	2026-04-09 21:48:50 -07:00
Ben Barclay	13b3ea6484	fix: skip stale Nous pool entry when agent_key is expired	2026-04-09 21:48:50 -07:00
Ari Lotter	658cd2dd4c	nix: add tui lockfile update script	2026-04-10 00:46:37 -04:00
Brooklyn Nicholson	8c1ba639c6	Merge branch 'feat/ink-refactor' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-09 23:35:29 -05:00
Brooklyn Nicholson	17a9c47178	feat: support shift enter for ghostty etc	2026-04-09 23:35:25 -05:00
Austin Pickett	e1df13cf20	fix: menus	2026-04-10 00:01:37 -04:00
SHL0MS	941608cdde	feat(skills): add creative divergence strategies for experimental output Adds opt-in creative thinking frameworks to ascii-video, p5js, and manim-video skills, based on Lluminate (joelsimon.net/lluminate). Only engaged when the user explicitly asks for creative, experimental, or unconventional output. Straightforward requests are unaffected. Each skill gets 2-3 strategies matched to its domain: - ascii-video: Forced Connections, Conceptual Blending, Oblique Strategies - p5js: Conceptual Blending, SCAMPER, Distance Association - manim-video: SCAMPER, Assumption Reversal Strategies sourced from creativity research (Boden, Eno, de Bono, Koestler, Fauconnier & Turner, Osborn), formalized for LLM prompting by Lluminate.	2026-04-09 21:40:16 -04:00
Teknium	b87d00288d	fix: add actionable hint for OpenRouter 'no tool endpoints' error When OpenRouter returns 'No endpoints found that support tool use' (HTTP 404), display a hint explaining that provider routing restrictions may be filtering out tool-capable providers. Links the user directly to the model's OpenRouter page to check which providers support tools. The hint fires in the error display block that runs regardless of whether fallback succeeds — so the user always understands WHY the model failed, not just that it fell back. Reported via Discord: GLM-5.1 on OpenRouter with US-based provider restrictions eliminated all 4 tool-supporting endpoints (DeepInfra, Z.AI, Friendli, Venice), leaving only 7 non-tool providers.	2026-04-09 18:03:09 -07:00
kshitijk4poor	08e2a1a51e	fix(anthropic): omit tool-streaming beta on MiniMax endpoints MiniMax's Anthropic-compatible endpoints reject requests that include the fine-grained-tool-streaming beta header — every tool-use message triggers a connection error (~18s timeout). Regular chat works fine. Add _common_betas_for_base_url() that filters out the tool-streaming beta for Bearer-auth (MiniMax) endpoints while keeping all other betas. All four client-construction branches now use the filtered list. Based on #6528 by @HiddenPuppy. Original cherry-picked from PR #6688 by kshitijk4poor. Fixes #6510, fixes #6555.	2026-04-09 17:53:52 -07:00
Brooklyn Nicholson	4fe78d5b88	chore: fix bad merge apparently?	2026-04-09 19:17:06 -05:00
Brooklyn Nicholson	aa5b697a9d	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-09 19:12:31 -05:00
Brooklyn Nicholson	aca479c1ae	Merge branch 'feat/ink-refactor' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-09 19:08:52 -05:00
Brooklyn Nicholson	b85ff282bc	feat(ui-tui): slash command history/display, CoT fade, live skin switch, fix double reasoning	2026-04-09 19:08:47 -05:00
Teknium	9634e20e15	feat: API server model name derived from profile name (#6857) * feat: API server model name derived from profile name For multi-user setups (e.g. OpenWebUI), each profile's API server now advertises a distinct model name on /v1/models: - Profile 'lucas' -> model ID 'lucas' - Profile 'admin' -> model ID 'admin' - Default profile -> 'hermes-agent' (unchanged) Explicit override via API_SERVER_MODEL_NAME env var or platforms.api_server.model_name config for custom names. Resolves friction where OpenWebUI couldn't distinguish multiple hermes-agent connections all advertising the same model name. * docs: multi-user setup with profiles for API server + Open WebUI - api-server.md: added Multi-User Setup section, API_SERVER_MODEL_NAME to config table, updated /v1/models description - open-webui.md: added Multi-User Setup with Profiles section with step-by-step guide, updated model name references - environment-variables.md: added API_SERVER_MODEL_NAME entry	2026-04-09 17:07:29 -07:00
AIandI0x1	2d0d05a337	fix(agent): detect truncated streaming tool calls before execution When a streaming response is cut mid-tool-call (connection drop, timeout), the accumulated function.arguments is invalid JSON. The mock response builder defaulted finish_reason to 'stop', so the agent loop treated it as a valid completed turn and tried to execute tools with broken args. Fix: validate tool call arguments with json.loads() during mock response reconstruction. If any are invalid JSON, override finish_reason to 'length'. In the main loop's length handler, if tool calls are present, refuse to execute and return partial=True with a clear error instead of silently failing or wasting retries. Also fixes _thinking_exhausted to not short-circuit when tool calls are present — truncated tool calls are not thinking exhaustion. Original cherry-picked from PR #6776 by AIandI0x1. Closes #6638.	2026-04-09 17:03:54 -07:00
Austin Pickett	f805323517	chore: merge main	2026-04-09 20:00:34 -04:00
Austin Pickett	4406b4b100	fix: add delete support	2026-04-09 19:53:55 -04:00
Brooklyn Nicholson	17ecdce936	feat: add slash commands to the history so they don't get lost	2026-04-09 18:51:17 -05:00
Brooklyn Nicholson	7e813a30e0	fix: sexier cots	2026-04-09 18:33:25 -05:00
Teknium	3b554bf839	fix: test for suppress_status_output should capture stdout, not mock _vprint The test was mocking _vprint entirely, bypassing the suppress guard. Switch to capturing _print_fn output so the real _vprint runs and the guard suppresses retry noise as intended.	2026-04-09 16:24:53 -07:00
Teknium	69a0092c38	fix: deduplicate _is_termux() into hermes_constants.is_termux() Replace 6 identical copies of the Termux detection function across cli.py, browser_tool.py, voice_mode.py, status.py, doctor.py, and gateway.py with a single shared implementation in hermes_constants.py. Each call site imports with its original local name to preserve all existing callers (internal references and test monkeypatches).	2026-04-09 16:24:53 -07:00
adybag14-cyber	c3141429b7	fix(termux): tighten voice setup and mobile chat UX	2026-04-09 16:24:53 -07:00
adybag14-cyber	769ec1ee1a	fix(termux): deepen browser, voice, and tui support	2026-04-09 16:24:53 -07:00
adybag14-cyber	3237733ca5	fix(termux): harden execute_code and mobile browser/audio UX	2026-04-09 16:24:53 -07:00
adybag14-cyber	54d5138a54	fix(termux): harden env-backed background jobs	2026-04-09 16:24:53 -07:00
adybag14-cyber	6dcb3c4774	fix(termux): compact narrow-screen tui chrome	2026-04-09 16:24:53 -07:00
adybag14-cyber	096b3f9f12	fix(termux): add local image chat route	2026-04-09 16:24:53 -07:00
adybag14-cyber	a3aed1bd26	fix(termux): keep quiet chat output parseable	2026-04-09 16:24:53 -07:00
adybag14-cyber	4970705ed3	fix(termux): silence quiet chat tool previews	2026-04-09 16:24:53 -07:00
adybag14-cyber	2194425918	fix(termux): make setup-hermes use android path	2026-04-09 16:24:53 -07:00
adybag14-cyber	3878495972	fix(termux): disable gateway service flows on android	2026-04-09 16:24:53 -07:00
adybag14-cyber	4e40e93b98	fix(termux): improve status and install UX	2026-04-09 16:24:53 -07:00
adybag14-cyber	122925a6f2	fix(termux): honor temp dirs for local temp artifacts	2026-04-09 16:24:53 -07:00
adybag14-cyber	e79cc88985	feat: add tested Termux install path and EOF-aware gh auth	2026-04-09 16:24:53 -07:00
sprmn24	e053433c84	fix(error_classifier): disambiguate usage-limit patterns in _classify_by_message _classify_by_message had no handling for _USAGE_LIMIT_PATTERNS, so messages like 'usage limit exceeded, try again in 5 minutes' arriving without an HTTP status code fell through to FailoverReason.unknown instead of rate_limit. Apply the same billing/rate-limit disambiguation that _classify_402 already uses: USAGE_LIMIT_PATTERNS + transient signal → rate_limit, USAGE_LIMIT_PATTERNS alone → billing. Add 4 tests covering the no-status-code usage-limit path.	2026-04-09 16:24:13 -07:00
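
The adaptive batch delay described in the Telegram, Discord, Matrix, WeCom, and Feishu commits above can be sketched roughly as follows. This is only an illustration, not the adapters' actual code: the function name pick_batch_delay, the constant names, and the near_margin threshold are hypothetical; only the 0.6s and 2.0s delays and the per-platform split limits come from the commit messages.

```python
# Delays quoted in the commit messages; the constant names are hypothetical.
DEFAULT_DELAY_SECONDS = 0.6   # normal wait before flushing a text batch
SPLIT_DELAY_SECONDS = 2.0     # longer wait when a continuation is likely


def pick_batch_delay(last_chunk_len: int, split_limit: int,
                     near_margin: int = 200) -> float:
    """Return how long to wait for a possible continuation chunk.

    near_margin is an assumed threshold: the commits only say "near the
    limit" and do not state how close a chunk must be.
    """
    if last_chunk_len >= split_limit - near_margin:
        # The client probably split a long message, so wait longer.
        return SPLIT_DELAY_SECONDS
    return DEFAULT_DELAY_SECONDS
```

For example, under these assumptions a 4090-character chunk on a platform that splits at 4096 would wait 2.0s, while a 120-character Discord message (split limit 2000) would wait 0.6s.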


4951 commits