hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-16 14:32:34 +00:00

Teknium cb38ce28cb refactor(codex): drop SDK responses.stream() helper; consume events directly (#33042)	2026-05-27 00:30:06 -07:00

* refactor(codex): drop SDK responses.stream() helper; consume events directly

The OpenAI Python SDK's high-level `client.responses.stream(...)` helper does post-hoc typed reconstruction from the terminal `response.completed.response.output` field. The chatgpt.com Codex backend has been observed (today, gpt-5.5) to ship `response.output = null` on terminal frames, which crashes the SDK with `TypeError: 'NoneType' object is not iterable` mid-iteration.

Carlton's #32963 patched the symptom by wrapping the helper in try/except and recovering from the same per-event accumulator the SDK was supposed to populate.

This PR removes the helper from the call path entirely: we now use `client.responses.create(stream=True)` (raw AsyncIterable of SSE events) and assemble the final response object ourselves from `response.output_item.done` events as they arrive. The terminal event's `output` field is never read for content. Same strategy OpenClaw uses for the same backend.

This makes Hermes structurally immune to the bug class, not patched. The next time OpenAI ships a shape change to chatgpt.com's terminal frame, our consumer keeps working because it doesn't read that frame for content, only for usage/status/id.

Changes
- `agent/codex_runtime.py`: new `_consume_codex_event_stream()` shared consumer; `run_codex_stream()` uses `responses.create(stream=True)`; `run_codex_create_stream_fallback()` collapses into a thin alias since the primary path now does what the fallback used to do.
- `agent/auxiliary_client.py`: `_CodexCompletionsAdapter` uses the same consumer; old null-output recovery helpers deleted as unreferenced.
- Tests migrated: fixtures that mocked `responses.stream` now mock `responses.create` returning a raw iterable. New regression test asserts the auxiliary path returns streamed items even when the terminal event's `output` is literally `null`.

Validation
- Live: tested against fresh OAuth on `chatgpt.com/backend-api/codex` with `gpt-5.5`: response built correctly with `response.output=null` on the terminal frame, all events consumed, usage/reasoning tokens propagated.
- `tests/run_agent/test_run_agent_codex_responses.py` + `tests/agent/test_auxiliary_client.py`: 242 passed.

* test+fix(codex): migrate streaming tests, raise on truncated streams

CI surfaced 10 test failures across tests/run_agent/test_streaming.py and tests/run_agent/test_codex_xai_oauth_recovery.py; both files had their own `responses.stream(...)` mocks I missed in the first sweep.

agent/codex_runtime.py: _consume_codex_event_stream() now raises "Codex Responses stream did not emit a terminal response" when the stream ends without any terminal frame AND no usable content. This preserves the signal callers used to get from the SDK's high-level helper, which they distinguished from "completed with empty body" in error handling.

Tests migrated:
- test_streaming.py: text-delta callback, activity-touch, and remote-protocol-error tests all switch from mocking responses.stream to responses.create returning an iterable of events.
- test_codex_xai_oauth_recovery.py: prelude-error tests are recast as wire-error-event tests (the new path raises _StreamErrorEvent directly when the wire emits type=error, which is strictly better than the old two-phase "SDK RuntimeError → retry → fallback"). The retry-on-transport-error test moves from responses.stream side-effect to responses.create side-effect.

Verified live against chatgpt.com Codex with gpt-5.5: AIAgent.chat() through the full codex_responses path returns correctly, 319/319 targeted tests passing.
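The consumption strategy described in the commit message above can be sketched roughly as follows. This is an illustrative sketch only, not the actual `_consume_codex_event_stream()` from `agent/codex_runtime.py`: the function name `consume_event_stream`, the `StreamErrorEvent` class, the returned dict shape, and the set of terminal event types are all assumptions made for the example. It also shows the test-migration idea of mocking `responses.create` to return a plain iterable of events.

```python
from types import SimpleNamespace
from typing import Any, Iterable
from unittest.mock import MagicMock

# Assumed set of terminal event types; the real consumer may recognize others.
TERMINAL_TYPES = {"response.completed", "response.incomplete", "response.failed"}


class StreamErrorEvent(RuntimeError):
    """Hypothetical stand-in for the error raised on a wire-level type=error event."""


def consume_event_stream(events: Iterable[Any]) -> dict:
    """Assemble a response from per-item events; never read terminal `output`."""
    items = []
    terminal = None
    for event in events:
        etype = getattr(event, "type", None)
        if etype == "error":
            raise StreamErrorEvent(getattr(event, "message", "stream error"))
        if etype == "response.output_item.done":
            # Content comes only from items as they complete.
            items.append(event.item)
        elif etype in TERMINAL_TYPES:
            # The terminal frame is kept for metadata (id/status/usage) only.
            terminal = event.response
    if terminal is None and not items:
        raise RuntimeError("Codex Responses stream did not emit a terminal response")
    return {
        "id": getattr(terminal, "id", None),
        "status": getattr(terminal, "status", None),
        "usage": getattr(terminal, "usage", None),
        "output": items,
    }


def ev(type_: str, **fields: Any) -> SimpleNamespace:
    """Build a fake stream event for the demo."""
    return SimpleNamespace(type=type_, **fields)


# Fake event sequence whose terminal frame carries output=None.
events = [
    ev("response.output_item.done", item={"type": "message", "text": "hello"}),
    ev(
        "response.completed",
        response=SimpleNamespace(
            id="resp_1", status="completed", usage={"total_tokens": 7}, output=None
        ),
    ),
]

# Mock `responses.create` to return a raw iterable, as the migrated tests do.
client = MagicMock()
client.responses.create.return_value = events

result = consume_event_stream(
    client.responses.create(model="gpt-5.5", input="hi", stream=True)
)
print(result["output"])
```

Because the sketch builds `output` from `response.output_item.done` events alone, a `None` value in the terminal frame's `output` field has no effect on the assembled result.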
lsp	fix(lint): skip per-file shell linter when LSP will handle the file (#29054)	2026-05-20 01:46:40 -05:00
transports	fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults	2026-05-25 01:47:55 -07:00
__init__.py	test: add unit tests for 8 modules (batch 2)	2026-02-26 13:54:20 +03:00
test_anthropic_adapter.py	fix(security): close TOCTOU window when saving Claude Code OAuth credentials (#21152)	2026-05-24 17:45:12 -07:00
test_anthropic_keychain.py	fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain	2026-04-24 07:14:00 -07:00
test_anthropic_mcp_prefix_strip.py	fix(anthropic): skip mcp_ prefix on outgoing tool schemas when already prefixed	2026-05-24 15:27:45 -07:00
test_anthropic_oauth_pkce.py	test(security): regression guard for OAuth PKCE state/verifier separation	2026-05-16 02:38:02 -07:00
test_arcee_trinity_overrides.py	test(arcee): cover Trinity Large Thinking temperature + compression overrides	2026-05-05 17:23:45 -07:00
test_async_utils.py	fix(async): close unscheduled coroutines in all threadsafe bridges (#26584)	2026-05-15 14:00:01 -07:00
test_auxiliary_client.py	refactor(codex): drop SDK responses.stream() helper; consume events directly (#33042)	2026-05-27 00:30:06 -07:00
test_auxiliary_client_anthropic_custom.py	fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)	2026-04-19 22:43:09 -07:00
test_auxiliary_client_azure_foundry.py	feat(azure-foundry): add Microsoft Entra ID auth	2026-05-18 10:14:38 -07:00
test_auxiliary_config_bridge.py	feat(plugins): add register_auxiliary_task() to PluginContext API	2026-05-23 17:49:47 -07:00
test_auxiliary_main_first.py	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)	2026-05-17 02:29:41 -07:00
test_auxiliary_named_custom_providers.py	fix(fallback): let custom_providers shadow built-in aliases	2026-04-30 20:18:44 -07:00
test_auxiliary_transport_autodetect.py	fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027)	2026-04-28 06:50:14 -07:00
test_azure_identity_adapter.py	feat(azure-foundry): add Microsoft Entra ID auth	2026-05-18 10:14:38 -07:00
test_bedrock_1m_context.py	feat(azure-foundry): add Microsoft Entra ID auth	2026-05-18 10:14:38 -07:00
test_bedrock_adapter.py	test(ci): stabilize shared optional dependency baselines	2026-05-13 17:32:22 -07:00
test_bedrock_integration.py	test(ci): stabilize shared optional dependency baselines	2026-05-13 17:32:22 -07:00
test_codex_cloudflare_headers.py	fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765)	2026-04-29 23:23:50 -07:00
test_codex_ttfb_watchdog.py	fix(codex): add time-to-first-byte watchdog for stalled Codex streams	2026-05-25 05:34:42 -07:00
test_compress_focus.py	fix: resolve CI test failures — add missing functions, fix stale tests (#9483)	2026-04-14 01:43:45 -07:00
test_compressor_historical_media.py	Port from Kilo-Org/kilocode#9434: strip historical media after compression (#27189)	2026-05-16 17:18:25 -07:00
test_compressor_image_tokens.py	feat(image-input): native multimodal routing based on model vision capability (#16506)	2026-04-27 06:27:59 -07:00
test_context_compressor.py	test: keep tirith checks hermetic	2026-05-23 02:20:14 -07:00
test_context_compressor_summary_continuity.py	fix(ci): stabilize shared test state after 21012	2026-05-14 14:28:14 -07:00
test_context_engine.py	feat: wire context engine plugin slot into agent and plugin system	2026-04-10 19:15:50 -07:00
test_context_references.py	fix(agent): fall back when rg is blocked for @folder references	2026-04-20 01:56:41 -07:00
test_copilot_acp_client.py	fix(ci): recover 38 failing tests on main (#17642)	2026-04-29 20:05:32 -07:00
test_copilot_acp_deprecation.py	fix(copilot-acp): tighten deprecation detection + sharpen GitHub Models 413 hint	2026-05-16 02:24:48 -07:00
test_credential_pool.py	fix(anthropic): API-key path skips OAuth autodiscovery + prunes stale entries	2026-05-25 17:41:40 -07:00
test_credential_pool_routing.py	refactor: remove smart_model_routing feature (#12732)	2026-04-19 18:12:55 -07:00
test_crossloop_client_cache.py	refactor(tests): re-architect tests + fix CI failures (#5946)	2026-04-07 17:19:07 -07:00
test_curator.py	fix(skills): keep manual skills out of curator	2026-05-04 02:19:28 -07:00
test_curator_activity.py	fix: use skill activity in curator status	2026-04-30 10:31:47 -07:00
test_curator_backup.py	fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)	2026-05-02 01:29:57 -07:00
test_curator_classification.py	feat(curator): hint at `hermes curator pin` in the rename block (#23212)	2026-05-10 06:44:53 -07:00
test_curator_reports.py	fix(curator): rewrite cron job skill refs after consolidation (#18253)	2026-04-30 23:04:50 -07:00
test_custom_provider_extra_body.py	fix(custom): pass custom provider extra body	2026-05-21 07:48:53 -07:00
test_deepseek_anthropic_thinking.py	chore: ruff auto-fix PLR6201 resweep — tuple → set in membership tests (#27355)	2026-05-17 02:29:41 -07:00
test_direct_provider_url_detection.py	fix: restrict provider URL detection to exact hostname matches	2026-04-20 22:14:29 -07:00
test_display.py	fix: classify landed file mutations with diagnostics	2026-05-13 06:46:23 -07:00
test_display_emoji.py	feat(tools): centralize tool emoji metadata in registry + skin integration	2026-03-15 20:21:21 -07:00
test_display_todo_progress.py	feat(cli): show todo progress as done/total fraction	2026-05-23 21:03:51 -07:00
test_display_tool_failure.py	test(display): cover failure-suffix rendering + update scrollback test	2026-05-23 21:03:51 -07:00
test_error_classifier.py	fix(codex-responses): gracefully recover from invalid_encrypted_content (salvage #10144) (#33035)	2026-05-26 22:01:17 -07:00
test_external_skills.py	feat(skills): support external skill directories via config (#3678)	2026-03-29 00:33:30 -07:00
test_external_skills_dirs_cache.py	perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138)	2026-05-08 16:39:32 -07:00
test_file_safety.py	fix(security): block read_file on project-local .env files	2026-05-25 03:40:47 -07:00
test_file_safety_credentials.py	fix(security): block read_file on project-local .env files	2026-05-25 03:40:47 -07:00
test_file_safety_cross_profile.py	fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290)	2026-05-24 00:38:17 -07:00
test_gemini_cloudcode.py	fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks	2026-05-14 08:03:56 -07:00
test_gemini_fast_fallback.py	fix: wrap _pool_may_recover_from_rate_limit call through run_agent namespace	2026-05-18 20:04:57 -07:00
test_gemini_free_tier_gate.py	feat(gemini): block free-tier keys at setup + surface guidance on 429 (#15100)	2026-04-24 04:46:17 -07:00
test_gemini_native_adapter.py	fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133)	2026-04-24 05:35:17 -07:00
test_gemini_schema.py	fix(gemini): drop integer/number/boolean enums from tool schemas (#15082)	2026-04-24 03:40:00 -07:00
test_i18n.py	feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914)	2026-05-10 07:14:14 -07:00
test_image_gen_registry.py	feat(plugins): pluggable image_gen backends + OpenAI provider (#13799)	2026-04-21 21:30:10 -07:00
test_image_routing.py	fix(agent): consult supports_vision override in auto-mode routing	2026-05-20 23:27:10 -07:00
test_insights.py	test: stop testing mutable data — convert change-detectors to invariants (#13363)	2026-04-20 23:20:33 -07:00
test_kimi_coding_anthropic_thinking.py	fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455)	2026-04-29 06:35:42 -07:00
test_last_total_tokens.py	fix(compressor): ABC compliance — total_tokens, api_mode, logger consistency	2026-05-23 17:38:19 -07:00
test_local_stream_timeout.py	fix(agent): recognize Tailscale CGNAT (100.64.0.0/10) as local for Ollama timeouts	2026-04-22 14:46:10 -07:00
test_markdown_tables.py	fix(cli): vertical fallback for markdown tables wider than terminal (#23948)	2026-05-11 16:49:13 -07:00
test_memory_provider.py	fix(agent): widen toolset gate to context engine tools (#5544 sibling)	2026-05-21 23:18:37 -07:00
test_memory_session_switch.py	feat(hindsight): probe API for update_mode='append' support, dedupe across processes	2026-05-05 15:09:59 -07:00
test_memory_user_id.py	feat(hindsight): richer session-scoped retain metadata	2026-04-22 05:27:10 -07:00
test_minimax_auxiliary_url.py	fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)	2026-04-07 22:23:28 -07:00
test_minimax_provider.py	feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path	2026-05-05 13:40:01 -07:00
test_model_metadata.py	chore(models): drop retired grok-4-1-fast from metadata, tests, docs	2026-05-25 14:51:43 -07:00
test_model_metadata_local_ctx.py	fix(tui): show correct context length	2026-04-28 12:27:36 -07:00
test_model_metadata_ssl.py	fix(auth): honor SSL CA env vars across httpx + requests callsites	2026-04-24 03:00:33 -07:00
test_models_dev.py	fix(xai): resolve Grok Build context for OAuth	2026-05-22 13:05:36 -07:00
test_moonshot_schema.py	fix(moonshot): strip $ref siblings and collapse tuple items in tool schemas (#27104)	2026-05-16 13:02:19 -07:00
test_non_stream_stale_timeout.py	fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults	2026-05-25 01:47:55 -07:00
test_nous_oauth_401_guidance.py	feat(errors): actionable guidance for Nous OAuth 401s (#32082)	2026-05-25 06:06:51 -07:00
test_nous_rate_guard.py	fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898)	2026-04-26 04:53:42 -07:00
test_onboarding.py	docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507)	2026-04-29 08:08:36 -07:00
test_openrouter_response_cache.py	fix(openrouter): use canonical X-Title attribution header	2026-05-05 10:13:34 -07:00
test_plugin_llm.py	feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194)	2026-05-10 07:09:28 -07:00
test_portal_tags.py	feat(nous): unified client=hermes-client-v<version> tag on every Portal request (#24779)	2026-05-12 20:49:20 -07:00
test_prompt_builder.py	fix(agent): add qwen and deepseek to TOOL_USE_ENFORCEMENT_MODELS	2026-05-18 20:06:49 -07:00
test_prompt_caching.py	fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778)	2026-05-12 20:46:04 -07:00
test_proxy_and_url_validation.py	fix(agent): normalize socks:// env proxies for httpx/anthropic	2026-04-21 05:52:46 -07:00
test_rate_limit_tracker.py	feat: capture provider rate limit headers and show in /usage (#6541)	2026-04-09 03:43:14 -07:00
test_redact.py	fix(debug): redact BlueBubbles webhook secrets	2026-05-24 15:43:48 -07:00
test_save_url_image.py	fix(image_gen): cache xAI ephemeral URL responses to disk (#26942) (#31759)	2026-05-24 18:10:47 -07:00
test_shell_hooks.py	fix(security): restore type safety and extract constant in shell hook block handler	2026-05-17 02:31:18 -07:00
test_shell_hooks_consent.py	fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322)	2026-04-26 20:48:35 -07:00
test_skill_bundles.py	feat(skills): add skill bundles — alias /<name> loads multiple skills (#28373)	2026-05-18 21:38:05 -07:00
test_skill_commands.py	test: use subprocesses for each test file (#29016)	2026-05-21 16:40:04 +05:30
test_skill_commands_reload.py	refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool	2026-04-29 21:07:47 -07:00
test_skill_utils.py	fix(skills): load Linux-tagged skills on Termux (android sys.platform)	2026-05-21 19:08:38 -07:00
test_streaming_context_scrubber.py	🐛 fix(memory): require newline after context tag	2026-05-18 10:53:08 -07:00
test_subagent_progress.py	feat(delegate): orchestrator role and configurable spawn depth (default flat)	2026-04-21 14:23:45 -07:00
test_subagent_stop_hook.py	feat: shell hooks — wire shell scripts as Hermes hook callbacks	2026-04-20 20:53:51 -07:00
test_subdirectory_hints.py	fix(subdirectory_hints): prevent loading AGENTS.md outside workspace	2026-05-25 23:17:33 -07:00
test_system_prompt_restore.py	perf(prompt-cache): date-only timestamp + loud gateway-DB roundtrip logging	2026-05-17 23:20:37 -07:00
test_think_scrubber.py	fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184)	2026-05-05 04:33:38 -07:00
test_title_generator.py	fix: improve telegram topic mode setup	2026-05-04 12:07:17 -07:00
test_tool_dispatch_helpers.py	feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269)	2026-05-25 14:52:24 -07:00
test_tool_guardrails.py	fix: add recovery hints to loop guard warnings	2026-05-19 00:12:12 -07:00
test_tool_result_classification.py	fix: classify landed file mutations with diagnostics	2026-05-13 06:46:23 -07:00
test_transcription_registry.py	feat(stt): add register_transcription_provider() plugin hook	2026-05-25 01:41:19 -07:00
test_tts_registry.py	feat(tts): add register_tts_provider() plugin hook (closes #30398)	2026-05-24 18:04:54 -07:00
test_unsupported_parameter_retry.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_unsupported_temperature_retry.py	refactor(memory): remove flush_memories entirely (#15696)	2026-04-25 08:21:14 -07:00
test_usage_pricing.py	fix(pricing): add deepseek-v4-pro to official docs pricing table	2026-05-12 16:32:57 -07:00
test_video_gen_registry.py	feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)	2026-05-13 16:39:41 -07:00
test_vision_resolved_args.py	fix(vision): preserve explicit provider auth with custom base_url	2026-05-04 05:05:43 -07:00
test_vision_routing_31179.py	fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main (#31452)	2026-05-24 15:01:28 -07:00