mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-01 07:01:41 +00:00
* fix(codex): surface error code in Responses 'failed' status errors
When a Codex Responses turn ends with status=failed, the response carries
the failure details under `response.error` as
`{code, message, param, ...}`. The previous extractor pulled only
`message`, so users seeing a rate-limit failure got a bare "Slow down"
string indistinguishable from a generic stream truncation; an
internal_error with empty message degraded to a dict dump
("{'code': 'internal_error', 'message': ''}").
Extract a `_format_responses_error()` helper that:
- prefixes `code` when both code and message are present
  (e.g. 'rate_limit_exceeded: Slow down')
- falls back to the bare `code` when message is empty
- accepts both dict and attribute-style payloads (SDK and JSON-RPC paths)
- preserves the prior status-only fallback when no error payload exists
Apply the same helper at the sibling site in
`codex_app_server_session.run_turn()` so codex-CLI subprocess turn
failures get the same treatment.
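The formatting rules listed above can be sketched as follows. This is a minimal illustration, not the helper from the commit: the function name `format_responses_error`, its `status` parameter, the wording of the status-only fallback, and the choice to treat non-string fields as absent are all assumptions made for the example.

```python
from types import SimpleNamespace
from typing import Any, Optional


def _get_field(payload: Any, name: str) -> Any:
    """Read a field from either a dict or an attribute-style object."""
    if isinstance(payload, dict):
        return payload.get(name)
    return getattr(payload, name, None)


def format_responses_error(error: Any, status: Optional[str] = None) -> str:
    """Hypothetical sketch of the rules described in the commit message."""
    code = _get_field(error, "code") if error is not None else None
    message = _get_field(error, "message") if error is not None else None

    # Assumption: non-string or blank values count as absent, so they are
    # never rendered as a raw dict or repr dump.
    code = code.strip() if isinstance(code, str) else ""
    message = message.strip() if isinstance(message, str) else ""

    if code and message:
        return f"{code}: {message}"  # e.g. 'rate_limit_exceeded: Slow down'
    if code:
        return code                  # empty message: bare code
    if message:
        return message               # no code: message only
    # No usable error payload: status-only fallback (wording is assumed).
    return f"Responses API returned status '{status}'"


# Dict-style payload (JSON-RPC path) and attribute-style payload (SDK path).
dict_error = {"code": "rate_limit_exceeded", "message": "Slow down"}
attr_error = SimpleNamespace(code="internal_error", message="")
formatted_dict = format_responses_error(dict_error, status="failed")
formatted_attr = format_responses_error(attr_error, status="failed")
```

Resolving the field lookup in one small accessor keeps the formatting branch identical for both payload shapes, which is what lets the same helper serve two call sites.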
Tests:
- 8 new unit tests for `_format_responses_error` covering both shapes,
  empty/missing fields, non-string fields, and the status-only fallback.
- 2 regression tests on `_normalize_codex_response` for failed status
  with and without a code, asserting the exact RuntimeError message.
- All 3603 tests in tests/agent/ pass.
Adapted from anomalyco/opencode#28757.
* feat(prompt): universal task-completion guidance + local Python toolchain probe
Two cross-model failure modes get a single-line answer in the cached
system prompt. Both are gated by config (default on), add zero overhead
when not needed, and were verified via real AIAgent prompt builds.
## What changed
`TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models.
Targets two failure modes observed on a real Sarasota real-estate build
task: (1) Opus stopped after writing an 85-byte stub and gave a prose
response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed
through a PEP-668 wall, then returned fabricated listings instead of
admitting the blocker. Both behaviors are model-family-agnostic, so the
guidance lives outside the existing tool_use_enforcement gate (~192
tokens, paid once per session via prefix cache).
`tools/env_probe.py` — local Python toolchain probe. Detects
python3/pip/uv/PEP-668 state and emits ONE short line in the system
prompt when something is non-default. Emits NOTHING when the env is
clean (zero token cost for normal users). Skipped entirely for remote
terminal backends (docker/modal/ssh) — they have their own probe.
Example output on a broken environment (the actual case):
    Python toolchain: python3=3.11.15 (no pip module),
    python=missing (use python3), pip→python3.12 (mismatch),
    PEP 668=yes (use venv or uv).
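A probe of this kind might look roughly like the sketch below. It is not the contents of `tools/env_probe.py`: the function names, the set of checks, and the output wording are assumptions. It also simplifies by inspecting the interpreter running the script rather than the `python3` found on `PATH`. The PEP 668 check relies on the `EXTERNALLY-MANAGED` marker file that the PEP places in the standard-library directory.

```python
import shutil
import sysconfig
from pathlib import Path
from typing import Dict


def is_externally_managed() -> bool:
    """PEP 668: look for an EXTERNALLY-MANAGED marker in the stdlib dir."""
    stdlib = sysconfig.get_path("stdlib")
    return bool(stdlib) and (Path(stdlib) / "EXTERNALLY-MANAGED").exists()


def collect_findings() -> Dict[str, str]:
    """Collect only non-default states (hypothetical set of checks)."""
    findings: Dict[str, str] = {}
    if shutil.which("python3") is None:
        findings["python3"] = "missing"
    if shutil.which("python") is None:
        findings["python"] = "missing (use python3)"
    if shutil.which("pip") is None and shutil.which("pip3") is None:
        findings["pip"] = "missing"
    if shutil.which("uv") is not None:
        findings["uv"] = "available"
    if is_externally_managed():
        findings["PEP 668"] = "yes (use venv or uv)"
    return findings


def format_probe_line(findings: Dict[str, str]) -> str:
    """Return one short line, or an empty string when the env is clean."""
    if not findings:
        return ""  # nothing to report: no tokens added to the prompt
    parts = ", ".join(f"{name}={state}" for name, state in findings.items())
    return f"Python toolchain: {parts}."


probe_line = format_probe_line(collect_findings())
```

Separating collection from formatting keeps the "emit nothing when clean" rule in one place and makes the formatter testable without touching the real environment.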
## Config
Both flags live under `agent.` in config.yaml, default True:
    agent:
      task_completion_guidance: true   # universal "finish the job" block
      environment_probe: true          # local Python toolchain hints
Neither addition required a `_config_version` bump — deep-merge fills
defaults in for existing user configs.
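To illustrate why a deep merge makes a version bump unnecessary, here is a minimal sketch of the idea. It assumes nothing about the project's actual loader: `deep_merge`, `DEFAULTS`, and the `max_turns` key are hypothetical names used only for this example.

```python
from copy import deepcopy
from typing import Any, Dict

# Defaults for the two new flags, as described above.
DEFAULTS: Dict[str, Any] = {
    "agent": {
        "task_completion_guidance": True,
        "environment_probe": True,
    }
}


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay user values on defaults without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = deepcopy(value)                 # user value wins
    return merged


# An existing user config written before the new flags existed.
# "max_turns" is a made-up key standing in for any older setting.
user_config = {"agent": {"environment_probe": False, "max_turns": 90}}
effective = deep_merge(DEFAULTS, user_config)
```

The older config keeps its own values, the explicit `environment_probe: false` override is respected, and the missing `task_completion_guidance` key is filled from the defaults.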
## Validation
| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |

| Name | | |
|---|---|---|
| __init__.py | ||
| conftest.py | ||
| test_413_compression.py | ||
| test_860_dedup.py | ||
| test_1630_context_overflow_loop.py | ||
| test_18028_content_policy_blocked.py | ||
| test_31273_402_not_retried.py | ||
| test_agent_guardrails.py | ||
| test_anthropic_prompt_cache_policy.py | ||
| test_anthropic_third_party_oauth_guard.py | ||
| test_anthropic_truncation_continuation.py | ||
| test_api_max_retries_config.py | ||
| test_async_httpx_del_neuter.py | ||
| test_background_review.py | ||
| test_background_review_cache_parity.py | ||
| test_background_review_summary.py | ||
| test_background_review_toolset_restriction.py | ||
| test_callable_api_key.py | ||
| test_codex_app_server_integration.py | ||
| test_codex_multimodal_tool_result.py | ||
| test_codex_no_tools_nonetype.py | ||
| test_codex_silent_hang_hint.py | ||
| test_codex_xai_oauth_recovery.py | ||
| test_commit_memory_session_context_engine.py | ||
| test_compress_focus_plugin_fallback.py | ||
| test_compression_boundary.py | ||
| test_compression_boundary_hook.py | ||
| test_compression_feasibility.py | ||
| test_compression_persistence.py | ||
| test_compression_trigger_excludes_reasoning.py | ||
| test_compressor_fallback_update.py | ||
| test_concurrent_interrupt.py | ||
| test_context_token_tracking.py | ||
| test_copilot_native_vision_headers.py | ||
| test_create_openai_client_kwargs_isolation.py | ||
| test_create_openai_client_proxy_env.py | ||
| test_create_openai_client_reuse.py | ||
| test_credential_pool_interrupt.py | ||
| test_deepseek_reasoning_content_echo.py | ||
| test_deepseek_v4_thinking_live.py | ||
| test_dict_tool_call_args.py | ||
| test_empty_response_recovery_persistence.py | ||
| test_exit_cleanup_interrupt.py | ||
| test_fallback_credential_isolation.py | ||
| test_file_mutation_verifier.py | ||
| test_image_rejection_fallback.py | ||
| test_image_shrink_recovery.py | ||
| test_init_fallback_on_exhausted_pool.py | ||
| test_interactive_interrupt.py | ||
| test_interrupt_propagation.py | ||
| test_invalid_context_length_warning.py | ||
| test_iteration_budget_race.py | ||
| test_jsondecodeerror_retryable.py | ||
| test_last_reasoning_per_turn.py | ||
| test_long_context_tier_429.py | ||
| test_materialize_data_url_cleanup.py | ||
| test_memory_nudge_counter_hydration.py | ||
| test_memory_provider_init.py | ||
| test_memory_sync_interrupted.py | ||
| test_message_sequence_repair.py | ||
| test_multimodal_tool_content_recovery.py | ||
| test_openai_client_lifecycle.py | ||
| test_partial_stream_finish_reason.py | ||
| test_percentage_clamp.py | ||
| test_plugin_context_engine_init.py | ||
| test_primary_runtime_restore.py | ||
| test_provider_attribution_headers.py | ||
| test_provider_fallback.py | ||
| test_provider_parity.py | ||
| test_real_interrupt_subagent.py | ||
| test_redirect_stdout_issue.py | ||
| test_repair_tool_call_arguments.py | ||
| test_repair_tool_call_name.py | ||
| test_retry_status_buffer.py | ||
| test_review_prompt_class_first.py | ||
| test_run_agent.py | ||
| test_run_agent_codex_responses.py | ||
| test_run_agent_multimodal_prologue.py | ||
| test_sequential_chats_live.py | ||
| test_session_id_env.py | ||
| test_session_meta_filtering.py | ||
| test_session_reset_fix.py | ||
| test_steer.py | ||
| test_stream_drop_logging.py | ||
| test_stream_interrupt_retry.py | ||
| test_streaming.py | ||
| test_streaming_tool_call_repair.py | ||
| test_strict_api_validation.py | ||
| test_strip_reasoning_tags_cli.py | ||
| test_switch_model_context.py | ||
| test_switch_model_fallback_prune.py | ||
| test_switch_model_rollback.py | ||
| test_thinking_only_sanitizer.py | ||
| test_tls_fd_recycle_corruption.py | ||
| test_token_persistence_non_cli.py | ||
| test_tool_arg_coercion.py | ||
| test_tool_call_args_sanitizer.py | ||
| test_tool_call_guardrail_runtime.py | ||
| test_tool_executor_contextvar_propagation.py | ||
| test_tool_name_db_persistence.py | ||
| test_unicode_ascii_codec.py | ||
| test_vision_aware_preprocessing.py | ||