mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-29 06:31:32 +00:00
feat(prompt): universal task-completion guidance + local Python toolchain probe (#34340)
* fix(codex): surface error code in Responses 'failed' status errors
When a Codex Responses turn ends with status=failed, the response carries
the failure details under `response.error` as
`{code, message, param, ...}`. The previous extractor pulled only
`message`, so users seeing a rate-limit failure got a bare "Slow down"
string indistinguishable from a generic stream truncation; an
internal_error with empty message degraded to a dict dump
("{'code': 'internal_error', 'message': ''}").
Extract a `_format_responses_error()` helper that:
- prefixes `code` when both code and message are present
(e.g. 'rate_limit_exceeded: Slow down')
- falls back to the bare `code` when message is empty
- accepts both dict and attribute-style payloads (SDK and JSON-RPC paths)
- preserves the prior status-only fallback when no error payload exists
Apply the same helper at the sibling site in
`codex_app_server_session.run_turn()` so codex-CLI subprocess turn
failures get the same treatment.
Tests:
- 8 new unit tests for `_format_responses_error` covering both shapes,
empty/missing fields, non-string fields, and the status-only fallback.
- 2 regression tests on `_normalize_codex_response` for failed status
with and without a code, asserting the exact RuntimeError message.
- All 3603 tests in tests/agent/ pass.
Adapted from anomalyco/opencode#28757.
* feat(prompt): universal task-completion guidance + local Python toolchain probe
Two cross-model failure modes get a single-line answer in the cached
system prompt. Both gated by config (default on), both add zero overhead
when not needed, both verified via real AIAgent prompt builds.
## What changed
`TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models.
Targets two failure modes observed on a real Sarasota real-estate build
task: (1) Opus stopped after writing an 85-byte stub and gave a prose
response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed
through a PEP-668 wall, then returned fabricated listings instead of
admitting the blocker. Both behaviors are model-family-agnostic, so the
guidance lives outside the existing tool_use_enforcement gate (~192
tokens, paid once per session via prefix cache).
`tools/env_probe.py` — local Python toolchain probe. Detects
python3/pip/uv/PEP-668 state and emits ONE short line in the system
prompt when something is non-default. Emits NOTHING when the env is
clean (zero token cost for normal users). Skipped entirely for remote
terminal backends (docker/modal/ssh) — they have their own probe.
Example output on a broken environment (the actual case):
Python toolchain: python3=3.11.15 (no pip module),
python=missing (use python3), pip→python3.12 (mismatch),
PEP 668=yes (use venv or uv).
## Config
Both flags live under `agent.` in config.yaml, default True:
agent:
task_completion_guidance: true # universal "finish the job" block
environment_probe: true # local Python toolchain hints
Neither addition required a `_config_version` bump — deep-merge fills
defaults in for existing user configs.
## Validation
| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |
This commit is contained in:
parent
75d2c081c9
commit
a4d8f0f62a
10 changed files with 819 additions and 6 deletions
|
|
@ -980,6 +980,48 @@ def _extract_responses_reasoning_text(item: Any) -> str:
|
|||
return ""
|
||||
|
||||
|
||||
def _format_responses_error(error_obj: Any, response_status: str) -> str:
|
||||
"""Build a human-readable error string from a Responses ``response.error`` payload.
|
||||
|
||||
The OpenAI Responses API carries failure details under ``response.error``
|
||||
on terminal ``response.failed`` events, in the shape
|
||||
``{"code": "rate_limit_exceeded", "message": "Slow down", "param": ...}``.
|
||||
Earlier code only surfaced ``message``, which left users staring at bare
|
||||
strings like ``"Slow down"`` while the failure mode (rate limit vs
|
||||
context-length vs internal_error vs model-overloaded) was hidden in
|
||||
``code``. We now prefix ``code`` when both are present so consumers can
|
||||
distinguish failure modes without parsing the bare message.
|
||||
|
||||
Falls back to ``code`` alone when ``message`` is empty, and to a stable
|
||||
default referencing the response status when no error payload is
|
||||
available at all. Adapted from anomalyco/opencode#28757.
|
||||
"""
|
||||
# Pull code and message from either dict or attribute-style payloads.
|
||||
code: Any = None
|
||||
message: Any = None
|
||||
if isinstance(error_obj, dict):
|
||||
code = error_obj.get("code")
|
||||
message = error_obj.get("message")
|
||||
elif error_obj is not None:
|
||||
code = getattr(error_obj, "code", None)
|
||||
message = getattr(error_obj, "message", None)
|
||||
|
||||
code_str = str(code).strip() if isinstance(code, str) else (str(code).strip() if code else "")
|
||||
message_str = str(message).strip() if isinstance(message, str) else (str(message).strip() if message else "")
|
||||
|
||||
if code_str and message_str:
|
||||
return f"{code_str}: {message_str}"
|
||||
if message_str:
|
||||
return message_str
|
||||
if code_str:
|
||||
return code_str
|
||||
if error_obj:
|
||||
# Last-resort: stringify whatever the provider sent so it's at least
|
||||
# visible in logs/UI rather than silently swallowed.
|
||||
return str(error_obj)
|
||||
return f"Responses API returned status '{response_status}'"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Full response normalization
|
||||
# ---------------------------------------------------------------------------
|
||||
|
|
@ -1023,10 +1065,7 @@ def _normalize_codex_response(
|
|||
|
||||
if response_status in {"failed", "cancelled"}:
|
||||
error_obj = getattr(response, "error", None)
|
||||
if isinstance(error_obj, dict):
|
||||
error_msg = error_obj.get("message") or str(error_obj)
|
||||
else:
|
||||
error_msg = str(error_obj) if error_obj else f"Responses API returned status '{response_status}'"
|
||||
error_msg = _format_responses_error(error_obj, response_status)
|
||||
raise RuntimeError(error_msg)
|
||||
|
||||
content_parts: List[str] = []
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue