hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-29 06:31:32 +00:00

History

Teknium a4d8f0f62a feat(prompt): universal task-completion guidance + local Python toolchain probe (#34340 ) * fix(codex): surface error code in Responses 'failed' status errors When a Codex Responses turn ends with status=failed, the response carries the failure details under `response.error` as `{code, message, param, ...}`. The previous extractor pulled only `message`, so users seeing a rate-limit failure got a bare "Slow down" string indistinguishable from a generic stream truncation; an internal_error with empty message degraded to a dict dump ("{'code': 'internal_error', 'message': ''}"). Extract a `_format_responses_error()` helper that: - prefixes `code` when both code and message are present (e.g. 'rate_limit_exceeded: Slow down') - falls back to the bare `code` when message is empty - accepts both dict and attribute-style payloads (SDK and JSON-RPC paths) - preserves the prior status-only fallback when no error payload exists Apply the same helper at the sibling site in `codex_app_server_session.run_turn()` so codex-CLI subprocess turn failures get the same treatment. Tests: - 8 new unit tests for `_format_responses_error` covering both shapes, empty/missing fields, non-string fields, and the status-only fallback. - 2 regression tests on `_normalize_codex_response` for failed status with and without a code, asserting the exact RuntimeError message. - All 3603 tests in tests/agent/ pass. Adapted from anomalyco/opencode#28757. * feat(prompt): universal task-completion guidance + local Python toolchain probe Two cross-model failure modes get a single-line answer in the cached system prompt. Both gated by config (default on), both add zero overhead when not needed, both verified via real AIAgent prompt builds. ## What changed `TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models. Targets two failure modes observed on a real Sarasota real-estate build task: (1) Opus stopped after writing an 85-byte stub and gave a prose response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed through a PEP-668 wall, then returned fabricated listings instead of admitting the blocker. Both behaviors are model-family-agnostic, so the guidance lives outside the existing tool_use_enforcement gate (~192 tokens, paid once per session via prefix cache). `tools/env_probe.py` — local Python toolchain probe. Detects python3/pip/uv/PEP-668 state and emits ONE short line in the system prompt when something is non-default. Emits NOTHING when the env is clean (zero token cost for normal users). Skipped entirely for remote terminal backends (docker/modal/ssh) — they have their own probe. Example output on a broken environment (the actual case): Python toolchain: python3=3.11.15 (no pip module), python=missing (use python3), pip→python3.12 (mismatch), PEP 668=yes (use venv or uv). ## Config Both flags live under `agent.` in config.yaml, default True: agent: task_completion_guidance: true # universal "finish the job" block environment_probe: true # local Python toolchain hints Neither addition required a `_config_version` bump — deep-merge fills defaults in for existing user configs. ## Validation \| Test surface \| Result \| \|---\|---\| \| tests/tools/test_env_probe.py \| 10/10 pass (probe unit) \| \| tests/run_agent/test_run_agent.py — new classes \| 8/8 pass (integration) \| \| TestToolUseEnforcementConfig \| 17/17 pass (no regression) \| \| TestBuildSystemPrompt \| 9/9 pass (no regression) \| \| TestInvalidateSystemPrompt \| 2/2 pass (no regression) \| \| tests/agent/test_prompt_builder.py \| 124/124 pass (no regression) \| \| tests/hermes_cli/ \| 5662/5662 pass (config defaults) \| \| E2E AIAgent build (broken env) \| Both blocks present, 2,178 chars \| \| E2E AIAgent build (clean env) \| 771-char net overhead, env probe silent \|		2026-05-28 22:26:09 -07:00
..
__init__.py	feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path	2026-05-05 13:40:01 -07:00
anthropic.py	fix(agent): only strip mcp_ prefix for OAuth-injected tools (GH-25255)	2026-05-24 15:27:45 -07:00
base.py	feat: add transport ABC + AnthropicTransport wired to all paths	2026-04-21 01:27:01 -07:00
bedrock.py	feat: add BedrockTransport + wire all Bedrock transport paths	2026-04-21 20:58:37 -07:00
chat_completions.py	fix(opencode-go): cap mimo-v2.5-pro max_tokens at 131072	2026-05-28 20:49:53 -07:00
codex.py	fix(codex): omit tools key from Codex Responses kwargs when no tools registered	2026-05-27 11:46:17 -07:00
codex_app_server.py	fix(codex): allow kanban worker board writes	2026-05-17 11:50:43 -07:00
codex_app_server_session.py	feat(prompt): universal task-completion guidance + local Python toolchain probe (#34340 )	2026-05-28 22:26:09 -07:00
codex_event_projector.py	feat(codex-runtime): optional codex app-server runtime for OpenAI/Codex models (#24182 )	2026-05-13 17:18:15 -07:00
hermes_tools_mcp_server.py	docs(hermes_tools_mcp_server): align scope docstring with EXPOSED_TOOLS (#26603 )	2026-05-15 14:44:27 -07:00
types.py	fix(transports): use PEP 604 annotation for ToolCall.extra_content	2026-05-09 02:25:37 -07:00