hermes-agent/tests/agent
Teknium 0efe7dace7
feat: add GPT/Codex execution discipline guidance for tool persistence (#5414)
Adds OPENAI_MODEL_EXECUTION_GUIDANCE — XML-tagged behavioral guidance
injected for GPT and Codex models alongside the existing tool-use
enforcement. Targets four specific failure modes:

- <tool_persistence>: retry on empty/partial results instead of giving up
- <prerequisite_checks>: do discovery/lookup before jumping to final action
- <verification>: check correctness/grounding/formatting before finalizing
- <missing_context>: use lookup tools instead of hallucinating

Follows the same injection pattern as GOOGLE_MODEL_OPERATIONAL_GUIDANCE
for Gemini/Gemma models. Inspired by OpenClaw PR #38953 and OpenAI's
GPT-5.4 prompting guide patterns.
2026-04-05 21:51:07 -07:00
..
__init__.py test: add unit tests for 8 modules (batch 2) 2026-02-26 13:54:20 +03:00
test_auxiliary_client.py fix(tests): fix 11 real test failures + major cascade poisoner (#4570) 2026-04-02 08:43:06 -07:00
test_context_compressor.py fix(compression): restore sane defaults and cap summary at 12K tokens 2026-03-24 18:48:47 -07:00
test_display_emoji.py feat(tools): centralize tool emoji metadata in registry + skin integration 2026-03-15 20:21:21 -07:00
test_external_skills.py feat(skills): support external skill directories via config (#3678) 2026-03-29 00:33:30 -07:00
test_memory_plugin_e2e.py feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623) 2026-04-02 15:33:51 -07:00
test_memory_provider.py fix(hindsight): overhaul hindsight memory plugin and memory setup wizard 2026-04-04 12:18:46 -07:00
test_model_metadata.py feat: overhaul context length detection with models.dev and provider-aware resolution (#2158) 2026-03-20 06:04:33 -07:00
test_models_dev.py feat: overhaul context length detection with models.dev and provider-aware resolution (#2158) 2026-03-20 06:04:33 -07:00
test_prompt_builder.py feat: add GPT/Codex execution discipline guidance for tool persistence (#5414) 2026-04-05 21:51:07 -07:00
test_prompt_caching.py fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter 2026-03-21 16:54:43 -07:00
test_redact.py test(redact): add regression tests for lowercase variable redaction (#4367) (#5185) 2026-04-05 00:10:16 -07:00
test_skill_commands.py fix(gateway): resolve Telegram's underscored /commands to skill/plugin keys 2026-04-05 11:59:28 -07:00
test_smart_model_routing.py fix: hermes update causes dual gateways on macOS (launchd) (#1567) 2026-03-16 12:36:29 -07:00
test_subagent_progress.py feat(api): structured run events via /v1/runs SSE endpoint 2026-04-05 12:05:13 -07:00
test_subdirectory_hints.py feat: progressive subdirectory hint discovery (#5291) 2026-04-05 12:33:47 -07:00
test_title_generator.py feat: auto-generate session titles after first exchange 2026-03-17 04:14:40 -07:00
test_usage_pricing.py feat: use endpoint metadata for custom model context and pricing (#1906) 2026-03-18 03:04:07 -07:00