hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-07-24 16:54:43 +00:00

History

synapsesx f10a330aee fix(research): keep tool_call/tool_response pairs intact when compressing trajectories ## What does this PR do? The trajectory compressor could corrupt training trajectories by cutting a conversation in the middle of a tool-call/tool-response pair. In the from/value trajectory format a `tool` turn (carrying `<tool_response>` markers) is always emitted immediately after the `gpt` turn whose `<tool_call>` it answers, so the two turns must stay together. The compressible region's end boundary, however, was chosen purely by token accumulation: the loop stopped at the first turn where the accumulated tokens met the savings target, with no regard for turn roles. For any over-budget trajectory whose savings boundary happened to land between a `gpt` turn and its `tool` turn, the `gpt` (with its `<tool_call>`) was summarised away into the replacement `human` message while the now-orphaned `tool` turn (with its `<tool_response>`) was kept verbatim in the tail — producing an unmatched marker and silently corrupting the training signal. The head boundary had the mirror problem when the first tool turn was not protected. This change snaps both compression boundaries to a clean turn boundary before the region is extracted and replaced, so the summary always covers whole gpt+tool blocks and a `tool` turn is never separated from the `gpt` turn that precedes it. The boundary is moved forward when possible (folding an orphaned tool turn into the region that already holds its gpt) and falls back to moving backward when no clean boundary exists ahead, such as when the protected tail itself begins on a tool turn. ## Related Issue N/A ## Type of Change - [x] 🐛 Bug fix (non-breaking change that fixes an issue) ## Changes Made - `trajectory_compressor.py`: added `_is_boundary_clean()` and `_snap_boundary()` helpers on `TrajectoryCompressor`, and applied them to both the head and tail compression boundaries in `compress_trajectory()` and `compress_trajectory_async()`. When snapping collapses the region to nothing safe to compress, the trajectory is returned unchanged and flagged as still over the limit rather than being corrupted. - `tests/test_trajectory_compressor.py`: added `TestCompressionToolPairIntegrity` covering the sync and async paths plus direct unit tests for the boundary snapping (forward skip and backward fallback). ## How to Test 1. Run the focused tests: `pytest tests/test_trajectory_compressor.py -q`. 2. The new sync/async cases build a trajectory of gpt/tool pairs with an oversized middle gpt turn and choose a token target that forces the accumulation boundary to stop between a `<tool_call>` and its `<tool_response>`. They assert that `<tool_call>` and `<tool_response>` markers stay balanced after compression and that every kept `tool` turn is immediately preceded by a `gpt` turn (never the inserted summary or another tool turn). ## Checklist ### Code - [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md) - [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.) - [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate - [x] My PR contains only changes related to this fix/feature (no unrelated commits) - [x] I've run `pytest tests/ -q` and all tests pass - [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features) - [x] I've tested on my platform: macOS 15 (Darwin 25.5) ### Documentation & Housekeeping - [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A - [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A - [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A - [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A - [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A		2026-06-07 05:01:27 -07:00
..
acp	fix(acp): replace direct db._lock/_conn access with public update_session_meta()	2026-06-04 17:54:59 -07:00
acp_adapter	feat(azure-foundry): add Microsoft Entra ID auth	2026-05-18 10:14:38 -07:00
agent	fix(aux): honor model.default_headers on auxiliary client too (#40033 )	2026-06-07 02:02:40 -07:00
cli	Add /version slash command across CLI, gateway, TUI, and desktop.	2026-06-05 18:05:05 -07:00
cron	feat(cron): title cron sessions from the job, not the [IMPORTANT] hint	2026-06-06 12:51:12 -05:00
docker	feat(dashboard): always enable embedded chat; remove dashboard --tui flag	2026-06-04 03:03:35 -07:00
e2e	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
fakes
fixtures/plugins/example-dashboard/dashboard	feat(dashboard): nous-blue theme, bulk sessions, schedule picker (#37383 )	2026-06-02 12:37:40 -04:00
gateway	fix(simplex): accept display name in SIMPLEX_ALLOWED_USERS	2026-06-07 04:53:22 -07:00
hermes_cli	feat(dashboard): change UI font from the theme picker, independent of theme (#41145 )	2026-06-07 03:39:01 -07:00
hermes_state
honcho_plugin	test(honcho): de-flake prewarm smoke test's thread wait (#37614 )	2026-06-02 17:00:04 -07:00
integration	refactor(gateway): migrate Home Assistant adapter to bundled plugin	2026-06-06 11:46:24 -07:00
openviking_plugin	fix(openviking): add missing /agent/{agent}/ segment to memory URI — fixes #36969	2026-06-04 17:40:33 -07:00
plugins	fix(kimi): send thinking xor reasoning_effort, never both	2026-06-07 01:24:29 -07:00
providers	test(kimi): align stale parity/profile tests with thinking-xor-effort contract (#41095 )	2026-06-07 01:52:49 -07:00
run_agent	fix: harden gateway startup and turn persistence	2026-06-07 02:15:23 -07:00
scripts	feat(acp-registry): switch to uvx distribution, drop npm launcher	2026-05-14 22:27:09 -07:00
skills	fix(google-workspace): fall back to uv when venv has no pip (#39516 )	2026-06-05 13:30:02 +10:00
stress	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
tools	test(approval): regression for shell-escape denylist bypass (#36846 , #36847 )	2026-06-07 03:57:21 -07:00
tui_gateway	fix(desktop): scope in-session /model switch per-session, stop process-env leak (#41120 )	2026-06-07 02:33:28 -07:00
website	feat(skills): fix browse cap, add source links + copy buttons + category cleanup (#37143 )	2026-06-01 19:52:28 -07:00
__init__.py
conftest.py	fix: batch of small robustness/correctness fixes from @kyssta-exe	2026-06-01 19:51:03 -07:00
run_interrupt_test.py
test_account_usage.py
test_atomic_replace_symlinks.py
test_base_url_hostname.py
test_batch_runner_checkpoint.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_bitwarden_secrets.py	fix(bitwarden): prevent zip-slip path traversal when extracting bws binary (#40569 )	2026-06-06 18:33:44 -07:00
test_cli_file_drop.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_cli_manual_compress.py	fix(tests): catch up six stale tests after compression/aux/kanban changes (#28465 )	2026-05-18 21:43:59 -07:00
test_cli_skin_integration.py
test_ctx_halving_fix.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_desktop_mac_entitlements.py	test(desktop): assert macOS device entitlements are inherited	2026-06-03 07:32:00 +07:00
test_docker_home_override_scripts.py	feat(dashboard): always enable embedded chat; remove dashboard --tui flag	2026-06-04 03:03:35 -07:00
test_docker_stage2_browser_discovery.py	fix(docker): discover Playwright headless_shell browser (#35717 )	2026-06-01 16:06:44 +10:00
test_dockerfile_tini_compat_shim.py	fix(docker): add /usr/bin/tini compatibility shim for legacy wrappers (#34192 ) (#34382 )	2026-06-01 13:32:55 +10:00
test_empty_model_fallback.py	test(models): guard Nous silent default against expensive-flagship escalation	2026-06-05 02:54:34 -07:00
test_env_loader_secret_sources.py	fix(secrets): only apply external secrets once per HERMES_HOME per process (#32271 )	2026-05-25 15:18:55 -07:00
test_evidence_store.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_gateway_streaming_nested_config.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_get_tool_definitions_cache_isolation.py
test_hermes_bootstrap.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_hermes_constants.py	fix(constants): use windows native default hermes home	2026-06-03 19:37:29 -07:00
test_hermes_home_profile_warning.py
test_hermes_logging.py	fix(gateway): tolerate Unicode in stderr log handlers on Windows	2026-06-06 19:57:44 -07:00
test_hermes_state.py	fix(cron): bound the desktop run-history query to one job (#41088 )	2026-06-07 02:41:01 -07:00
test_hermes_state_compression_locks.py	fix(compression): prevent session-id fork from concurrent compressions (#34351 )	2026-05-28 21:40:39 -07:00
test_hermes_state_wal_fallback.py	fix(kanban): skip redundant WAL pragma on already-WAL connections	2026-05-27 14:31:55 -07:00
test_honcho_client_config.py	fix(honcho): harden self-hosted setup paths	2026-05-29 22:29:48 -07:00
test_honcho_session_context.py	fix(honcho): align user context peer perspective	2026-05-27 10:49:33 -07:00
test_honcho_startup_fail_open.py	fix: make Honcho startup fail open	2026-06-01 20:13:42 -07:00
test_install_sh_browser_install.py
test_install_sh_pythonpath_sanitization.py
test_install_sh_root_fhs_uv_python_path.py	test(install): harden uv-python-path regression test against future drift	2026-05-27 13:55:51 -07:00
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_symlink_stomp.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_install_sh_termux_network_prereqs.py
test_ipv4_preference.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_lazy_session_regressions.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_lint_config.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_live_system_guard_self_test.py
test_mcp_serve.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_model_tools.py	feat(middleware): add adaptive execution intercepts	2026-06-03 11:22:06 -07:00
test_model_tools_async_bridge.py	fix(web): run URL SSRF checks off the event loop in async paths	2026-06-04 18:04:47 -07:00
test_ollama_num_ctx.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_output_cap_parsing.py	fix(stream+output-cap): guard empty streams and parse OpenRouter output-cap errors (#40589 )	2026-06-07 03:52:09 -07:00
test_package_json_lazy_deps.py
test_packaging_metadata.py	fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383 )	2026-06-03 12:00:27 -07:00
test_plugin_skills.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_process_loop_event_loop_warning.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_project_metadata.py	fix(deps): exclude dev tooling from all extra	2026-06-04 08:54:38 -07:00
test_retry_utils.py
test_run_tests_parallel.py	test: use subprocesses for each test file (#29016 )	2026-05-21 16:40:04 +05:30
test_sanitize_tool_error.py
test_sql_injection.py
test_subprocess_home_isolation.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_termux_all_extra_compat.py	fix: add termux-all install profile and safe fallbacks	2026-05-07 13:04:08 -07:00
test_timezone.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_toolset_distributions.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_toolsets.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_trajectory_compressor.py	fix(research): keep tool_call/tool_response pairs intact when compressing trajectories	2026-06-07 05:01:27 -07:00
test_trajectory_compressor_async.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_transform_llm_output_hook.py
test_transform_tool_result_hook.py	test: stub has_hook in transform_tool_result hook tests	2026-06-03 06:36:46 -07:00
test_tui_gateway_server.py	fix(desktop): scope in-session /model switch per-session, stop process-env leak (#41120 )	2026-06-07 02:33:28 -07:00
test_utils_truthy_values.py
test_wheel_locales_e2e.py	fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383 )	2026-06-03 12:00:27 -07:00
test_yuanbao_integration.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_yuanbao_markdown.py
test_yuanbao_pipeline.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00
test_yuanbao_proto.py	chore: prune unused imports and duplicate import redefinitions	2026-05-28 22:26:25 -07:00